
Lecture Notes
in Computational Science
and Engineering 61
Editors
Timothy J. Barth
Michael Griebel
David E. Keyes
Risto M. Nieminen
Dirk Roose
Tamar Schlick

Tarek P. A. Mathew

Domain Decomposition
Methods for the Numerical
Solution of Partial
Differential Equations

With 40 Figures and 1 Table


Tarek Poonithara Abraham Mathew
tmathew@poonithara.org

ISBN 978-3-540-77205-7 e-ISBN 978-3-540-77209-5

Lecture Notes in Computational Science and Engineering ISSN 1439-7358
Library of Congress Control Number: 2008921994

Mathematics Subject Classification (2000): 65F10, 65F15, 65N22, 65N30, 65N55,
65M15, 65M55, 65K10 

© 2008 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations are
liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Cover design: WMX Design GmbH, Heidelberg
Printed on acid-free paper
9 8 7 6 5 4 3 2 1
springer.com

In loving dedication to my (late) dear mother,
and to my dear father and brother

Preface

These notes serve as an introduction to a subject of study in computational
mathematics referred to as domain decomposition methods. It concerns divide
and conquer methods for the numerical solution and approximation of partial
differential equations, primarily of elliptic or parabolic type. The methods in
this family include iterative algorithms for the solution of partial differential
equations, techniques for the discretization of partial differential equations
on non-matching grids, and techniques for the heterogeneous approximation
of partial differential equations of heterogeneous character. The divide and
conquer methodology used is based on a decomposition of the domain of the
partial differential equation into smaller subdomains, and by design is suited
for implementation on parallel computer architectures. However, even on serial
computers, these methods can provide flexibility in the treatment of complex
geometry and heterogeneities in a partial differential equation.
Interest in this family of computational methods for partial differential
equations was spawned following the development of various high perfor-
mance multiprocessor computer architectures in the early eighties. On such
parallel computer architectures, the execution time of these algorithms, as
well as the memory requirements per processor, scale reasonably well with
the size of the problem and the number of processors. From a computational
viewpoint, the divide and conquer methodology, based on a decomposition
of the domain of the partial differential equation, yields algorithms having
coarse granularity, i.e., a significant portion of the computations can be im-
plemented concurrently on different processors, while the remaining portion
requires communication between the processors. As a consequence, these al-
gorithms are well suited for implementation on MIMD (multiple instruction,
multiple data) architectures. Currently, such parallel computer architectures
can alternatively be simulated using a cluster of workstations networked with
high speed connections using communication protocols such as MPI (Message
Passing Interface) [GR15] or PVM (Parallel Virtual Machine) [GE2].


The mathematical roots of this subject trace back to the seminal work of
H. A. Schwarz [SC5] in the nineteenth century. Schwarz proposed an iterative
method, now referred to as the Schwarz alternating method, for constructing
harmonic functions on regions of irregular shape which can be expressed as
the union of subregions of regular shape (such as rectangles and spheres). His
motivation was primarily theoretical, to establish the existence of harmonic
functions on irregular regions, and his method was not used in computations
until recently [SO, MO2, BA2, MI, MA37, DR11, LI6, LI7, BR18].
A general development of domain decomposition methodology for par-
tial differential equations occurred only subsequent to the development of
parallel computer architectures, though divide and conquer methods such as
Kron’s method for electrical circuits [KR] and the substructuring method
[PR4] in structural engineering, pre-date domain decomposition methodol-
ogy. Usage of the term “domain decomposition” seems to have originated
around the mid-eighties [GL2] when interest in these methods gained mo-
mentum. The first international symposium on this subject was held in Paris
in 1987, and since then there have been yearly international conferences on
this subject, attracting interdisciplinary interest from communities of engi-
neers, applied scientists and computational mathematicians from around the
globe.
Early literature on domain decomposition methods focused primarily on
iterative procedures for the solution of partial differential equations. As the
methodology evolved, however, techniques were also developed for coupling
discretizations on subregions with non-matching grids, and for constructing
heterogeneous approximations of complicated systems of partial differential
equations having heterogeneous character. The latter approximations are built
by solving local equations of different character. From a mathematical view-
point, these diverse categories of numerical methods for partial differential
equations may be derived within several frameworks. Each decomposition of
a domain typically suggests a reformulation of the original partial differen-
tial equation as an equivalent coupled system of partial differential equations
posed on the subdomains with boundary conditions chosen to match solu-
tions on adjacent subdomains. Such equivalent systems are referred to in
these notes as hybrid formulations, and provide a framework for develop-
ing novel domain decomposition methods. Divide and conquer algorithms can
be obtained by numerical approximation of hybrid formulations. Four hybrid
formulations are considered in these notes, suited for equations primarily of
elliptic type:
• The Schwarz formulation.
• The Steklov-Poincaré (substructuring or Schur complement) formulation.
• The Lagrange multiplier formulation.
• The Least squares-control formulation.
Alternative hybrid formulations are also possible, see [CA7, AC5].


The applicability and stability of each hybrid formulation depends on
the underlying partial differential equation and subdomain decomposition.
For instance, the Schwarz formulation requires an overlapping decomposi-
tion, while the Steklov-Poincaré and Lagrange multiplier formulations are
based on a non-overlapping decomposition. The least squares-control method
can be formulated given overlapping or non-overlapping decompositions.
Within each framework, novel iterative methods, discretization schemes on
non-matching grids, and heterogeneous approximations of the original par-
tial differential equation, can be developed based on the associated hybrid
formulations.
In writing these notes, the author has attempted to provide an accessible
introduction to the important methodologies in this subject, emphasizing a
matrix formulation of algorithms. However, as the literature on domain de-
composition methods is vast, various topics have either been omitted or only
touched upon. The methods described here apply primarily to equations of
elliptic or parabolic type; applications to hyperbolic equations [QU2] and to
spectral or p-version elements [BA4, PA16, SE2, TO10] have been omitted.
Applications to the equations of elasticity and to Maxwell’s equations have
also been omitted, see [TO10]. Parallel implementation is covered in greater
depth in [GR12, GR10, FA18, FA9, GR16, GR17, HO4, SM5, BR39]. For
additional domain decomposition theory, see [XU3, DR10, XU10, TO10]. A
broader discussion on heterogeneous domain decomposition can be found in
[QU6], and on FETI-DP and BDDC methods in [TO10, MA18, MA19]. For
additional bibliography on domain decomposition, see http://www.ddm.org.
Readers are assumed to be familiar with the basic properties of ellip-
tic and parabolic partial differential equations [JO, SM7, EV] and tradi-
tional methods for their discretization [RI, ST14, CI2, SO2, JO2, BR28, BR].
Familiarity is also assumed with basic numerical analysis [IS, ST10], com-
putational linear algebra [GO4, SA2, AX, GR2, ME8], and elements of op-
timization theory [CI4, DE7, LU3, GI2]. Selected background topics are re-
viewed in various sections of these notes. Chap. 1 provides an overview of
domain decomposition methodology in a context involving two subdomain
decompositions. Four different hybrid formulations are illustrated for a model
coercive 2nd order elliptic equation. Chapters 2, 3 and 4 describe the ma-
trix implementation of multisubdomain domain decomposition iterative al-
gorithms for traditional discretizations of self adjoint and coercive elliptic
problems. These chapters should ideally be read prior to the other chapters.
Readers unfamiliar with constrained minimization problems and their saddle
point formulation may find it useful to review background in Chap. 10 or
in [CI4], as saddle point methodology is employed in Chaps. 1.4 and 1.5 and
in Chaps. 4 and 6. With a few exceptions, the remaining chapters may be
read independently.


The author expresses his deep gratitude to the anonymous referees who
made numerous suggestions for revision and improvement of the manuscript.
Deep gratitude is also expressed to Prof. Olof Widlund who introduced the
author to this subject over twenty years ago, to Prof. Tony Chan, for his kind
encouragement to embark on writing a book extending our survey paper on
this subject [CH11], and to Prof. Xiao-Chuan Cai, Prof. Marcus Sarkis and
Prof. Junping Wang for their research collaborations and numerous insightful
discussions over the years. The author deeply thanks Prof. Timothy Barth
for his kind permission to use the figure on the cover of this book, and for
use of Fig. 5.1. To former colleagues at the University of Wyoming, and to
professors Myron Allen, Gastao Braga, Benito Chen, Duilio Conceição, Max
Dryja, Frederico Furtado, Juan Galvis, Etereldes Gonçalves, Raytcho Lazarov,
Mary Elizabeth Ong, Peter Polyakov, Giovanni Russo, Christian Schaerer,
Shagi-Di Shih, Daniel Szyld, Panayot Vassilevski and Henrique Versieux, the
author expresses his deep gratitude. Deep appreciation is also expressed to the
editors of the LNCSE series, Dr. Martin Peters, Ms. Thanh-Ha LeThi, and
Mr. Frank Holzwarth for their patience and kind help during the completion
of this manuscript. Finally, deep appreciation is expressed to Mr. Elumalai
Balamurugan for his kind assistance with reformatting the text. The author
welcomes comments and suggestions from readers, and hopes to post updates
at www.poonithara.org/publications/dd.

January 2008 Tarek P. A. Mathew

Contents

1 Decomposition Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Hybrid Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Schwarz Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Steklov-Poincaré Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Lagrange Multiplier Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5 Least Squares-Control Framework . . . . . . . . . . . . . . . . . . . . . . . . . 36

2 Schwarz Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2 Projection Formulation of Schwarz Algorithms . . . . . . . . . . . . . . 56
2.3 Matrix Form of Schwarz Subspace Algorithms . . . . . . . . . . . . . . 66
2.4 Implementational Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3 Schur Complement and Iterative
Substructuring Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.2 Schur Complement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.3 FFT Based Direct Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.4 Two Subdomain Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . . . 140
3.5 Preconditioners in Two Dimensions . . . . . . . . . . . . . . . . . . . . . . . . 155
3.6 Preconditioners in Three Dimensions . . . . . . . . . . . . . . . . . . . . . . . 162
3.7 Neumann-Neumann and Balancing Preconditioners . . . . . . . . . . 175
3.8 Implementational Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
3.9 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

4 Lagrange Multiplier Based Substructuring:
FETI Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
4.1 Constrained Minimization Formulation . . . . . . . . . . . . . . . . . . . . . 232
4.2 Lagrange Multiplier Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.3 Projected Gradient Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
4.4 FETI-DP and BDDC Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250


5 Computational Issues and Parallelization . . . . . . . . . . . . . . . . . . 263
5.1 Algorithms for Automated Partitioning of Domains . . . . . . . . . . 264
5.2 Parallelizability of Domain Decomposition Solvers . . . . . . . . . . . 280

6 Least Squares-Control Theory: Iterative Algorithms . . . . . . 295
6.1 Two Overlapping Subdomains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
6.2 Two Non-Overlapping Subdomains . . . . . . . . . . . . . . . . . . . . . . . . 303
6.3 Extensions to Multiple Subdomains . . . . . . . . . . . . . . . . . . . . . . . . 310

7 Multilevel and Local Grid Refinement Methods . . . . . . . . . . . 313
7.1 Multilevel Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
7.2 Iterative Algorithms for Locally Refined Grids . . . . . . . . . . . . . . 321

8 Non-Self Adjoint Elliptic Equations: Iterative Methods . . . . 333
8.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
8.2 Diffusion Dominated Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
8.3 Advection Dominated Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
8.4 Time Stepping Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
8.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

9 Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
9.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
9.2 Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
9.3 Non-Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
9.4 Parareal-Multiple Shooting Method . . . . . . . . . . . . . . . . . . . . . . . . 401
9.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

10 Saddle Point Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
10.1 Properties of Saddle Point Systems . . . . . . . . . . . . . . . . . . . . . . . . 418
10.2 Algorithms Based on Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
10.3 Penalty and Regularization Methods . . . . . . . . . . . . . . . . . . . . . . . 434
10.4 Projection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
10.5 Krylov Space and Block Matrix Methods . . . . . . . . . . . . . . . . . . . 445
10.6 Applications to the Stokes and Navier-Stokes Equations . . . . . . 456
10.7 Applications to Mixed Formulations of Elliptic Equations . . . . . 474
10.8 Applications to Optimal Control Problems . . . . . . . . . . . . . . . . . . 489

11 Non-Matching Grid Discretizations . . . . . . . . . . . . . . . . . . . . . . . . 515
11.1 Multi-Subdomain Hybrid Formulations . . . . . . . . . . . . . . . . . . . . . 516
11.2 Mortar Element Discretization: Saddle Point Approach . . . . . . . 523
11.3 Mortar Element Discretization: Nonconforming Approach . . . . . 551
11.4 Schwarz Discretizations on Overlapping Grids . . . . . . . . . . . . . . . 555
11.5 Alternative Nonmatching Grid Discretization Methods . . . . . . . 559
11.6 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 564


12 Heterogeneous Domain Decomposition Methods . . . . . . . . . . . 575
12.1 Steklov-Poincaré Heterogeneous Model . . . . . . . . . . . . . . . . . . . . . 576
12.2 Schwarz Heterogeneous Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
12.3 Least Squares-Control Heterogeneous Models . . . . . . . . . . . . . . . 589
12.4 χ-Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
12.5 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 603
13 Fictitious Domain and Domain Imbedding Methods . . . . . . . 607
13.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
13.2 Preconditioners for Neumann Problems . . . . . . . . . . . . . . . . . . . . 610
13.3 Preconditioners for Dirichlet Problems . . . . . . . . . . . . . . . . . . . . . 611
13.4 Lagrange Multiplier and Least Squares-Control Solvers . . . . . . . 614
14 Variational Inequalities and Obstacle Problems . . . . . . . . . . . . 621
14.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
14.2 Projected Gradient and Relaxation Algorithms . . . . . . . . . . . . . . 628
14.3 Schwarz Algorithms for Variational Inequalities . . . . . . . . . . . . . 633
14.4 Monotone Convergence of Schwarz Algorithms . . . . . . . . . . . . . . 636
14.5 Applications to Parabolic Variational Inequalities . . . . . . . . . . . . 644
15 Maximum Norm Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
15.1 Maximum Principles and Comparison Theorems . . . . . . . . . . . . . 648
15.2 Well Posedness of the Schwarz Hybrid Formulation . . . . . . . . . . 659
15.3 Convergence of Schwarz Iterative Algorithms . . . . . . . . . . . . . . . . 661
15.4 Analysis of Schwarz Nonmatching Grid Discretizations . . . . . . . 668
15.5 Analysis of Schwarz Heterogeneous Approximations . . . . . . . . . . 674
15.6 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 677
16 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
16.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
16.2 Gradient and Preconditioned Gradient Methods . . . . . . . . . . . . . 682
16.3 Schur Complement Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
16.4 Schwarz Subspace Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
16.5 Modal Synthesis Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
17 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
17.1 Traditional Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690
17.2 Schwarz Minimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 697
18 Helmholtz Scattering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
18.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
18.2 Non-Overlapping and Overlapping Subdomain Methods . . . . . . 701
18.3 Fictitious Domain and Control Formulations . . . . . . . . . . . . . . . . 704
18.4 Hilbert Uniqueness Method for Standing Waves . . . . . . . . . . . . . 705
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761

1
Decomposition Frameworks

In this chapter, we introduce and illustrate several principles employed in
the formulation of domain decomposition methods for an elliptic equation. In
our discussion, we focus on a two subdomain decomposition of the domain
of the elliptic equation, into overlapping or non-overlapping subdomains, and
introduce the notion of a hybrid formulation of the elliptic equation. A hybrid
formulation is a coupled system of elliptic equations which is equivalent to
the original elliptic equation, with unknowns representing the true solution
on each subdomain. Such formulations provide a natural framework for the
construction of divide and conquer methods for an elliptic equation. Using
a hybrid formulation, we heuristically illustrate how novel divide and con-
quer iterative methods, non-matching grid discretizations and heterogeneous
approximations can be constructed for an elliptic equation.
We illustrate four alternative hybrid formulations for an elliptic equation.
Each will be described for a decomposition of the domain into two subdomains,
either overlapping or non-overlapping. We shall describe the following:
• Schwarz formulation.
• Steklov-Poincaré formulation.
• Lagrange multiplier formulation.
• Least squares-control formulation.
For each hybrid formulation, we illustrate how iterative methods, non-matching
grid discretizations and heterogeneous approximations can be formulated for
the elliptic equation based on its two subdomain decomposition. In Chap. 1.1,
we introduce notation and heuristically describe the structure of a hybrid for-
mulation. Chap. 1.2 describes a two subdomain Schwarz hybrid formulation,
based on overlapping subdomains. Chap. 1.3 describes the Steklov-Poincaré
formulation, based on two non-overlapping subdomains. The Lagrange mul-
tiplier formulation described in Chap. 1.4 applies only for a self adjoint and
coercive elliptic equation, and it employs two non-overlapping subdomains.
Chap. 1.5 describes the least squares-control formulation for a two subdo-
main overlapping or non-overlapping decomposition.


1.1 Hybrid Formulations
Given a subdomain decomposition, a hybrid formulation of an elliptic equation
is an equivalent coupled system of elliptic equations involving unknowns on
each subdomain. In this section, we introduce notation on an elliptic equation
and heuristically describe the structure of its two subdomain hybrid formu-
lation. We outline how divide and conquer iterative methods, non-matching
grid discretizations, and heterogeneous approximations can be constructed for
an elliptic equation, using a hybrid formulation of it. Four commonly used
hybrid formulations are described in Chap. 1.2 through Chap. 1.5.

1.1.1 Elliptic Equation
We shall consider the following 2nd order elliptic equation:

   L u ≡ −∇ · (a(x)∇u) + b(x) · ∇u + c(x) u = f,   in Ω
   u = 0,                                           on ∂Ω,          (1.1)

for Ω ⊂ IR^d. The coefficient a(x) will be assumed to satisfy:
0 < a0 ≤ a(x), ∀x ∈ Ω,
while b(x) and c(x) ≥ 0 will be assumed to be smooth, and f (x) ∈ L2 (Ω).
Additional restrictions will be imposed on the coefficients as required.

1.1.2 Weak Formulation
A weak formulation of (1.1) is typically obtained by multiplying it by a suffi-
ciently smooth test function v(x) and integrating the diffusion term by parts
on Ω. It will seek u ∈ H01 (Ω) satisfying:


   A(u, v) = F(v),   ∀v ∈ H01(Ω),   where
   A(u, v) ≡ ∫_Ω (a(x) ∇u · ∇v + (b(x) · ∇u) v + c(x) u v) dx          (1.2)
   F(v) ≡ ∫_Ω f v dx,
where the Sobolev space H01(Ω) is formally defined as below [NE, LI4, JO2]:

   H01(Ω) ≡ {v ∈ H1(Ω) : v = 0 on ∂Ω},

while the space H1(Ω) is defined as:

   H1(Ω) ≡ {v ∈ L2(Ω) : ‖v‖_{1,Ω} < ∞},   where   ‖v‖²_{1,Ω} ≡ ∫_Ω (v² + |∇v|²) dx,

for ∇v ≡ (∂v/∂x1, . . . , ∂v/∂xd). The bilinear form A(·, ·) will be coercive if:

   A(u, u) ≥ α ‖u‖²_{1,Ω},   ∀u ∈ H01(Ω),
for some α > 0 independent of u. Coercivity of A(·, ·) is guaranteed to hold
by the Poincaré-Friedrichs inequality, see [NE].
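As an illustration (a standard derivation added here, not part of the original text), when b(x) = 0 and c(x) ≥ 0 one obtains A(v, v) ≥ a0 ‖∇v‖²_{0,Ω}, while the Poincaré-Friedrichs inequality ‖v‖²_{0,Ω} ≤ C_Ω ‖∇v‖²_{0,Ω} for v ∈ H01(Ω) gives ‖v‖²_{1,Ω} ≤ (1 + C_Ω) ‖∇v‖²_{0,Ω}; combining the two yields A(v, v) ≥ (a0/(1 + C_Ω)) ‖v‖²_{1,Ω}, i.e., α = a0/(1 + C_Ω).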

1.1.3 Discretization

A finite element discretization of (1.1) is obtained by Galerkin approximation of (1.2), see [ST14, CI2, JO2, BR28, BR]. Let Th(Ω) denote a triangulation of Ω with elements of size h and let Vh denote the space of continuous piecewise linear finite element functions on Th(Ω). If {φ1, . . . , φn} forms a basis for Vh ∩ H01(Ω), then the finite element discretization of (1.1) will yield the system:

   A u = f,

where Aij = A(φi, φj) for 1 ≤ i, j ≤ n and fi = F(φi) for 1 ≤ i ≤ n.

1.1.4 Subdomain Decompositions

We shall employ the following notation.

Definition 1.1. A collection of two open subregions Ωi∗ ⊂ Ω for i = 1, 2 will be referred to as an overlapping decomposition of Ω if the following holds:

   Ω1∗ ∪ Ω2∗ = Ω.

Boundaries of the subdomains will be denoted Bi ≡ ∂Ωi∗ and their interior and exterior segments by B(i) ≡ ∂Ωi∗ ∩ Ω and B[i] ≡ ∂Ωi∗ ∩ ∂Ω, respectively.

Definition 1.2. A collection of two open subregions Ωi ⊂ Ω for i = 1, 2 will be referred to as a non-overlapping decomposition of Ω if the following hold:

   1. Ω1 ∩ Ω2 = ∅,
   2. Ω̄1 ∪ Ω̄2 = Ω̄.

Boundaries of the subdomains will be denoted ∂Ωi and their interior and exterior segments by B(i) ≡ ∂Ωi ∩ Ω and B[i] ≡ ∂Ωi ∩ ∂Ω, respectively. We will denote the common interface by B ≡ ∂Ω1 ∩ ∂Ω2, see Fig. 1.1.

[Fig. 1.1. Two subdomain decompositions (panels: non-overlapping subdomains Ω1, Ω2; overlapping subdomains Ω1∗, Ω2∗)]
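To make the Galerkin system A u = f of Chap. 1.1.3 concrete, the following is a minimal sketch (added here for illustration; it is not from the original text) assembling and solving the 1D analogue −(a u′)′ + c u = f on (0, 1) with constant coefficients, a uniform mesh of piecewise linear elements, and homogeneous Dirichlet conditions. All function and variable names are illustrative.

```python
import numpy as np

def assemble_1d_fem(n, a=1.0, c=0.0, f=lambda x: np.ones_like(x)):
    """Assemble A u = f for -(a u')' + c u = f on (0,1), u(0) = u(1) = 0,
    with n interior nodes and continuous piecewise linear elements."""
    h = 1.0 / (n + 1)                        # uniform mesh width
    x = np.linspace(h, 1.0 - h, n)           # interior nodes
    diag = 2.0 * a / h + c * 2.0 * h / 3.0   # A(phi_i, phi_i): stiffness + mass
    off = -a / h + c * h / 6.0               # A(phi_i, phi_{i+1})
    A = (np.diag(diag * np.ones(n))
         + np.diag(off * np.ones(n - 1), 1)
         + np.diag(off * np.ones(n - 1), -1))
    b = h * f(x)                             # lumped load, F(phi_i) ~ h f(x_i)
    return A, b, x

A, b, x = assemble_1d_fem(99)
u = np.linalg.solve(A, b)                    # nodal values of the Galerkin solution
```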
Remark 1.3. In applications, a decomposition of Ω into subdomains can be chosen based either on the geometry of Ω or on the regularity of the solution u (if known). An overlapping subdomain Ωi∗ can, if desired, be constructed from a nonoverlapping subdomain Ωi by extending it to include all points in Ω within a distance β > 0 of Ωi, yielding uniform overlap.

1.1.5 Partition of Unity

A partition of unity subordinate to the overlapping subdomains Ω1∗ and Ω2∗ consists of smooth functions χ1(x) and χ2(x), each with support in Ω̄i∗, satisfying:

   χi(x) ≥ 0,           in Ω̄i∗
   χi(x) = 0,           in Ω \ Ω̄i∗          (1.3)
   χ1(x) + χ2(x) = 1,   in Ω.

Remark 1.4. A continuous partition of unity subordinate to Ω1∗ and Ω2∗ can be computed as follows. Let di(x) denote the distance function:

   di(x) = { dist(x, B(i)),   if x ∈ Ω̄i∗
           { 0,               in Ω \ Ω̄i∗          (1.4)

where B(i) ≡ (∂Ωi∗ ∩ Ω). Then, formally define:

   χi(x) ≡ di(x) / (d1(x) + d2(x)),          (1.5)

for 1 ≤ i ≤ 2. By construction, each di(x) will be continuous and nonnegative, and the resulting χi(x) will satisfy the desired properties. To obtain a smooth function χi(x), each di(x) may first be mollified, see [ST9]. In applications, each χi(x) may be required to satisfy a bound of the form |∇χi(x)| ≤ C h0^{−1}, where h0 denotes the diameter of each subdomain Ωi∗.

Given a non-overlapping decomposition Ω1 and Ω2 of Ω, we shall sometimes employ a discontinuous partition of unity satisfying:

   χi(x) ≥ 0,           in Ω̄i
   χi(x) = 0,           in Ω \ Ω̄i          (1.6)
   χ1(x) + χ2(x) = 1,   in Ω.

Each χi(x) will be discontinuous across B = ∂Ω1 ∩ ∂Ω2, and each χi(·) may be non-zero on B[i]. Such a partition of unity may be constructed using di(x) = 1 on Ω̄i in (1.5).
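The distance-based construction (1.4)-(1.5) is easy to realize numerically. Below is a small sketch (an illustration under assumed subdomains, not from the text) for two overlapping intervals Ω1∗ = (0, 0.6) and Ω2∗ = (0.4, 1) of Ω = (0, 1); the helper name dist_to_interior_boundary is mine.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 501)   # sample points of Omega = (0, 1)

def dist_to_interior_boundary(x, left, right):
    """d_i(x) of (1.4): distance to B(i) inside the subdomain, zero outside.
    Boundary segments lying on the boundary of (0,1) belong to B[i] and
    do not contribute to the distance."""
    inside = (x >= left) & (x <= right)
    d = np.minimum(np.where(left > 0.0, x - left, np.inf),
                   np.where(right < 1.0, right - x, np.inf))
    return np.where(inside, d, 0.0)

d1 = dist_to_interior_boundary(x, 0.0, 0.6)
d2 = dist_to_interior_boundary(x, 0.4, 1.0)
chi1 = d1 / (d1 + d2)            # eq. (1.5); d1 + d2 > 0 since the cover overlaps
chi2 = d2 / (d1 + d2)
assert np.allclose(chi1 + chi2, 1.0)
```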
1.1.6 Hybrid Formulation

Let Ω1 and Ω2 (or Ω1∗ and Ω2∗) form a decomposition of a domain Ω. Then, a hybrid formulation of (1.1) is a coupled system of partial differential equations posed on each subdomain Ωi (or Ωi∗), which is equivalent to (1.1), with unknowns representing the true solution on each subdomain. Typically, a hybrid formulation consists of a local problem posed on each individual subdomain, with one unknown function wi(x) representing the local solution, along with matching conditions that couple the local problems. Two requirements must be satisfied. First, the restriction ui(x) of the true solution u(x) of (1.1) to each subdomain Ωi (or Ωi∗) must solve the hybrid system. Second, the hybrid formulation must be well posed as a coupled system, i.e., its solution (w1(x), w2(x)) must exist and be unique, and furthermore, it must depend continuously on the data. The first requirement ensures that the hybrid formulation is consistent with the original problem (1.1), yielding wi(x) = ui(x) for i = 1, 2. The second requirement ensures that the hybrid formulation is stable and uniquely solvable; the latter is essential for the stability of a numerical approximation of the hybrid formulation. Once the hybrid system is solved, the solution u(x) of (1.1) can be expressed in terms of the local solutions wi(x) as:

   u(x) = χ1(x) w1(x) + χ2(x) w2(x),

using a partition of unity χ1(x) and χ2(x) appropriate for the subdomains.

Local Problems. On each subdomain Ωi (or Ωi∗), a hybrid formulation will require wi(x) to solve the original partial differential equation (1.1):

   Lwi = fi,          in Ωi (or Ωi∗)
   Ti(wi, γ) = gi,    on B(i)          for i = 1, 2,          (1.7)
   wi = 0,            on B[i],

where fi(x) is f(x) restricted to Ωi (or Ωi∗), and Ti(wi, γ) denotes a boundary operator which enforces either Dirichlet, Neumann or Robin boundary conditions on B(i):

   Ti(wi, γ) ≡ { wi,                       for Dirichlet boundary conditions
              { ni · (a(x)∇wi),           for Neumann boundary conditions          (1.8)
              { ni · (a(x)∇wi) + γ wi,    for Robin boundary conditions.

Here ni denotes the unit exterior normal to B(i) and γ(·) denotes a coefficient function in the Robin boundary condition. The choice of the boundary operator Ti(wi, γ) may differ with each hybrid formulation. The boundary data gi(·) typically corresponds to Ti(·) applied to the solution on the adjacent subdomain; however, it may also be a control or a Lagrange multiplier function which couples the local problems.

Matching Conditions. Matching conditions couple the different local problems (1.7), by choosing gi(·) to ensure that the hybrid formulation is equivalent to (1.1). Such coupling must ensure consistency and well posedness. Typically, matching conditions are equations satisfied by the true solution u(x) restricted to the interfaces or regions of overlap between adjacent subdomains. For an elliptic equation, these may be either algebraic equations, such as the requirement of continuity of the local solutions ui(x) and uj(x) across adjacent subdomains:

   ui − uj = 0,   on ∂Ωi ∩ ∂Ωj,     non-overlapping case
   ui − uj = 0,   on ∂Ωi∗ ∩ Ωj∗,    overlapping case

or they may be differential constraints, such as continuity of the local fluxes:

   ni · (a(x)∇ui) + nj · (a(x)∇uj) = 0,   on ∂Ωi ∩ ∂Ωj,     non-overlapping case
   ni · (a(x)∇ui) − ni · (a(x)∇uj) = 0,   on ∂Ωi∗ ∩ Ωj∗,    overlapping case

where ni denotes the unit exterior normal to ∂Ωi. Other differential constraints may also be employed, using linear combinations of the above algebraic and differential constraints. We shall express general matching conditions in the form:

   Hi(w1, w2, g1, g2) = 0,          (1.9)

for suitably chosen operators Hi(·) on the interface B(i). Matching conditions may be enforced either directly, as in the preceding constraints, or indirectly through the use of intermediary variables such as Lagrange multipliers. In the latter case, the hybrid formulation may be derived as a saddle point problem (Chap. 1.4 or Chap. 10) of an associated constrained optimization problem, where a global functional, whose optima is sought, may be employed, or new variables may be introduced to couple the local problems.

Well Posedness of the Hybrid Formulation. To ensure that the hybrid formulation is solvable and that it may be approximated numerically by stable schemes, we require that the hybrid formulation be well posed [SM7, EV], satisfying, for 1 ≤ i ≤ 2, a bound of the form:

   (‖w1‖ + ‖w2‖) ≤ C (|f1| + |f2| + |g1| + |g2|),

for C > 0 independent of the data, where ‖·‖ and |·| are appropriately chosen norms for the solution and data.

Reconstruction of the Global Solution. Once a hybrid formulation consisting of local equations of the form (1.7) together with matching conditions of the form (1.9) has been formulated and solved, the global solution u(·) may be represented in the form:

   u(x) = χ1(x) w1(x) + χ2(x) w2(x),          (1.10)

where χi(x) is a (possibly discontinuous) partition of unity subordinate to the subdomains Ω̄1 and Ω̄2 (or Ω1∗ and Ω2∗).

Iterative Methods. Domain decomposition iterative algorithms can be formulated for solving (1.1) by directly applying traditional relaxation, descent or saddle point algorithms to a hybrid formulation, namely, to the local problems and the matching conditions.
For instance, given current approximations of w1, w2, each unknown wi may be updated sequentially using a relaxation procedure, by solving:

   Lwi = fi,          on Ωi (or Ωi∗)
   Ti(wi, γ) = gi,    on B(i)
   wi = 0,            on B[i],

using the current iterates on the other subdomains, where the boundary data is obtained by replacing Ti(wi, γ) = gi by either of the equations:

   Hj(w1, w2, g1, g2) = 0,   for j = 1, 2.

Alternatively, a descent or saddle point algorithm can be employed.

Discretization on a Nonmatching Grid. In various applications, it may be of interest to independently triangulate different subregions Ωi (or Ωi∗) with grids suited to the geometry of each subdomain. These grids need not match on the region of intersection or overlap between the subdomains, see Fig. 1.2, and are referred to as nonmatching grids. On such non-matching grids, a global discretization of (1.1) may be sought by directly discretizing the hybrid formulation. Heuristically, the construction of a global discretization of equation (1.1), namely, of the local problems and the matching conditions, will involve the following steps.

• Let Thi(Ωi) (or Thi(Ωi∗)) denote independent triangulations of Ωi (or Ωi∗) with local grid sizes hi, for i = 1, 2.
• Each local problem in the hybrid formulation can be discretized as:

     Ahi whi = f hi,        on Ωhi (or Ωh∗i)
     Thi(whi, γ) = ghi,     on B(i)
     whi = 0,               on B[i].

  Each local discretization should be a stable scheme.
• The matching conditions should also be discretized:

     Hih(wh1, wh2, gh1, gh2) = 0,   for 1 ≤ i ≤ 2.

To ensure the stability and consistency of the global discretization of the hybrid formulation, care must be exercised in discretizing the matching conditions across the subdomain grids. Such issues are described in Chap. 11, and in Chap. 15.

[Fig. 1.2. Nonmatching grids (panels: non-overlapping subdomains Ω1, Ω2; overlapping subdomains Ω1∗, Ω2∗)]

Heterogeneous Approximation. A partial differential equation is said to be heterogeneous if its type changes from one region to another. An example is Tricomi's equation [JO]:

   ux1x1 − x1 ux2x2 = f(x1, x2),

which is of hyperbolic type for x1 > 0 and of elliptic type for x1 < 0. In various applications, it may be of interest to approximate a partial differential equation of heterogeneous character by a partial differential equation of heterogeneous type. We refer to such models as heterogeneous approximations. Such approximations may be useful when efficient computational methods are available for the local problems involved in a heterogeneous partial differential equation. Our discussion will be restricted to an elliptic-hyperbolic heterogeneous approximation of a singularly perturbed elliptic equation of heterogeneous character. We shall consider an advection dominated equation:

   Lε u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f,   in Ω
   u = 0,                                    on ∂Ω,          (1.11)

where 0 < ε ≪ 1 is a small perturbation parameter, and where:

   Lε u ≡ ε L0 u + L1 u,
   L0 u ≡ −∆u,
   L1 u ≡ b(x) · ∇u + c(x) u.

Depending on f(x), there may be a subdomain Ω1 (or Ω1∗) on which:

   ε |∆u| ≪ |b(x) · ∇u + c(x) u|,   for x ∈ Ω1 (or Ω1∗).

On Ω1 (or Ω1∗), the restriction of the elliptic equation Lε u = f to the subdomain will be of hyperbolic character, approximately satisfying L1 u = f. If Ω2 (or Ω2∗) denotes a complementary (layer) region, then equation (1.11) will be approximately of elliptic character in Ω2 (or Ω2∗). In such cases, it may be computationally advantageous to approximate elliptic equation (1.11) by a heterogeneous approximation involving an equation of mixed hyperbolic and elliptic character.
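The following toy computation (an illustration under assumed data, not from the text) shows the effect being described: for −ε u″ + u′ = 1 on (0, 1) with u(0) = u(1) = 0 and ε = 10⁻³, an upwind finite difference solution of the full equation stays close to the solution of the reduced inflow problem u′ = 1, u(0) = 0, except inside the outflow layer near x = 1.

```python
import numpy as np

eps, m = 1e-3, 1000
x = np.linspace(0.0, 1.0, m + 1)
h = x[1] - x[0]
# Full singularly perturbed problem -eps u'' + u' = 1, u(0) = u(1) = 0,
# discretized with first-order upwinding (stable for small eps):
A = (np.diag((2.0 * eps / h**2 + 1.0 / h) * np.ones(m - 1))
     + np.diag((-eps / h**2) * np.ones(m - 2), 1)
     + np.diag((-eps / h**2 - 1.0 / h) * np.ones(m - 2), -1))
u = np.zeros(m + 1)
u[1:-1] = np.linalg.solve(A, np.ones(m - 1))
# Reduced hyperbolic model L1 u = u' = 1 with inflow condition u(0) = 0:
u_reduced = x.copy()
# Agreement away from the outflow boundary layer at x = 1:
print(np.max(np.abs(u - u_reduced)[x < 0.9]))
```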
Motivated by singular perturbation methodology [LA5, KE5, OM], we may approximate the hybrid formulation of (1.11) based on Ωi (or Ωi∗) for 1 ≤ i ≤ 2. For instance, we may approximate the local problems (1.7) by:

   L̃i vi = fi,         on Ωi (or Ωi∗)
   T̃i(vi, γ) = g̃i,    on B̃(i)          for i = 1, 2
   vi = 0,             on B̃[i],

with vi(x) ≈ wi(x), and we may approximate the matching conditions (1.9) by:

   H̃i(v1, v2, g̃1, g̃2) = 0,

where L̃i, T̃i and H̃i(·) are heuristic local approximations of Li, Ti and Hi(·), respectively.

Remark 1.5. Care must be exercised in the selection of approximations, since each local problem must be well posed, and the global coupled system must also be well posed. For instance, if we define L̃1 u = L1 u on Ω1 (or Ω1∗), then the local problem will be hyperbolic, and we must replace Dirichlet boundary conditions on B(1) and B[1] by inflow boundary conditions. Similarly, if we choose L̃2 u = Lε u on Ω2 (or Ω2∗), then the local problem on Ω2 (or Ω2∗) will be elliptic, and Dirichlet boundary or flux boundary conditions can be employed on B(2) and B[2]. Often, approximate matching conditions for a heterogeneous problem can also be derived heuristically by a vanishing viscosity approach. We refer the reader to Chap. 1.2 through Chap. 1.5 and Chap. 12 for specific examples.

1.2 Schwarz Framework

The framework that we refer to as the Schwarz hybrid formulation is based on the earliest known domain decomposition method, formulated by H. A. Schwarz [SC5] in 1870. Schwarz formulated an iterative method, now referred to as the Schwarz alternating method, which solves Laplace's equation on an irregular domain that is the union of regular regions (such as rectangular and circular regions). Although Schwarz's motivation was to study the existence of harmonic functions on irregular regions, the hybrid formulation underlying Schwarz's iterative method applies to a wider class of elliptic equations, and it enables the formulation of other divide and conquer approximations. In this section, we describe the hybrid formulation underlying the Schwarz alternating method for a two subdomain overlapping decomposition of Ω. Using the hybrid formulation, we illustrate the formulation of iterative methods, non-matching grid discretizations, and heterogeneous approximations for elliptic equation (1.1). We let Ω1∗ and Ω2∗ denote the overlapping subdomains, and let B(i) = ∂Ωi∗ ∩ Ω and B[i] = ∂Ωi∗ ∩ ∂Ω denote the interior and exterior boundary segments of Ωi∗, respectively, see Fig. 1.3.

[Fig. 1.3. Boundary segments for an overlapping decomposition: subdomains Ω1∗, Ω2∗ with segments B(1), B(2), B[1], B[2]]

1.2.1 Motivation

To derive the hybrid formulation underlying Schwarz's method, let u(x) denote the solution of (1.1). Define wi(x) = u(x) on Ω̄i∗ for 1 ≤ i ≤ 2. Then, by construction, L wi = f in Ωi∗. Furthermore, the continuity of u will yield matching of w1 and w2 on Ω1∗ ∩ Ω2∗. It will therefore hold that:

   Lw1 = f,    in Ω1∗          Lw2 = f,    in Ω2∗
   w1 = w2,    on B(1)         w2 = w1,    on B(2)          (1.12)
   w1 = 0,     on B[1]         w2 = 0,     on B[2].

Importantly, if the above coupled, decomposed system for w1(x) and w2(x) is well posed, then by solving it, the original solution can be recovered with u(x) = wi(x) on Ω̄i∗ for i = 1, 2. We have the following uniqueness result.

Theorem 1.6. Suppose the following assumptions hold.

1. Let u(x) denote a sufficiently smooth solution of equation (1.1).
2. Let w1(x) and w2(x) be sufficiently smooth solutions of the coupled system of elliptic equations (1.12).
3. Let c(x) ≥ 0 and ∇ · b(x) ≤ 0.

Then the following result will hold:

   u(x) = { w1(x),   on Ω̄1∗
          { w2(x),   on Ω̄2∗,

i.e., w1(x) ≡ u(x) in Ω1∗ and w2(x) ≡ u(x) in Ω2∗.

Proof. If u(x) is a solution of equation (1.1), then w1(x) and w2(x) defined by wi ≡ u on Ω̄i∗ will satisfy (1.12) by construction. To prove the converse, suppose that w1(x) and w2(x) satisfy (1.12). We will first show that w1(x) = w2(x) on Ω1∗ ∩ Ω2∗. To this end, note that w1(x) − w2(x)
has zero boundary conditions on ∂(Ω1∗ ∩ Ω2∗), and that by construction w1(x) − w2(x) will be L-harmonic, since Lwi = f in Ωi∗. By uniqueness of L-harmonic functions for c(x) ≥ 0 and ∇ · b(x) ≤ 0, it will follow that w1(x) − w2(x) = 0 in Ω1∗ ∩ Ω2∗. This yields that w1(x) = w2(x) on Ω1∗ ∩ Ω2∗. Since w1 = w2 in Ω1∗ ∩ Ω2∗ and since Lwi = f in Ωi∗, we may define:

   u(x) = { w1(x),   on Ω̄1∗
          { w2(x),   on Ω̄2∗,

and u(x) will satisfy (1.1). □

Remark 1.7. The preceding theorem yields equivalence between sufficiently smooth solutions to (1.1) and (1.12). It is, however, not a result on the well posedness (stability) of formulation (1.12) under perturbations of its data. The latter requires that the perturbed system:

   Lw̃1 = f̃1,         in Ω1∗          Lw̃2 = f̃2,         in Ω2∗
   w̃1 = w2 + r̃1,     on B(1)         w̃2 = w1 + r̃2,     on B(2)          (1.13)
   w̃1 = 0,           on B[1]         w̃2 = 0,           on B[2]

be uniquely solvable and satisfy a bound of the form:

   (|w̃1| + |w̃2|) ≤ C (‖f̃1‖ + ‖f̃2‖ + ‖r̃1‖ + ‖r̃2‖),

in appropriate norms. See Chap. 15 for maximum norm well posedness.

Remark 1.8. The above result suggests that, given a partition of unity χ1(x) and χ2(x) subordinate to Ω1∗ and Ω2∗, a solution to elliptic equation (1.1) may be obtained by solving (1.12) and defining:

   u(x) = χ1(x) w1(x) + χ2(x) w2(x).

1.2.2 Iterative Methods

The iterative method proposed by H. A. Schwarz is a very popular method for the solution of elliptic partial differential equations. It is robustly convergent for a large class of elliptic equations, see [SO, MO2, BA2] and [MI, MA37, DR11, LI6, LI7, BR18]. It is sequential in nature, and can be motivated heuristically using the block structure of (1.12). If wi^(k) denotes the k'th iterate on subdomain Ωi∗, it can be updated by solving the block equation of (1.12) posed on subdomain Ωi∗, with the boundary condition w1 = w2 on B(1) or w2 = w1 on B(2) approximated using the current iterate on the adjacent subdomain:

   Lw1^(k+1) = f,        in Ω1∗          Lw2^(k+1) = f,        in Ω2∗
   w1^(k+1) = w2^(k),    on B(1)         w2^(k+1) = w1^(k+1),  on B(2)
   w1^(k+1) = 0,         on B[1]         w2^(k+1) = 0,         on B[2].

The resulting algorithm is the Schwarz alternating method, summarized below.
Algorithm 1.2.1 (Schwarz Alternating Method)
Let v^(0) denote the starting global approximate solution.

1. For k = 0, 1, · · · , until convergence do:
2.    Solve for w1^(k+1) as follows:

         Lw1^(k+1) = f1,    in Ω1∗
         w1^(k+1) = v^(k),  on B(1)
         w1^(k+1) = g,      on B[1].

      Define v^(k+1/2) as follows:

         v^(k+1/2) ≡ { w1^(k+1),   on Ω̄1∗
                     { v^(k),      on Ω \ Ω̄1∗.

3.    Solve for w2^(k+1) as follows:

         Lw2^(k+1) = f2,        in Ω2∗
         w2^(k+1) = v^(k+1/2),  on B(2)
         w2^(k+1) = g,          on B[2].

      Define v^(k+1) as follows:

         v^(k+1) ≡ { w2^(k+1),    on Ω̄2∗
                   { v^(k+1/2),   on Ω \ Ω̄2∗.

4. Endfor

Output: v^(k)

Remark 1.9. The iterates v^(k+1/2) and v^(k+1) in the preceding algorithm are continuous extensions of the subdomain solutions w1^(k+1) and w2^(k+1), respectively, to the entire domain Ω. Under suitable assumptions on the coefficients of the elliptic equation and the overlap amongst the subdomains Ωi∗, the iterates v^(k) converge geometrically to the true solution u of (1.1), see Chap. 2.5 when b(x) = 0. The preceding Schwarz algorithm is sequential in nature, requiring the solution of one subdomain problem prior to another. Below, we describe an unaccelerated parallel Schwarz algorithm which requires the concurrent solution of subdomain problems. It is motivated by a popular parallel method, referred to as the additive Schwarz algorithm [DR11], which is employed typically as a preconditioner, see [DR11, MA33, CA19, FR8, TA5]. The algorithm we describe is based on a partition of unity χ1(x) and χ2(x) subordinate to the overlapping subdomains Ω1∗ and Ω2∗. Let wi^(k) denote the k'th iterate on Ωi∗ for 1 ≤ i ≤ 2. Then, new iterates are computed as follows.

Algorithm 1.2.2 (Parallel Partition of Unity Schwarz Method)
Let w1^(0), w2^(0) denote starting local approximate solutions.

1. For k = 0, 1, · · · , until convergence do:
2.    For i = 1, 2 determine wi^(k+1) in parallel:

         Lwi^(k+1) = f,                                    in Ωi∗
         wi^(k+1) = χ1(x) w1^(k)(x) + χ2(x) w2^(k)(x),     on B(i)
         wi^(k+1) = 0,                                     on B[i].

3.    Endfor
4. Endfor

Output: (w1^(k), w2^(k))

If c(x) ≥ c0 > 0 and there is sufficient overlap, the iterates v^(k) defined by:

   v^(k) ≡ χ1(x) w1^(k)(x) + χ2(x) w2^(k)(x)

will converge geometrically to the solution u of (1.1).

Remark 1.10. In practice, discrete versions of the above algorithms must be applied, given a discretization of (1.1). In Chap. 2 it is observed that the matrix version of the Schwarz alternating method corresponds to a generalization (due to overlap) of the traditional block Gauss-Seidel iterative method, while the additive Schwarz method corresponds to a generalized block Jacobi method. Matrix versions of Schwarz algorithms are described in Chap. 2, where the multisubdomain case is considered and coarse space correction is introduced, which is essential for robust convergence. The additive Schwarz method [DR11] is also introduced there.
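As a concrete illustration of Algorithm 1.2.1 (a toy sketch added here, not the book's implementation), consider −u″ = 1 on (0, 1) with two overlapping subdomains (0, 0.6) and (0.4, 1), each solved by a standard finite difference scheme; np.interp samples the adjacent iterate to supply the boundary value on B(i). All names and parameters are illustrative.

```python
import numpy as np

def solve_dirichlet(f, xl, xr, gl, gr, m=200):
    """Finite difference solve of -u'' = f on (xl, xr), u(xl) = gl, u(xr) = gr."""
    x = np.linspace(xl, xr, m + 1)
    h = x[1] - x[0]
    A = (np.diag(2.0 * np.ones(m - 1)) - np.diag(np.ones(m - 2), 1)
         - np.diag(np.ones(m - 2), -1)) / h**2
    b = f(x[1:-1]).copy()
    b[0] += gl / h**2
    b[-1] += gr / h**2
    u = np.empty(m + 1)
    u[0], u[-1] = gl, gr
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

f = lambda x: np.ones_like(x)
a1, b1, a2, b2 = 0.0, 0.6, 0.4, 1.0            # overlapping subdomains
x1, u1 = solve_dirichlet(f, a1, b1, 0.0, 0.0)   # initial local guesses
x2, u2 = solve_dirichlet(f, a2, b2, 0.0, 0.0)
for k in range(25):                             # Schwarz alternating sweeps
    g1 = np.interp(b1, x2, u2)                  # trace of w2 at B(1) = {0.6}
    x1, u1 = solve_dirichlet(f, a1, b1, 0.0, g1)
    g2 = np.interp(a2, x1, u1)                  # trace of w1 at B(2) = {0.4}
    x2, u2 = solve_dirichlet(f, a2, b2, g2, 0.0)
exact = lambda x: 0.5 * x * (1.0 - x)           # -u'' = 1, u(0) = u(1) = 0
print(np.max(np.abs(u1 - exact(x1))), np.max(np.abs(u2 - exact(x2))))
```

The printed errors drop to the discretization level after a few sweeps; shrinking the overlap slows the iteration, consistent with the role of overlap noted above.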
1.2.3 Global Discretization

An advantage of the hybrid formulation (1.12) is that novel discretizations of (1.1) may be obtained by discretizing (1.12). Each subdomain Ωi∗ may be independently triangulated, resulting in a possibly non-matching grid. On each subdomain, each local problem may be discretized using traditional techniques suited to the local geometry and properties of the solution, and the local triangulation can be suited to the geometry and regularity of the solution on Ωi∗. The resulting grids, however, may be nonconforming along the internal boundaries B(i) of the subdomains, see Fig. 1.4, and care must be exercised in discretizing the matching conditions to ensure that the global discretization is stable. For details, see Chap. 11 and Chap. 15.

Below, we outline the construction of a global finite difference discretization of (1.12) based on a two subdomain decomposition of Ω, using finite difference schemes on the subdomains. We triangulate each subdomain Ωi∗ for 1 ≤ i ≤ 2 by a grid Thi(Ωi∗) of size hi, as in Fig. 1.4.

[Fig. 1.4. Nonmatching overset grids: overlapping grids Th1(Ω1∗) and Th2(Ω2∗)]

Next, we block partition the local discrete solution whi on Thi(Ωi∗) as:

   whi = (w^(i)_I, w^(i)_B(i), w^(i)_B[i])^T,   for i = 1, 2,

corresponding to the grid points in the interior of Ωi∗ and on the boundary segments B(i) and B[i], respectively. Let ni, mi and li denote the number of grid points of triangulation Thi(Ωi∗) in the interior of Ωi∗, on B(i) and on B[i], respectively. By assumption on the boundary values of whi on B[i], it will hold that w^(i)_B[i] = 0. Next, for i = 1, 2 discretize the elliptic equation Lwi = fi on Ωi∗ by employing a stable scheme on Thi(Ωi∗), and denote the discretization as:

   A^(i)_II w^(i)_I + A^(i)_IB(i) w^(i)_B(i) = f_hi,   for i = 1, 2.

Next, discretize the inter-subdomain matching conditions w1 = w2 on B(1) and w2 = w1 on B(2), by applying appropriate interpolation stencils or by discretizing their weak form. If interpolation stencils are employed, then the value wh1(x) at a grid point x on B(1) may be expressed as a weighted average of nodal values of wh2(·) on the grid points of Ωh2∗. We denote the discretized matching conditions as:

   w^(1)_B(1) = I^{h1}_{h2} wh2   and   w^(2)_B(2) = I^{h2}_{h1} wh1,

where I^{h1}_{h2} will denote a matrix of size m1 × (n2 + m2 + l2) and I^{h2}_{h1} will denote a matrix of size m2 × (n1 + m1 + l1). The global discretization now will have the following block matrix form:

   A^(1)_II w^(1)_I + A^(1)_IB(1) w^(1)_B(1) = f_h1
   w^(1)_B(1) = I^{h1}_{h2} wh2
   A^(2)_II w^(2)_I + A^(2)_IB(2) w^(2)_B(2) = f_h2          (1.14)
   w^(2)_B(2) = I^{h2}_{h1} wh1.

This algebraic system can be solved by the Schwarz alternating method. If the local grids match on each segment B(i), then the interpolation step would be trivial; however, for nonmatching grids care must be exercised to ensure stability of the global scheme.
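One simple way to realize an intergrid map such as I^{h1}_{h2} in (1.14) on one-dimensional grids is by linear interpolation, whose rows are convex weights (a property used in Remark 1.11 below). The sketch below is illustrative only; the helper name and the grids are assumptions.

```python
import numpy as np

def linear_interp_matrix(x_from, x_to):
    """Rows are convex weights: (I @ v)[j] linearly interpolates v at x_to[j].
    Assumes x_from is sorted and x_to lies inside [x_from[0], x_from[-1]]."""
    I = np.zeros((len(x_to), len(x_from)))
    for j, xt in enumerate(x_to):
        k = np.clip(np.searchsorted(x_from, xt), 1, len(x_from) - 1)
        t = (xt - x_from[k - 1]) / (x_from[k] - x_from[k - 1])
        I[j, k - 1], I[j, k] = 1.0 - t, t
    return I

x_h2 = np.linspace(0.4, 1.0, 31)                       # grid on subdomain 2
I_h2_h1 = linear_interp_matrix(x_h2, np.array([0.6]))  # values of w_h2 on B(1)
w_h2 = x_h2**2                                         # any grid function
print(I_h2_h1 @ w_h2)                                  # ~ 0.36 = 0.6**2
```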
Remark 1.11. If c(x) ≥ c0 > 0, if the local discretizations satisfy a discrete maximum principle, if the overlap is sufficiently large so that a certain contraction property holds, and if the inter-grid interpolation maps I^{h1}_{h2} and I^{h2}_{h1} are convex weights, then the above discretization can be shown to be stable and convergent of optimal order in the maximum norm, see Chap. 15.

1.2.4 Heterogeneous Approximation

A heterogeneous approximation of a partial differential equation is a model system of partial differential equations in which the problems posed on different subdomains are not all of the same type. Such approximations may be useful if there is a reduction in computational costs resulting from the use of a heterogeneous model. Below, we illustrate the construction of an elliptic-hyperbolic approximation of an advection dominated elliptic equation:

   Lε u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f,   in Ω
   u = 0,                                    on ∂Ω,          (1.15)

where 0 < ε ≪ 1 is a perturbation parameter. Depending on the solution u, the singularly perturbed elliptic equation may be approximately of hyperbolic character on some subregions and of elliptic character elsewhere. Suppose the overlapping subdomain Ω1∗ can be chosen such that:

   ε |∆u(x)| ≪ |b(x) · ∇u(x) + c(x) u(x)|,   for x ∈ Ω̄1∗.

Then, motivated by singular perturbation theory [LA5, KE5], on Ω1∗ the term Lε u may be approximated by L0 u defined by:

   L0 u ≡ b(x) · ∇u + c(x) u.

In this case, a global heterogeneous approximation of the singularly perturbed equation (1.15) may be sought by replacing the elliptic equation Lε w1 = f1 on Ω1∗ by the hyperbolic equation L0 w1 = f1 within the Schwarz hybrid formulation (1.12). To ensure well posedness of the local subproblems, the Dirichlet boundary value problem on Ω1∗ must be replaced by suitable inflow boundary conditions, due to the hyperbolic nature of L0 w1 = f1:

   L0 w1 = f1,   in Ω1∗
   w1 = 0,       on B[1],in
   w1 = w2,      on B(1),in,

where the inflow boundary segments are defined by:

   B[1],in ≡ {x ∈ B[1] : b(x) · n(x) < 0}
   B(1),in ≡ {x ∈ B(1) : b(x) · n(x) < 0},

and where n(x) denotes the exterior unit normal to ∂Ω1∗ at x. The resulting global heterogeneous approximation will be:

   L0 w1 = f1,   in Ω1∗          Lε w2 = f2,   in Ω2∗
   w1 = 0,       on B[1],in      w2 = 0,       on B[2]          (1.16)
   w1 = w2,      on B(1),in      w2 = w1,      on B(2).

Remark 1.12. This heterogeneous system can be discretized, and the resulting algebraic system can be solved by the Schwarz alternating method. Well posedness of this heterogeneous system, as well as bounds on the error resulting from such an approximation, are discussed in Chap. 12 and Chap. 15.

1.3 Steklov-Poincaré Framework

The hybrid formulation that we refer to as the Steklov-Poincaré framework is motivated by a principle in physics referred to as a transmission condition, employed in the study of electric fields in conductors [PO, AG, ST8, LE12, QU5]. The underlying principle states that across any interface within a conducting medium, the electric potential as well as the flux of electric current must match. The mathematical version of this principle suggests a hybrid formulation for a 2nd order elliptic equation, given a two subdomain non-overlapping decomposition of its domain.

1.3.1 Motivation

Consider elliptic equation (1.1) posed on Ω:

   L u ≡ −∇ · (a(x) ∇u) + b(x) · ∇u + c(x) u = f,   in Ω
   u = 0,                                            on ∂Ω.          (1.17)

Let Ω1, Ω2 denote a non-overlapping decomposition of Ω, as in Fig. 1.5, with interface B = ∂Ω1 ∩ ∂Ω2 separating the two subdomains, and B[i] ≡ ∂Ωi ∩ ∂Ω. For i = 1, 2, denote the solution on each subdomain Ωi by wi(x) ≡ u(x), and let ni(x) denote the unit outward normal vector to ∂Ωi at the point x ∈ B. Then, the following transmission conditions, which are derived later in this section, will hold on the interface B for smooth solutions:

   w1 = w2,                                     on B
   n1 · (a∇w1 − b w1) = n1 · (a∇w2 − b w2),     on B.          (1.18)

The first condition requires the subdomain solutions w1 and w2 to match on B, while the second condition requires the local fluxes n1 · (a∇w1 − b w1) and n1 · (a∇w2 − b w2) associated with w1 and w2 to also match on B.
[Fig. 1.5. A two subdomain non-overlapping decomposition: subdomains Ω1, Ω2, interface B with normal n1, and exterior segments B[1], B[2]]

Combining the transmission conditions with the elliptic equation on each subdomain yields the following hybrid formulation, equivalent to (1.1):

   Lw1 = f,                                          in Ω1
   w1 − w2 = 0,                                      on B
   w1 = 0,                                           on B[1]
   Lw2 = f,                                          in Ω2
   n1 · (a∇w1 − b w1) + n2 · (a∇w2 − b w2) = 0,      on B
   w2 = 0,                                           on B[2].

Remark 1.13. If the coefficient b(x) in elliptic equation (1.1) is continuous, then, since w1(x) = w2(x) on B and since n1(x) = −n2(x) on B, the flux boundary condition may also be equivalently stated as:

   n1 · (a∇w1) + n2 · (a∇w2) = 0,   on B.

In particular, by taking linear combinations of (1.18), the following equivalent flux transmission condition is preferred in several domain decomposition methods, see [QU6, GA14, AC7, RA3]:

   n1 · (a∇w1 − (1/2) b w1) + n2 · (a∇w2 − (1/2) b w2) = 0,   on B,

for continuous b(x). In this section, we shall outline how this hybrid formulation can be employed to formulate novel domain decomposition iterative methods, discretization methods and heterogeneous approximations for (1.1). Equivalence of the Steklov-Poincaré hybrid formulation is shown next.

Theorem 1.14. Suppose the following assumptions hold.

1. Let L u be defined by (1.1), with smooth coefficient b(x) and solution u.
2. Let w1(x) and w2(x) be smooth solutions of the following coupled system of partial differential equations:

      Lw1 = f,                                      in Ω1
      w1 = 0,                                       on B[1]
      w1 = w2,                                      on B
      Lw2 = f,                                      in Ω2          (1.19)
      w2 = 0,                                       on B[2]
      n1 · (a∇w2 − b w2) = n1 · (a∇w1 − b w1),      on B.

Then the following result will hold:

   w1(x) = u(x),   on Ω̄1
   w2(x) = u(x),   on Ω̄2,

i.e., wi ≡ u on Ω̄i for i = 1, 2.

Proof. Suppose u is a smooth solution to (1.1). By construction, w1(x) = u(x) on Ω̄1 and w2(x) = u(x) on Ω̄2 satisfy Lwi = f in Ωi and wi = 0 on B[i]. By continuity of u (or an application of the trace theorem), we obtain that w1 = w2 on B. To verify that the local fluxes match on B, employ the following weak formulation of (1.1):

   A(u, v) = F(v),   for v ∈ C0∞(Ω),

and express each integral on Ω as a sum of integrals on Ω1 and Ω2, to obtain:

   Σ_{i=1}^{2} ∫_{Ωi} (a ∇wi · ∇v − wi ∇ · (b v) + c wi v) dx = Σ_{i=1}^{2} ∫_{Ωi} f v dx,

for v ∈ C0∞(Ω). Integration by parts on each Ωi then yields:

   Σ_{i=1}^{2} ∫_{Ωi} (−∇ · (a∇wi) v + (b · ∇wi) v + c wi v) dx
      − ∫_B n1 · ((a∇w1 − b w1) − (a∇w2 − b w2)) v dsx = Σ_{i=1}^{2} ∫_{Ωi} f v dx.

Substituting that Lwi = f on Ωi, it follows that:

   ∫_B n1 · ((a∇w1 − b w1) − (a∇w2 − b w2)) v dsx = 0,   ∀v ∈ C0∞(Ω).

If v is chosen to be of compact support in Ω and not identically zero on B, this yields the result that n1 · (a∇w1 − b w1) = n1 · (a∇w2 − b w2) on B. The converse can be verified analogously. □

S(g. corresponding to this choice of interface data g(·). in Ω1 ⎪ ⎨ Lw2 = f2 . As a result.19) may be reduced to the search for interface data g(·) which solves the Steklov-Poincar´e problem (1. f2 ) on g(·) will be affine linear. where X = H00 (B) for a standard subdomain decomposition and 1/2 X = H 1/2 (B) for an immersed subdomain decomposition. i. It will map the Dirichlet data g(·) on B to the jump in the local fluxes (Neumann data) across interface B using (1.20) will yield the solution to (1. on B.19).20).21) then. For such interface data g(·). f2 ) = 0. Importantly. on B n1 · (a∇w1 − b w1 ) = n1 · (a∇w2 − b w2 ) . we define a Steklov-Poincar´e operator S(g. .e. 1.20) ⎪ ⎩ ⎪ ⎩ w1 = g. on B. if X denotes the space of Dirichlet data on B.3 Steklov-Poincar´e Framework 19 appropriately chosen norms (however. the flux or Neumann data will belong to its dual space X  . w2 ) to (1. then the action of the Steklov-Poincar´e operator S(g. on B[1] and w2 = 0 on B[2] (1. (1. which represents hybrid formulation (1. w2 ) will solve (1. in Ω2 w1 = 0. referred to as a Steklov-Poincar´e operator. the local solutions (w1 . f1 .16. so that (w1 . the local solutions w1 (·) and w2 (·) to (1. given a solution (w1 . w2 = g.19) is well posed.21). where w1 (·) and w2 (·) are solutions to the following problems: ⎧ ⎧ ⎪ ⎨ Lw1 = f1 . We now introduce an operator. w2 ) to problem (1. If the local forcing terms f1 (·) and f2 (·) are nonzero. Definition 1.19) more compactly.20) will satisfy: w1 = w2 (= g). if an interface function g(·) can be found which yields zero jump in the flux across B.19) with g(x) = u(x) on B. w2 ) to (1. f2 ) ≡ n1 · (a∇w1 − b w1 ) − n1 · (a∇w2 − b w2 ) .19). thus yielding a solution u to (1. When a weak formulation is used. f1 . When system (1. f1 . the search for a solution (w1 .1). we shall omit this). we may define: w1 in Ω 1 u≡ w2 in Ω 2 . f2 ) as follows: S (g. Given sufficiently smooth Dirichlet boundary data g(·) on the interface B. on B. f1 .

f2 ) ≡ S (1) (g. for w1 and w2 defined by (1. then the system of equations posed on subdomain Ωi in (1. and let 0 < θ < 1 denote a relaxation parameter required to ensure convergence [BJ9. They are referred to as pseudo- differential operators.20). MA29]. .19) can be solved to yield updates (k+1) wi for the local solutions. FU. the Steklov-Poincar´e operator S may be expressed as the sum of two subdomain operators: S(g. if w1 and w2 denote the k’th iterates on subdomains Ω1 and Ω2 .19). For instance. global dis- cretizations and heterogeneous approximations can be constructed for the original problem (1. 1. and is referred to as a Dirichlet-Neumann algorithm as it requires the solution of Dirichlet and Neumann boundary value problems. and for the correct choice of Dirichlet interface data g(·) on B. f2 ). the maps S (i) are commonly referred to as lo- cal Dirichlet to Neumann maps. The resulting iterative algorithm sequentially enforces either the continuity or flux transmission boundary conditions on B. obtained by solution of the local problems (1.20 1 Decomposition Frameworks Remark 1. These Dirichlet to Neumann maps are not differential operators since the solutions wi to (1.3. Both S (1) and S (2) map the Dirichlet interface data g(·) pre- scribed on B to the corresponding Neumann flux data n1 · (a∇w1 − b w1 ) and n2 · (a∇w2 − b w2 ) on B.18. where S (1) (g. respectively. In the rest of this section. BR11. Remark 1.17. As a result. For computational purposes. each operator S (i) will require only subdomain information and will be affine linear.2 Iterative Methods The block structure of the Steklov-Poincar´e system (1.19) suggests various (k) (k) iterative algorithms for its solution. f1 ) + S (2) (g. f2 ) ≡ n2 · (a∇w2 − b w2 ) .20). In the following.1) using the Steklov-Poincar´e formulation (1. respectively. we outline how iterative methods. By definition. f1 ) ≡ n1 · (a∇w1 − b w1 ) S (2) (g. suppose that b(x) = 0 in Ω.20) have representations as integral operators acting on the data g. f1 . the jump in the Neumann data on B will be zero for the local solutions. with boundary conditions chosen using preced- ing iterates.

1. · · · . in Ω2 (k+1) . Solve for w1 as follows: ⎧ ⎪ ⎨ Lw1 (k+1) = f1 . Solve for w2 as follows: ⎧ (k+1) ⎪ ⎨ Lw2 = f2 . 1. on B ⎪ ⎩ (k+1) w1 = 0.3 Steklov-Poincar´e Framework 21 Algorithm 1. (k+1) 3. in Ω1 (k+1) (k) w1 = v2 .1 (Dirichlet-Neumann Algorithm) (0) (0) (0) Let v2 (where v2 ≡ w2 on B) denote a starting guess.3. For k = 0. 1. until convergence do: (k+1) 2. on B[1] .

. w2 = 0.

Below. Remark 1. DO18.20. 3. on B[2] ⎪ ⎩ n a∇w(k+1) = n a∇w(k+1) . the local solution w1 matches v2 on B (however. 5. and additional restrictions on the parameter 0 < θ < 1. Update: v2 = θ w2 + (1 − θ)v2 on B. Various algorithms have been proposed which solve subdomain problems in parallel. w2 ) (k+1) (k) Remark 1. on B. Multidomain matrix versions of such algorithms are described in Chap. DR18. Under restrictions on the coefficients (such as b(x) ≡ 0 and c(x) ≥ 0). see [FU. MA29]. We assume b(x) = 0. DE3. 2 2 2 1 (k+1) (k+1) (k) 4. RA3]. This step requires the solution of an elliptic equation on Ω1 with Dirichlet conditions on B[1] and B. . the iterates (k) wi in the Dirichlet-Neumann algorithm will converge geometrically to the true local solution wi of (1. In step 3. GA14. Endfor (k) (k) Output: (w1 . we de- scribe a two fractional step algorithm. DO13. MA14.19. QU6.19) as k → ∞. the local fluxes may not match on B). (k+1) (k+1) (k+1) the flux of w2 matches the flux of w1 on B (though w2 may not (k+1) match w1 on B). see [BO7. In step 2. This step requires the solution of an elliptic equation on Ω2 with Dirichlet conditions on B[2] and Neumann conditions on B. each step requiring the solution of subdomain problems in parallel [DO13. AC7. YA2]. The preceding Dirichlet-Neumann algorithm has sequential steps. A matrix formulation of this algorithm is given in Chap. 3.

3. Let 0 < θ. · · · . δ. For k = 0. 1. β.2 (A Parallel Dirichlet-Neumann Algorithm) (0) (0) Let w1 and w2 denote a starting guess on each subdomain. until convergence. 1. α < 1 denote relaxation parameters.22 1 Decomposition Frameworks Algorithm 1.

do: .

on B 2. in Ω1 ⎪ ⎪ ⎨ ⎨Lw2 1 = f. in Ω2 (k+ 12 ) (k+ 2 ) . (k+ 21 ) (k) (k) µ = θ n1 · a∇w1 + (1 − θ) n1 · a∇w2 . In parallel solve for w1 and w2 ⎧ ⎧ (k+ 12 ) (k+ 12 ) ⎪ ⎪ Lw1 = f. (k+ 12 ) (k+ 12 ) 3. Compute (k+ 12 ) (k) (k) g = δ w1 + (1 − δ) w2 . on B.

⎩ w(k+ 2 ) = g(k+ 12 ) . on B. 1 2 ⎧ . on B[2] ⎪ ⎪ 1 ⎪ ⎪ 1 ⎩n1 · a∇w(k+ 2 ) = µ(k+ 12 ) . w1 = 0. on B. on B[1] and w2 = 0.

.

In parallel solve for w1 and w2 ⎧ ⎧ (k+1) ⎪ (k+1) ⎨ Lw1 = f. (k+1) (k+1) 5. in Ω1 ⎪ ⎨ Lw2 = f. Compute ⎪ ⎪ (k+ 1 ) (k+ 1 ) ⎪ ⎪ g(k+1) = α w1 2 + (1 − α) w2 2 . on B[1] and . ⎪ ⎨ 1 2 on B 4. in Ω2 (k+1) (k+1) w1 = 0. ⎩ on B. ⎪ (k+ 12 ) (k+ 12 ) ⎪ ⎪ µ(k+1) = β n2 · a∇w + (1 − β) n2 · a∇w .

on B. α. w1 =g . a parallel algorithm. . referred to as a Robin-Robin algorithm can also be used [QU6. see [DO13. ˜i = 2 when i = 1 and ˜i = 1 when i = 2). and the relaxation parameters θ. DO18]. GA14. the Robin-Robin algorithm has the following form. For related parallel algorithms. β. AC7. ˜i will denote a complementary index to i (namely. Endfor (k) (k) Output: (w1 .21. 2 denote a local Robin boundary operator on B for i = 1. Let:   1 Φi (w) ≡ ni · a(x)∇w − b(x) w + zi (x) w. Then. When the advection coefficient b(x) = 0. For convenience. on B[2] ⎪ ⎩ (k+1) (k+1) ⎪ ⎩ n · a∇w(k+1) = µ(k+1) . on B. w2 ) Remark 1. 2 2 6. 2 for an appropriately chosen bounded interface function zi (x) > 0. this parallel algorithm will converge geometrically [YA2]. Under appropriate restrictions on the coefficients a(x) and c(x). w 2 = 0. RA3]. δ.

3. · · · .3 Steklov-Poincar´e Framework 23 Algorithm 1.3 (A Robin-Robin Algorithm) (0) (0) Let w1 and w2 denote a starting guess on each subdomain Let 0 < θ < 1 denote a relaxation parameter 1. 2 in parallel solve: ⎧ (k+1) ⎪ ⎨ Lwi = fi . 1. in Ωi (k+1) . 1. For i = 1. For k = 0. until convergence do: 2.

. i w = 0.

.

19) is that each subdomain Ωi can be independently triangulated. DO4] and in the context of spectral methods. see [QU6.6. 1. When (c(x)− 12 ∇·b(x)) ≥ β > 0. Nonmatching local grids . A potential advantage of discretizing (1. see Fig.6. 1.1). Endfor 4. care must be exercised in discretizing the transmission conditions so that the resulting global discretization is stable. AG2.19) can be used to construct a global discretization of (1. on B[i] ⎪ ⎩ Φ w(k+1) = θ Φ w(k) + (1 − θ) Φ w(k) . 1.19) using finite element methods. AC7. r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r b b b b b b b b b rb r r r r r r r r r r r r r r r r r r r b b b b b b b b b rb r r r r r r r r r r r r r r r r r r r b b b b b b b b b rb r r r r r r r r r b b b b b b b b b rb r r r r r r r r r r r r r r r r r r r b b b b b b b b b rb r r r r r r r r r b b b b b b b b b b Th2 (Ω2 ) b b b b b b b b b b Th1 (Ω1 ) Fig. see [AG. Below. however. see [MA4.22. RA3]. we heuristically outline the general stages that would be involved in discretizing (1. and each subproblem may be discretized independently. for a suitable choice of relaxation parameter 0 < θ < 1 and zi (x) > 0.3 Global Discretization Hybrid formulation (1. Endfor (k) (k) Output: (w1 . However. on B i i i i ˜i ˜i 3. by methods suited to the local geometry and regularity of the local solution. Such discretizations have not been studied extensively.3. GA14. PH]. the Robin-Robin iterates will converge geometrically. w2 ) Remark 1.

ABI ABB wB ghi (i) (i) where wI denotes the interior unknowns on Ωhi and wB denotes the bound- ary unknowns on B associated with the discrete solution on Thi (Ωi ). Sepa- rately discretize the two transmission conditions on B:  w1 = w2 . ∀µ ∈ Yh (B). On each subdomain Ωi . employ a traditional method to discretize the following Neumann problem: ⎧ ⎨ Lwi = f. we indicate how each transmission condition can be discretized by a “mortar” element type method. Below. on B. Then the continuity equation w1 = w2 on B may be discretized by a Petrov-Galerkin approximation of its weak form:  (w1 − w2 ) v dsx = 0. on B.6. Examples of such spaces are described in Chap. Xh (B) is typically chosen as a finite element space defined on a triangulation of B inherited from either triangulation Th1 (Ω1 ) or Th2 (Ω2 ). 1. v ∈ Xh (B). For definite- ness suppose Xh (B) = Xh1 (B) is chosen to be of dimension m1 based on the triangulation of B inherited from Th1 (Ω1 ).24 1 Decomposition Frameworks On each subdomain Ωi . on B[i] ⎩ ni · (a∇wi − b wi ) = gi . as in Fig. B . the discretized continuity transmission condition will have the following matrix form: (1) (2) M11 wB = M12 wB . in Ωi wi = 0. generate a grid Thi (Ωi ) of size hi suited to the local geometry and solution. denote the resulting local discretization by:  (i) (i)   (i)    AII AIB wI f hi (i) (i) (i) = . respectively. 11. B where Xh (B) denotes some appropriately chosen subspace of L2 (B). where M11 and M12 are m1 × m1 and m1 × m2 mass matrices. Then. they will be referred to as nonmatching grids. Let ni and mi denote the number of unknowns (i) (i) in wI and wB respectively. In a mor- tar element discretization. on B n1 · (a∇w1 − b w1 ) = n1 · (a∇w2 − b w2 ) . wB ) may be nonmatching on B. If the resulting local grids do not match along B. (i) (i) Since the grid functions (wI . care must be exercised to ensure well posedness and stability of this discretization. where ni denotes the exterior unit normal to ∂Ωi and the flux data gi is to be chosen when the transmission conditions are applied. The flux transmission condition on B may be similarly discretized:  (n1 · (a∇w1 − b w1 ) − n1 · (a∇w2 − b w2 )) µ dsx = 0. Employing block matrix notation.

since Xh (B) = Xh1 (B) is of dimension m1 . it will be preferable that Yh (B) be chosen using the complementary triangulation. Again. we choose Yh (B) = Yh2 (B) of dimension m2 based on triangulation Ωh2 . to ensure that the total number of equations equals the total number of unknowns in the global system. This will yield m2 constraints. However. In the above example. 1. Yh (B) may be chosen as a finite element space defined on the triangulation of B inherited from either triangulation Ωh1 or Ωh2 . which we denote as: .3 Steklov-Poincar´e Framework 25 where it is sufficient to choose Yh (B) ⊂ H01 (B).

.

The actual choice of subspaces Xh1 (B) and Yh2 (B) will be critical to the stability of the resulting global discretization: ⎧ ⎪ ⎪ (1) (1) (1) (1) AII wI + AIB wB = f h1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ (1) M11 wB = M12 wB (2) ⎪ ⎪ (2) (2) (2) (2) AII wI + AIB wB = f h2 ⎪ ⎪ ⎪ ⎪ . (1) (1) (1) (1) (1) (2) (2) (2) (2) (2) M21 ABI wI + ABB wB − f B = −M22 ABI wI + ABB wB − f B . where M21 and M22 are m2 × m1 and m2 × m2 matrices. respectively. The (i) interface forcing terms f B have been added to account for the approximation resulting from integration by parts.

.

⎪ ⎩ M21 A(1) w(1) + A(1) w(1) − f (1) = −M22 A(2) w(2) + A(2) w(2) − f (2) . We would then obtain M11 = M12 . then m1 = m2 . If the grids Th1 (Ω1 ) and Th2 (Ω2 ) match on B.23. Similarly.19) are not known to the author. and this scheme was heuristically considered only for its intrinsic interest. yielding: (1) (2) wB = wB . M21 = M22 will be square and nonsingular yielding: . both square and nonsingular. BI I BB B B BI I BB B B General theoretical results on the stability of such discretizations of (1. Remark 1.

.

4 Heterogeneous Approximations A heterogeneous approximation of a partial differential equation is a coupled system of partial differential equations which approximates the given equa- tion. (1) (1) (1) (1) (1) (2) (2) (2) (2) (2) ABI wI + ABB wB − f B = − ABI wI + ABB wB − f B . in which the approximating partial differential equations are not of the same type in different subregions [GA15. In the following. QU6].1). The resulting global discretization will then correspond to the standard finite element discretization of (1. motivated .3. 1.

19) will be employed to heuristically approximate (1.).22) may be obtained by substituting the preceding approximation in the hybrid formulation corresponding to (1. Suppose Ω1 and Ω2 form a non-overlapping decomposition of Ω such that:  |∆u|  |b · ∇u + c u| . Then. LA5].) will yield an ill-posed problem for w1 (. on ∂Ω. QU6]. Formally. . the boundary conditions w1 = 0 on B[1] and w1 = w2 on B can be replaced by inflow boundary conditions w1 = 0 on B[1]. on ∂Ω. where L0 u ≡ b(x) · ∇u + c(x)u. respectively.η L v = f. on Ω 1 . Accordingly.in and w1 = w2 on Bin . resolves this local ill-posedness on Ω1 . yielding: ⎧ ⎪ ⎪ L0 w1 = f. denote the inflow and outflow boundary segments on B and B[1] by: ⎧ ⎪ ⎪ B ≡ {x ∈ B : n1 · b(x) < 0} ⎨ in Bout ≡ {x ∈ B : n1 · b(x) > 0} ⎪ ⎪ ⎩ B[1]. in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = 0. Since L0 w1 = f is hyperbolic. on B ⎪ ⎪ L w2 = f.22) u = g. we may approximate L u = f by L0 u = f . However. The Steklov-Poincar´e hybrid formulation (1. on Ω v = 0. on B[1] ⎪ ⎨ w1 = w2 . a subdomain vanishing viscosity approach may be employed as in [GA15]. retaining the Dirichlet boundary conditions on B and B[1] for w1 (. in Ω (1. specification of Dirichlet or Neumann boundary conditions on the entire boundary ∂Ω1 will yield a locally ill posed problem. see [GA15. since L0 w1 is hyperbolic on Ω1 . on B. a global heterogeneous approximation of (1.26 1 Decomposition Frameworks by classical singular perturbation approximations [KE5. Thus.22).in ≡ {x ∈ B[1] : n1 · b(x) < 0}. on subdomain Ω1 . To deduce the remaining transmission boundary conditions in the heterogeneous approximation. on B[2] ⎪ ⎩ n1 · (∇w2 − b w2 ) = n1 · (∇w1 − b w1 ) . where 0 <   1 is a perturbation parameter. Fortunately. the elliptic equation L u = f may be approximated by the discontinuous coefficient elliptic problem:  . we heuristi- cally outline how an elliptic-hyperbolic heterogeneous approximation can be constructed for the following singularly perturbed elliptic equation:   L u ≡ − ∆u + b(x) · ∇u + c(x) u = f. Indeed. in Ω2 ⎪ ⎪ ⎪ ⎪ w2 = 0.22). replacing the Dirichlet conditions by inflow conditions.

12. on Bin −n1 · b w1 = n1 · (∇w2 − b w2 ) . the global system of partial differential equations satisfied by the weak limit of the solutions v . Dirichlet-Neumann iterative methods can be formulated to solve the above heterogeneous approximation to (1. on B[1]. 1. on Bout . η) ≡  for x ∈ Ω2 . GL7]. on Bin ⎪ L w2 = f. For rigorous results on the well posedness of the preceding het- erogeneous system. in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = 0. in Ω2 ⎪ ⎪ ⎪ 2 ⎪ n · (∇w 2 − b w2 ) = −n2 · b w1 . on Bin ⎪ ⎩ −n1 · b w1 = n1 · (∇w2 − b w2 ) . 1.4 Lagrange Multiplier Framework The framework we refer to as the Lagrange multiplier formulation [GL. on B. on Bin . letting η → 0+ . heuristically. When b(x) is continuous. ⎪ ⎪ ⎪ ⎪ n2 · ∇w2 = 0. the problem will be elliptic and the traditional trans- mission conditions should hold: w1 = w2 . readers are referred to [GA15]. on Bout . Remark 1. on Bin 0 = n1 · ∇w2 . As a result. η) is defined by: η for x ∈ Ω1 a(x. ⎩ w2 = 0. on B[2] . and imposing the inflow condition on Bin yields: w1 = w2 . the substitution that w1 = w2 on Bin will yield the following additional simplifications: ⎧ ⎪ ⎨ w1 = w2 . on B. on B n1 · (η∇w1 − b w1 ) = n1 · (∇w2 − b w2 ) . It is employed in the FETI (Finite Element Tearing and Interconnection) method . For  > 0 and η > 0. underlies a variety of non-overlapping domain decomposition methods.22).4 Lagrange Multiplier Framework 27 where L. QU6] and Chap.η v ≡ −∇ · (a(x.24. see [GA15. However.η as η → 0 will be: ⎧ ⎪ ⎪ L0 w1 = f. η)∇v) + b(x) · ∇v + c(x) v and a(x.in ⎪ ⎪ ⎪ ⎨ w1 = w 2 .

7.) can be obtained by minimizing this extended energy functional. see (1. we will show that the optimization problem (1. on ∂Ω. The Lagrange multiplier hybrid formulation will be the saddle point problem associated with this constrained minimization problem. It is well known that the solution u minimizes an energy J(. with c(x) ≥ 0. 1. We outline the steps below. we illustrate its application to formulate iterative algorithms.).24) can be reformulated as a constrained optimization problem based on the subdomains. An immersed non-overlapping decomposition .23) u = 0. The Lagrange multiplier hybrid formulation is the saddle point problem associated with this constrained minimization problem. Ω1 Ω2 B Fig. BE22. BE18. Accordingly. Thus. The resulting sum of local energies will be well defined even if the local displacement functions are dis- continuous across the interface B = ∂Ω1 ∩ ∂Ω2 . It is thus an extended energy functional. the solution u must optimize some energy functional J(·). 1. WO4.28 1 Decomposition Frameworks (a constrained optimization based parallel iterative method [FA16.1 Motivation Let Ω1 and Ω2 form a non-overlapping decomposition of the domain Ω of elliptic equation (1. the elliptic equation (1. For such a property to hold.1) must be self adjoint and coercive.4. requiring that b(x) = 0 and c(x) ≥ 0.24) and (1. FA15]). and in non- overlapping Schwarz iterative methods [LI8. the mortar element method (a method for discretizing elliptic equations on nonmatching grids [MA4. Using this decomposition of Ω.23). WO5]). In this section.25) within H01 (Ω). in this section we shall consider:  L u ≡ −∇ · (a(x) ∇u) + c(x) u = f. The Lagrange multiplier framework is applicable only when there is an optimization principle associated with the elliptic equation. non-matching grid discretiza- tions and heterogeneous approximations. see Fig.7. Given any non-overlapping subdomain decomposition of Ω. A constrained minimization problem equivalent to the minimization of J(. in Ω (1. we may decompose the energy functional J(·) associated with (1. 1. BE4. BE6. GL8]. subject to the constraint that the local displacements match on the interface B.23) as a sum of en- ergy contributions Ji (·) from each subdomain Ωi .

CI2. wi ) ≡ Ωi (∇vi · a∇wi + cvi wi ) dx. w2 ). ∀µ ∈ Y } . Xi ≡ v ∈ H (Ωi ) : v = 0 on B[i] . By optimization theory. for vi . 1 Here JE (w1 . we minimize JE (v1 . but subject to the weak constraint that the subdomain functions match on B:  m ((v1 . (1. w2 ) is defined even when w1 = w2 on B. ·). ⎪ ⎩ Fi (wi )  for wi ∈ Xi . w) ≡ Ω (a∇v · ∇w + c vw) dx. v2 ). v2 ) ∈ X1 × X2 : m ((v1 . wi ) − Fi (wi ). ⎪ ⎪ ⎪ ⎩X ≡ H01 (Ω). 10. 1 for wi ∈ Xi  ⎪ Ai (vi . see [ST14. η) ∈ X1 × X2 × Y as: L ((v1 .24) w∈X where ⎧ ⎪ ⎪ J(w) ≡ 12 A(w. w2 ) to the constrained minimization problem (1. the solution (w1 . Problem (1.23): J(u) = min J(w). see [CI4] and Chap.24). w2 ) ≡ J1 (w1 ) + J2 (w2 ). v2 ). wi ∈ Xi ⎪ ⎪  ⎪ ⎪ ≡ Ωi f wi dx. where: ⎧ ⎪ JE (w1 . v2 ) within the larger (extended) class of functions X1 × X2 defined above. v2 ). η) . w ∈ X  (1. We may express the energy J(w) = JE (w1 . µ) of an associated Lagrangian functional L (·. for w ∈ X. v2 ). It is well known. Constrained Minimization Formulation. w2 ) ≡ J1 (w1 ) + J2 (w2 ). µ) = 0.27) . (1.26) (v1 . B −1/2 1/2 where Y ≡ H00 (B) (the dual space of H00 (B)). where µ ∈ Y denotes an artificially introduced variable referred to as a Lagrange multiplier. 1. v2 ). η) ≡ J1 (v1 ) + J2 (v2 ) + m ((v1 . that the solution u to (1. We define the Lagrangian func- tion for ((v1 . w) − F (w). Saddle Point Formulation.4 Lagrange Multiplier Framework 29 Minimization Formulation. ∀µ ∈ Y. ⎪ ⎪  ⎪ ⎨ A(v. To obtain a constrained minimization problem equivalent to (1.25) ⎪ ⎪ F (w) ≡ Ω f wdx.24) will thus be formally equivalent to the following constrained minimization problem: J1 (w1 ) + J2 (w2 ) = min J1 (v1 ) + J2 (v2 ). µ) ≡ (v1 − v2 ) µ dsx = 0. Suppose wi ≡ w on Ω i for 1 ≤ i ≤ 2. (1. JO2. BR28].v2 )∈K where K ≡ {(v1 . for v.23) minimizes the energy J(·) associated with (1. for wi ∈ Xi ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ Ji (wi ) ≡ 2 Ai (wi . Let {Ωi }2i=1 be a non-over- lapping decomposition of Ω.26) can be expressed as components in the saddle point ((w1 .

Then u(x) = w1 (x) in Ω 1 and u(x) = w2 (x) in Ω 2 . The next result indicates the equivalence of (1.29) by parts.19) for the substitution µ = n2 · (a∇u) on B. v2 ). Requiring the first order variation at the saddle point ((w1 .29) m ((w1 . Let (w1 .30).30).23) is equivalent to (1. on B[2] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ n2 · (a∇w1) = µ.29) associated with it [GI3].) will be uniquely solvable provided B[i] = ∅. µ) satisfying: ⎧ ⎪ Lw1 = f. η) ≤ L ((w1 . η) = 0. w2 ).) (representing the flux on B) so that w1 = w2 on B. w2 (. The above system is referred to as a saddle point problem. w2 ).30) ⎪ ⎪ ⎪ ⎪ w2 = 0. w2 .). µ) to be zero yields:  2 2 i=1 Ai (wi . on B ⎪ ⎩ w1 = w2 . 1. µ) = i=1 Fi (vi ). we can express it in terms of partial differential equations involving w1 (. v2 ).23). for vi ∈ Xi (1. For each choice of Neumann data µ(·). w2 . in Ω2 (1. Let u be a solution to (1.30). Theorem 1. Suppose the following assumptions hold. we obtain: L ((w1 .19).25.30) to (1. for η ∈ Y. We seek (w1 . It does not demonstrate the well posedness of (1. v2 ) ∈ X1 × X2 and η ∈ Y . µ) ∈ X1 × X2 × Y of L (·). on B[1] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ n · (a∇w1 ) = −µ.23) and (1. Proof. 2.30) is equivalent to (1. each subdomain problem for wi (. The preceding result only asserts the equivalence between solu- tions of (1. The latter can be demonstrated for (1.) as follows. vi ) + m ((v1 . w2 ). and since (1. .) and the Lagrange multiplier variable µ(. w2 ). If we integrate the weak form (1. in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = 0. on B where B[i] ≡ ∂Ωi ∩ ∂Ω is the exterior boundary and ni is the unit exterior normal to ∂Ωi for i = 1. 2.26. The equivalence follows since (1. Hybrid Formulation.28) for any choice of (v1 . µ) (1.23). on B ⎨ 1 Lw2 = f.30 1 Decomposition Frameworks At the saddle point ((w1 . µ) ≤ L ((v1 . µ) be a solution to the hybrid formulation (1.   Remark 1. w2 ). We must choose the Lagrange multiplier µ(.30) by employing general results on the well posedness of the saddle point problem (1.

to update the Lagrange multiplier function µ(·).1 (Uzawa’s Method) Let µ(0) denote a starting guess with chosen step size τ > 0.30). an iterative method for solving (1. Determine w1 and w2 in parallel: ⎧ . 1. 1. 1.23) can be obtained by applying a saddle point iterative algorithm such as Uzawa’s method. Algorithm 1.) in (1.2 Iterative Methods Since the Lagrange multiplier µ(. · · · until convergence do: (k+1) (k+1) 2.) determines w1 (. see Chap. 10.) and w2 (. as described below.4 Lagrange Multiplier Framework 31 1. For k = 0.4.4.

on B[1] ⎪ ⎪ . in Ω1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (k+1) w1 = 0. ⎪ ⎪ −∇ · a∇w1 (k+1) + c w1 (k+1) = f.

⎪ ⎪ n1 · a∇w1 (k+1) = −µ(k) . ⎪ . ⎨ on B.

in Ω2 ⎪ ⎪ ⎪ ⎪ (k+1) w2 = 0. on B[2] ⎪ ⎪ . ⎪ ⎪ −∇ · (k+1) (k+1) ⎪ ⎪ a∇w 2 + c w2 = f.

Update µ(k+1) as follows: . 3. ⎪ ⎩ (k+1) n2 · a∇w 2 = µ(k) . on B.

for x ∈ B. w2 ) . (k+1) (k+1) µ(k+1) (x) = µ(k) (x) + τ w1 (x) − w2 (x) . Endfor (k) (k) Output: (w1 . 4.

·) by an augmented Lagrangian Lδ (·.28. see [GL7. see Chap. Remark 1. The FETI method [FA16. (k) (k) Remark 1. ·). However. GL8]: δ Lδ ((v1 . and where the local problems may be singular. 10. it generalizes the preceding saddle point iterative algorithm to the multisubdomain case. 2 . where an additional non-negative functional is added to the original Lagrangian functional with a coefficient δ > 0. An alternative hybrid formulation equivalent to (1. 4. µ) + v1 − v2 2L2 (B) . is also based on updating the Lagrange multiplier µ. and thus the iterates will converge geometrically to the true solution for sufficiently small τ > 0. FA15]. v2 ). µ) ≡ J1 (v1 ) + J2 (v2 ) + m ((v1 .30) can be obtained by replacing the Lagrangian functional L (·. Discrete versions of Uzawa’s algorithm are described in Chap.27. The map µ(k) → w1 − w2 will be compact. where the rate of convergence may deteriorate with increasing number of subdomains. v2 ).

For k = 0. Applying an alternating directions implicit (ADI) method to determine the saddle point of the augmented Lagrangian functional. Let δ > 0 be a chosen parameter.4. both formulations will be equivalent.32 1 Decomposition Frameworks The augmented term 2δ v1 − v2 2L2 (B) will be zero when the constraint v1 = v2 is satisfied on B. GL8]. w2 denote starting guesses. Solve in parallel: ⎧ . 1. 1.2 (Non-Overlapping Schwarz Method) (0) (0) Let w1 . Algorithm 1. · · · until convergence do: 2. As a result. and the saddle point of the augmented Lagrangian will also yield the desired solution. will yield the following algorithm. referred to as the non-overlapping Schwarz method [LI8.

⎪ ⎪ −∇ · a∇w1 (k+1) (k+1) + cw1 = f. in Ω1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (k+1) w1 = 0. on B[1] ⎪ ⎪ .

.

on B. ⎪ ⎪ (k+1) (k+1) (k) (k) ⎨ n1 · a∇w1 + δw1 = n1 · a∇w2 + δw2 . ⎪ .

. in Ω2 ⎪ ⎪ ⎪ ⎪ (k+1) w2 = 0. ⎪ ⎪ −∇ · (k+1) (k+1) ⎪ ⎪ a∇w2 + cw2 = f.

on B[2] ⎪ ⎪ .

DO4. to ensure that the resulting discretization yields a constrained minimization problem. w2 ) Remark 1. 1.8. on B. In practice. Construct a finite element space Yh1 (B) ⊂ L2 (B) ⊂ Y consisting of piecewise polynomial functions defined on . BE22.23) can be obtained by discretizing (1. 3. ⎪ ⎩ (k+1) (k+1) (k) (k) n2 · a∇w2 + δw2 = n2 · a∇w1 + δw1 . WO5]. see Fig. Select a triangulation of interface B inherited either from Th1 (Ω1 ) or Th2 (Ω2 ). 11.29). suppose that Th1 (Ω1 ) is chosen. a careful choice of parameter δ > 0 will be necessary for optimal convergence [LI8. see also Chap. a discretization of (1. BE4.30). For definiteness. WO4. Endfor (k) (k) Output: (w1 . GL8]. it is advantageous to employ a Galerkin approximation of the saddle point problem (1. An extensive literature exists on such nonmatching grid discretization techniques. However. The resulting discretization is re- ferred to as a mortar element method.29. see [MA4. Triangulate each subdomain Ωi by a grid Thi (Ωi ) of size hi suited to the local geometry and solution for 1 ≤ i ≤ 2. Each subdomain can be triangulated independently without requiring the local triangulations to match on B. Let Xhi ⊂ Xi denote a traditional finite element space defined on the triangulation Thi (Ωi ).4.3 Global Discretization In principle. 1.

1. for 1 ≤ i ≤ 2 ⎪ ⎩ (1) m ((wh1 . Discretization of the saddle point formulation (1. µh ) = µh M wh1 − M wh2 . M (1) −M (2) 0 µh 0 where: ⎧ ⎨ Ai (whi . for 1 ≤ i ≤ 2 F (whi ) T = whi f hi .4 Lagrange Multiplier Framework 33 Ω1 Ω2 B[1] Fig.8. 1. T (2) Here we have used whi and µh to denote finite element functions and whi and µh as their vector representations with respect to some fixed basis. The dimension of Yh1 should equal the dimension of Xh1 ∩H01 (B). See Chap. whi ) ⎪ = wThi A(i) whi . wh2 ). 11 for multiplier spaces Yh1 (B).29) using the subspaces Xh1 × Xh2 × Yh1 (B) will yield a linear system of the form: ⎡ (1) T ⎤⎡ ⎤ ⎡ ⎤ A 0 M (1) wh1 f h1 ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢0 A(2) −M (2) ⎥ T ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ ⎣ wh2 ⎦ = ⎣ f h2 ⎦ . . Non-overlapping nonmatching grids the triangulation of B inherited from Th1 (Ω1 ).

wB cor- responding to the unknowns in the interior of each subdomain and on the interface B. Substituting. ⎢ ⎢ ⎥⎢ (2)T ⎥ ⎢ (2) ⎥ ⎥ ⎢ ⎥ ⎢ (2) ⎥ ⎢0 0 (2)T AIB (2) ABB −MB ⎦ ⎣ wB ⎦ ⎣ f B ⎦ ⎣ 0 MB (1) 0 −M (2) 0 µh 0 B . we obtain: ⎡ (1) (1) ⎤⎡ ⎤ ⎡ (1) ⎤ A AIB 0 0 0 (1) wI fI ⎢ II T ⎥ ⎢ A(1) A(1) (1)T ⎥ ⎢ (1) ⎥ ⎢ (1) ⎥ ⎢ IB 0 0 MB ⎥ ⎢ wB ⎥ ⎢ ⎥ ⎢ fB ⎥ ⎥ ⎢ BB ⎥⎢⎢ (2) ⎥ ⎢ (2) ⎥ ⎢0 0 (2) AII AIB (2) ⎥ 0 ⎥ ⎢ wI ⎥ = ⎢ f I ⎥ . then matrices A(i) and M (i) will have the block structure:  (i) (i)  AII AIB   (i) A = (i) T (i) and M (i) = 0 MB (i) . for 1 ≤ i ≤ 2 AIB ABB (i) (i) where wI and wB are of size ni and mi . T (i) (i) If each nodal vector whi is block partitioned as whi = wI .

. we may parameterize the solution space of the interface constraints as: (1) (2) (1)−1 (2) wB ≡ R12 wB where R12 ≡ MB MB . for x ∈ Ω 1 . on ∂Ω. (1) (1) (2) The local unknowns can then be represented as wI . in Ω (1. Below.31). we heuristically apply the subdomain vanishing viscosity method as in [GA15]:  −∇ · (a. in Ω (1.31) u = g(x). (2) (2) wI .23) is singularly perturbed. Suppose Ω1 and Ω2 form a nonoverlapping decomposition of Ω. where 0 <   1 is a small perturbation parameter and c(x) ≥ c0 > 0.η (x)∇u) + c(x) u = f (x). then matrix MB will be square and invertible of size m1 . 11. wB = R12 wB . wB ) and applying first order stationarity con- ditions for its minimum yields the following linear system: ⎡ (1) (1) ⎤ ⎡ (1) ⎤ ⎡ (1) ⎤ AII 0 AIB R12 wI fI ⎢ ⎥ ⎢ (2) ⎥ ⎢ ⎥ ⎢0 AII (2) (2) AIB ⎥⎢w ⎥ = ⎢ (2) fI ⎥. WO5]. In this case. The resulting global discretization will be stable and convergent of optimal order.30) can be employed to heuristically study an heterogeneous approximation of it. Substituting this representation into the discrete energy (1) (2) (2) (2) Jh1 (wI . 1.4. a basis for Yhi (B) can (i) be constructed so that matrix MB is diagonal [WO4.32) u = g(x). we illustrate two alternative approximations of the following singularly perturbed. and wB .23). its Lagrange multiplier formulation (1. self adjoint elliptic equation [KE5]:  −∇ · (∇u) + c(x) u = f (x). Ω2 must enclose the boundary layer region of the solution.4 Heterogeneous Approximations When elliptic equation (1. BE22. then R12 = I and the above discretization reduces to the traditional conforming finite element discretization of (1. R12 wB ) + Jh2 (wI . on ∂Ω. To obtain an heterogeneous approximation of (1. BE6. BE18. ⎣ ⎦⎣ I ⎦ ⎣ ⎦ T (1)T (2)T T (1) (2) (2) T (1) (2) R12 AIB AIB R12 ABB R12 + ABB wB R12 fB + fB If both grids match. such that: |∆u|  |c(x) u| . They include piecewise polynomial functions which are continuous across elements as well as piecewise polynomial functions which are discontinuous across elements [MA4. Mortar element spaces Yhi (B) are described in Chap.34 1 Decomposition Frameworks (1) If the dimension of the space Yh1 (B) is m1 . Then. In the latter case. BE4].

in Ω2 ⎪ ⎪ ⎪ ⎪ w2 = w1 . on B[2] . on B ⎪ ⎪ ⎩ w2 = g(x). on B ⎨ − ∆w2 + c(x) w2 = f (x).) formally satisfies a zeroth order equation in Ω1 .4 Lagrange Multiplier Framework 35 where η for x ∈ Ω1 a. yielding the alternative system: ⎧ ⎪ ⎪ c(x) w1 = f (x). then the local solution may be ill posed. then continuity of the local solutions must be enforced and the flux transmission condition needs to be omitted. Either the transmission condition w1 = w2 or  ∂w 2 ∂n = 0 can be enforced. in Ω2 ⎪ ⎪ ⎪ ⎪  ∂w 2 ⎪ ⎪ ∂n = 0. on B ⎩ w2 = g(x). the above problem is elliptic and coercive. For  > 0 and η > 0. If a discontinuous approximation is sought. in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = g(x). indicating a poor choice of subdomain Ω1 . on B ⎪ ⎪ ⎩ w1 = w2 . Since c(x) ≥ c0 > 0. Two alternative approximations may be constructed. on B[1] ⎨ − ∆w2 + c(x) w2 = f (x). on B[1] ⎨ − ∆w2 + c(x) w2 = f (x). If a continuous (or H 1 (·)) solution is sought. formally the limiting system (1. in Ω2 ⎪ ⎪ ⎪ ⎪ w2 = g(x). in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = g(x). 1. then the continuity transmission condition can be omitted.e. and the flux transmission condition can be enforced. since w1 (. in Ω1 ⎪ ⎪ ⎪ ⎪ w1 = g(x). . on B[2] .η (x) ≡  for x ∈ Ω2 .30) becomes: ⎧ ⎪ ⎪ c(x) w1 = f (x). but not both. as η → 0+ . However. on B[1] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 0 = µ. the limiting equation on Ω1 for w1 (x) can be solved to formally yield: f (x) w1 (x) = .. on B. i. c(x) If B[1] = ∅ and the boundary data g(x) is not compatible with the formal (x) solution fc(x) (x) . if g(x) = fc(x) on B[1] . on Ω1 . on B[2] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ∂w  ∂n2 = µ. yielding the following system: ⎧ ⎪ ⎪ c(x) w1 = f (x).

and is based on the minimization of a square norm objective functional. In both cases. and heterogeneous approximations for (1. 1. the limiting solutions may not minimize the energy functional J. In this section. an optimization principle need not be as- sociated with the underlying partial differential equation. GU3.30) is equivalent to (1. hence the name least squares-control. which has various applications to partial differential equations. The subdo- mains can be overlapping or non-overlapping. in which the domain Ω is decomposed into two subdomains. We illustrate the formulation of iterative methods. the square norm functional typically measures the difference between the subdomain solutions on the re- gions of overlap or intersection between the subdomains. . the subproblems for w1 and w2 are formally decoupled.19). subject to constraints. in ∂Ω. Similar heuristics may be applied to construct an approximation of the singularly perturbed anisotropic elliptic equation using (1. we describe the hybrid formulation associated with the least squares-control method for the following elliptic equation: Lu ≡ −∇ · (a(x) ∇u) + b(x) · ∇u + c(x) u = f (x).33). while the constraints require the local solutions to solve the original partial differential equation on each subdomain.36 1 Decomposition Frameworks In this case. The control boundary data must be determined to minimize the square norm function. see [AT. it is regarded as a control function which parameterizes the local solution. GL] is a general optimization method.32) as η → 0+ .31. GL13. Remark 1. In domain decompo- sition applications.η (·) associated with (1.33) u = 0. in Ω u = g(x). Since (1. but we focus on the overlapping case. both transmission conditions can be retained in the limiting problem. non-matching grid discretizations. with appropriate boundary conditions.30.5 Least Squares-Control Framework The least squares-control method [LI2. GU2].30): − ux1 x1 − ux2 x2 − ux3 x3 + c(x) u = f (x). for which the limiting problem is a degenerate elliptic equation. Since the boundary data on each subdomain boundary is unknown. Remark 1. rigorous results on the well posedness of the above approximation may be deduced from [GA15]. in Ω (1. Importantly. It results in a constrained least squares problem. In this case. on ∂Ω.

v2 ). for the true subdomain solutions. on B[i] ⎪ ⎩ ni · (a∇wi ) = gi . An overlapping decomposition 1. w1 = w2 . If wi (. since w1 and w2 will match on Ω12 .34) 2 Ω12∗ 2 Ω12 ∗ Typically (γ1 = 1. on Ω12 . On each subdomain Ωi∗ for 1 ≤ i ≤ 2. w2 ) = min J(v1 . ∗ ∗ Furthermore.5 Least Squares-Control Framework 37 B[2] Ω1∗ ∗ Ω12 Ω2∗ B[1] Fig. Let B = ∂Ωi ∩ Ω and B[i] = ∂Ωi ∩ ∂Ω denote the interior (i) and exterior segments. and let ni denote the unit exterior normal to ∂Ωi∗ . within a class K: J(w1 . The preceding observation suggests the following constrained minimization problem equivalent to (1. but other choices are possible.) on Ωi∗ and gi (. (1. on B (i) .5.) = ni · (a(x)∇u) on B (i) .1 Motivation Let Ω1∗ and Ω2∗ form an overlapping decomposition of Ω. γ2 = 0). (1. 1. Then.9.. w2 ) which minimizes J (·) (with minimum value zero). ∗ ∗ see Fig. respectively.e. on B[i] .v2 )∈K where K is defined by the constraints: ⎧ ⎫ ⎨ Lvi = f. it will hold that w1 − w2 L2 (Ω ∗ ) = 0 and |w1 − w2 |H 1 (Ω ∗ ) = 0. we let wi denote the approximation of the solution u to (1. with Ω12 ∗ = Ω1∗ ∩ Ω2∗ . (1. w2 ) = 0.36) ⎩ ⎭ vi = 0.33) on Ωi∗ . it will hold: J(w1 . i. then wi will satisfy: ⎧ ⎪ ⎨ Lwi = f. and let gi denote the local Neumann data associated with wi on B (i) . Motivated by this. v2 ) : ni · (a∇wi ) = gi . Determine (w1 . of the subdomain boundaries. in Ωi∗ ⎬ K ≡ (v1 . in Ωi∗ wi = 0. 1.) = u(. 2 2 12 12 define the following square norm functional J (·):   γ1 γ2 2 J (v1 .9.35) (v1 . on B (i) for 1 ≤ i ≤ 2 . 1.33). v2 ) ≡ (v1 − v2 ) dx + 2 |∇(v1 − v2 )| dx.

u2 ) = 0.. coercivity of (1. Thus. Such a result. w2 ) = min J(v1 . Then this minimum value must be zero. Hopefully. we may alternatively pose Robin or Dirichlet conditions.) and that J(w1 . we cannot pose Dirichlet conditions on B (i) . in Ω12 . Let (w1 . on Ω2∗ .32. . Then at the minimum: J(w1 .35). To avoid cumbersome notation. Thus. in the non-overlapping case. using the definition of J(. gi ). Let the solution u of (1. In a strict sense. Suppose the following assumptions hold.33) and (1. yields that J(w1 .) typically measures the difference between the Dirichlet data.33). u2 ) ∈ K and J(u1 .) ≥ 0. (v1 .   Remark 1. suppose a solution to (1. w2 ) = 0. ∗ The desired result follows using w1 = w2 on Ω12 .35) subject to the constraints (1. v2 ). will hold under appropriate assumptions (such as b = 0. w2 ) = 0 and minimizes J(. Suppose u is the solution to (1.33) and wi ≡ u on Ωi∗ for 1 ≤ i ≤ 2.33)) given sufficient overlap between the subdomains.v2 )∈K it will hold that: w1 = u. Then.33) exist and be smooth. 1.35) under perturbation of data. However.) and K.36). w2 ) minimize (1.. . Let χ1 (x) and χ2 (x) form a partition of unity subordinate to the cover Ω1 and Ω2∗ . on Ω1∗ w2 = u.33) it follows that: u(x) ≡ χ1 (x) w1 (x) + χ2 (x) w2 (x). since for ui ≡ u in Ωi∗ for 1 ≤ i ≤ 2 it will hold that (u1 .36) and minimizes J(v1 . Conversely. however. we obtain that ∗ w1 = w2 on Ω12 . since Lwi = f in Ωi∗ and since w1 = w2 in Ω12 ∗ ..33. by the uniqueness of solutions to (1.36). Furthermore: ∗ w1 − w2 = u − u = 0. we must replace vi by (vi . Proof.38 1 Decomposition Frameworks Instead of Neumann conditions on B (i) . since the functional J(. The following equivalence will hold. . The it is easily verified that χ1 (x) w1 (x) + χ2 (x) w2 (x) ∗ solves (1. such omission should be clear from the context. The preceding result only demonstrates an equivalence between the solutions of (1. we often omit explicit inclusion of gi as an argument in the definition of J(. . v2 ). (w1 . It does not guarantee the well posedness of (1. 2. Theorem 1.35) exists. subject to constraints (1. w2 ) will satisfy all the required constraints (1.

g2∗ ) = min H(g1 . where n · (a∇vi ) = gi . on B (i) ⎩ i vi = 0.. g2 ). for x ∈ B (2) . or such equations may be derived by heuristic analogy with the associated discrete saddle point problem. 1.38) (g1 . an augmented Lagrangian formulation [GL7] may be employed to regularize (1. the function space Xi for the boundary data for gi is typically chosen for each 1 ≤ i ≤ 2 as Xi = (H00 (B (i) )) or Xi = H −1/2 (B (i) ). Neumann or Robin data gi specified on each boundary segment B (i) . Thus. Such unconstrained minimization does not require Lagrange multipliers.) as (w1 . the desired local solutions will satisfy wi ≡ Ei gi∗ for 1 ≤ i ≤ 2.35).33) is self adjoint and coercive.36) can be parameterized in terms of the Dirichlet. AT]. 1 ≤ i ≤ 2} .35) as an unconstrained minimization problem. (1. define an affine linear mapping Ei as follows: ⎧ ⎨ L vi = f. . the constraint set K in (1.5 Least Squares-Control Framework 39 Remark 1. We shall omit the derivation of these equations. E2 g2 ) : for gi ∈ Xi . the unconstrained minimum (g1∗ . where the term J(v1 . as described in Chap. on B[i] . For Neumann conditions. when Neumann boundary conditions are imposed on each B (i) .38). ·): H(g1∗ .35) will depend on the definition of J (·).37) Then. w2 ) = (E1 g1∗ . E2 g2 ). when the elliptic equa- tion (1. g2 ) = 0 ⇔ v2 (x) = 0. Then. Well posedness of the constrained minimization problem (1. the constraint set K can be represented as: K ≡ {(E1 g1 . for 1 ≤ i ≤ 2. 6. v2 ) = 12 v1 − v2 2H 1 (Ω ∗ ) can be 12 shown to yield a well posed saddle point problem [GL.g2 ) will yield the constrained minimum of J(. E2 g2∗ ). once g1∗ and g2∗ have been determined by minimizing H(·. except to note that the calculus of variations may be applied to (1. Define a function H(·): H(g1 . (1. For instance. for x ∈ B (1) δH (g1 . This parameterization 1/2 enables the reformulation of this constrained minimization problem (1. The unknown control data g1 and g2 can be determined by solving the system of equations which result from the application of first order stationarity conditions δH = 0 at the minimum of H(·). More generally. For instance. J(v1 . As mentioned earlier. The resulting first order stationarity equations will be of the form: v1 (x) = 0. ·). g2 ) ≡ J(E1 g1 . where g1 and g2 are regarded as control data. in Ωi∗ Ei gi ≡ vi .34. g2∗ ) of H(·. v2 ) is coercive in the constraint space K.

on B. If Ω is decomposed into non-overlapping subdomains Ω1 and Ω2 with common interface B = ∂Ω1 ∩ ∂Ω2 . The above constraints will ensure that the original elliptic equation is solved on . Solve: ⎧ ∗ ⎨ −∇ · (a ∇wi ) + b · ∇wi + c wi = f (x). Later. 2. a preconditioned CG method can be employed to solve the resulting linear system. 2 and K consists of all (v1 . Here µ(x) is a flux variable on the interface B (which can be eliminated). in Ω1 ⎪ ⎪ ⎪ ⎪ v1 = 0. (v1 . on B[1] ⎪ ⎪ ⎪ ⎨ n1 · (a∇v1 ) = µ(x). Next. v2 ) satisfying the following constraints: ⎧ ⎪ ⎪ Lv1 = f (x). v1 (x) and v2 are defined as the solutions to: ⎧ ∗ ⎨ −∇ · (a ∇vi ) − ∇ · (b vi ) + c vi = r(x). on B[2] ⎪ ⎩ n2 · (a∇v2 ) = −µ(x). see Chap. 6. When (1. v2 ). on B (i) for w1 (x) and w2 (x) using g1 (x) and g2 (x). w2 ) = min J(v1 .v2 )∈K where 1 J(v1 . 2 ⎪ ⎩ ni · (a∇wi ) = gi (x). on B ⎪ ⎪ Lv2 = f (x). a least squares-control formulation may be constructed as follows [GU3. Seek (w1 . on B[i] for i = 1.40 1 Decomposition Frameworks where v1 (x) and v2 (x) are defined in terms of g1 (x) and g2 (x) as follows. compute: ∗ w1 (x) − w2 (x).35.35) is discretized. on B (i) The control data g1 (x) and g2 (x) must be chosen to ensure that vi (x) = 0 on B (i) for i = 1. in Ωi ⎪ wi = 0. w2 ) which minimizes: J(w1 . Remark 1. ⎪ ⎩ ni · (a∇vi + b vi ) = 0. an explicit matrix representation can be derived for H(·) and its gradient. for x ∈ Ω12 . in Ωi ⎪ vi = 0. v2 ) ≡ v1 − v2 2L2 (B) . Then. GU2]. In this case. on B[i] for 1 ≤ i ≤ 2. in Ω2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ v2 = 0. we shall outline a gradient method to determine g1 and g2 iteratively. for x ∈ Ω12 r(x) ≡ ∗ 0.

on B (i) . ⎪ in Ωi∗ vi = 0. 6. 3.33) can be determined iteratively. 1.38).5. for x ∈ Ω12 5. on B[i] ⎪ ⎩ (k) ni · (a∇vi ) = gi (x).2 Iterative Methods The solution to (1. 2 in parallel solve: ⎧ ⎨ −∇ · (a ∇vi ) + b · ∇vi + c vi = f (x). 1. and that the Neumann fluxes of the two subdomain solutions match on B. the feasible set K can be parameterized in terms of the flux µ(x) = n1 · (a∇v1 ) on B. H00 (B) 1/2 where H00 (B) denotes a fractional Sobolev norm (defined in Chap.1 (Gradient Least Squares-Control Algorithm) (0) (0) Let g1 (x) and g2 (x) denote starting guesses and τ > 0 a fixed step size. with sufficiently small step size τ > 0. Endfor (k) (k) Output: (g1 . For k = 0. Endfor 7. 8. for x ∈ B (1) (k+1) (k) g2 (x) = g2 (x) + τ w2 (x). In this case. in Ωi ⎪ wi = 0. 1. Algorithm 1. In applications. 2 in parallel solve the adjoint problems: ⎧ ∗ ⎨ −∇ · (a ∇wi ) − ∇ · (b wi ) + c wi = r(x). Update: (k+1) (k) g1 (x) = g1 (x) − τ w1 (x). Such an algorithm can be derived formally using calculus of variations. v2 ) ≡ 12 v1 − v2 2 1/2 may also be employed. For i = 1. Endfor 4. on B (i) . an alternative choice of objective functional J(v1 . Compute: ∗ v1 (x) − v2 (x). for x ∈ Ω12 r(x) ≡ ∗ 0. · · · until convergence do: 2. by formally applying a steepest descent method to the unconstrained minimization problem (1. 1. 6.5. or by analogy with the discrete version of this algorithm described in Chap. For i = 1. g2 ) . 3).5 Least Squares-Control Framework 41 each subdomain. for x ∈ B (2) . on B[i] ⎪ ⎩ ni · (a∇wi + b wi ) = 0.

42 1 Decomposition Frameworks

Alternative divide and conquer iterative algorithms can be formulated
for (1.33) using its saddle point formulation. However, the resulting algorithm
may require more computational resources. For instance, suppose that:
1
J(v1 , v2 ) = v1 − v2 2L2 (Ω12
∗ ),
2
and that Neumann boundary conditions are imposed on B (i) . Then, as de-
scribed in Chap. 10, a constrained minimization problem such as (1.35)
with (1.36), can be equivalently formulated as a saddle point problem, and
saddle point iterative algorithms can be formulated to solve it.
Indeed, if λ1 and λ2 denote the Lagrange multipliers, then the saddle point
problem associated with (1.35) would formally be of the form:


⎪ χΩ12 (w1 − w2 ) + L∗1 λ1 = 0,


⎨ −χΩ12 (w1 − w2 ) + L∗2 λ2 = 0,
(1.39)

⎪ L1 w ˜1 = f1 ,


⎩ L2 w ˜2 = f2 .

Here Li w˜i = fi formally denotes the operator equation associated with
L wi = f in Ωi∗ with Neumann conditions ni · (a ∇wi ) − gi = 0 on B (i) and ho-
mogeneous Dirichlet boundary conditions wi = 0 on B[i] , with w ˜i = (wi , gi ).
The operator L∗i formally denotes the adjoint of Li . Here, χΩ12 ∗ (x) denotes

the characteristic (indicator) function of Ω12 . We omit elaborating on such a
saddle point problem here, except to note that, it may be obtained by heuris-
tic analogy with the discrete saddle point problems described in Chap. 10.
The λi (x) corresponds to Lagrange multiplier functions, see [GL, AT]. In this
saddle point problem, the Lagrange multiplier variables will not be unique,
and an augmented Lagrangian formulation would be preferable.

1.5.3 Global Discretization
Hybrid formulation (1.35) or (1.38) can, in principle, be employed to dis-
cretize (1.33) on a nonmatching grid such as in Fig. 1.10. Such discretizations
have not been considered in the literature, however, a heuristic discussion of
such a discretization is outlined here for its intrinsic interest, employing for-
mulation (1.38). We employ finite element discretizations on the subdomains.
A nonmatching grid discretization of (1.38) will require discretizing J(·):
1
J(v1 , v2 ) = v1 − v2 2H 1 (Ω12
∗ ),
2
and this will involve two overlapping non-matching grids. In the following, we

heuristically outline a mortar element discretization of J(v1 , v2 ) on Ω12 , and
employ this to construct a global non-matching grid discretization of (1.33),
with Dirichlet boundary controls on each subdomain boundary B (i) . Each
subdomain problem will involve only a conforming grid.

1.5 Least Squares-Control Framework 43

Th2 (Ω2∗ )
r r r r r r r r r r
r r r r r r r r r r
r r r r r r r r r r
r r r r r r r r r r
b b b b b rb r br rb r rbr br r r
r r r r r r r r r r
b b b b b rb r br rb r rbr br r r
r r r r r r r r r r
b b b b b rb r br rb r rbr br r r
b b b b b rb r br rb r rbr br r r
r r r r r r r r r r
b b b b b rb r br rb r rbr br r r
b b b b b b b b b b
b b b b b b b b b b
Th1 (Ω1∗ )
Fig. 1.10. Overlapping nonmatching grids

Remark 1.36. If J(v1 , v2 ) is replaced by JB (v1 , v2 ) ≡ 12 v1 − v2 2B where B =
∂Ω1 ∩ ∂Ω2 and Ωi∗ is an extension of a non-overlapping decomposition Ωi ,
such a discretization would be considerably simpler.

Local Triangulation. For 1 ≤ i ≤ 2 triangulate each subdomain Ωi∗ by a
grid Thi (Ωi∗ ) according to the local geometry and regularity of the solution,
see Fig. 1.10. We shall assume that at least one of the local grids triangulates

the region of overlap Ω12 . For definiteness assume that triangulation Th1 (Ω1∗ )
triangulates Ω12 . Let ni and mi denote the number of nodes of grid Thi (Ωi∗ )

in the interior of Ωi∗ and on B (i) , respectively. Additionally, let li denote the

number of nodes of triangulation Thi (Ωi∗ ) in Ω 12 .
Local Discretizations. For 1 ≤ i ≤ 2, employ Dirichlet boundary conditions
on B (i) in (1.36) and discretize the resulting local problems using a finite
element space Xhi ⊂ Xi based on triangulation Thi (Ωi∗ ):
 
Xi ≡ vi ∈ H 1 (Ωi∗ ) : vi = 0 on B[i] .
(i) (i)
Block partition the unknowns whi = (wI , wB )T according to the interior
unknowns and the unknowns on the boundary B (i) respectively. Denote the
block partitioned linear system for the discretized Dirichlet problem as:

(i) (i) (i) (i) (i)
AII wI + AIB wB = f I ,
(i) (i)
wB = gB .

Weak Matching on Ω12 . Choose a finite element space:
∗ ∗
Yh (Ω12 ) ⊂ L2 (Ω12 )

44 1 Decomposition Frameworks


based on the triangulation of Ω12 inherited from Th1 (Ω1∗ ), of dimension l1 .

Define the weak matching condition on Ω12 as:


(wh1 − wh2 ) µh1 dx = 0, for µh1 ∈ Yh1 (Ω12 ),

Ω12


enforced using the subspace Yh1 (Ω12 ). Denote its matrix form as:
M11 wh1 − M12 wh2 = 0,
−1
where M11 is invertible of size l1 . Define an oblique projection P1 ≡ M11 M12 .
Discrete Functional J(·, ·). Let A(12) be the stiffness matrix associated

with J(·) on the triangulation Th1 (Ω12 ). The quadratic functional J(·) can be
(12)
discretized using A and the projection P1 as follows:

⎪ J (vh1 , vh2 ) ≡ 12 vh1 − vh2 2H 1 (Ω ∗ )
⎨ 12
T
⎪ ≈ 2 (vh1 − P1 vh2 ) R12
1 T
A(12) R12 (vh1 − P1 vh2 )

≡ Jh (vh1 , vh2 ) .

Here R12 is a restriction map onto the nodes of Ω 12 from Ω1∗ , see Chap. 6.
The reduced functional Hh (·) can be discretized using:
Hh (gh1 , gh2 ) ≡ Jh (vh1 , vh2 ) ,
where  
(i)−1 (i) (i) (i)
AII (f I − AIB gB )
v hi = (i)
for 1 ≤ i ≤ 2.
gB

Stationarity Condition. The first order derivative conditions for the mini-
(1) (2)
mum of Hh (·) will yield the following equations for (gB , gB ):
   (1)   (1) 
E1T R12
T
A(12) R12 E1 − E1T R12 T
A(12) R12 P1 E2 gB γB
=
− E2 P1 R12 A R12 E1
T T T (12) T T T (12)
E2 P1 R12 A R12 P1 E2 gB
(2) (2)
γB
(1.40)
where
⎧ (1)

(1) (2)

⎪ γ B ≡ E1T R12 T
A(12) R12 −µI + P1 µI ,




⎪ γ
(2)
≡ E T T T
P R A (12)
R −µ
(1)
+ P µ
(2)
,

⎪ B 2 1 12 12 I 1 I

⎪  

⎪ −1
⎨ (i) (i)
Ei ≡ −AII AIB ,
⎪ I

⎪  

⎪ −1

⎪ (i) A
(i)
f
(i)

⎪ µI ≡ II I ,

⎪ 0


⎩ w(i) = A(i)−1 f (i) − A(i) g(i) ,
I II I IB B for i = 1, 2.

1.5 Least Squares-Control Framework 45

Thus, a non-matching grid discretization of (1.33) based on the subdomains
(1) (2)
involves solving system (1.40) for the control boundary data gB and gB .
(i)
Subsequently, the subdomain solution wI can be determined as:

(i) (i)−1 (i) (i) (i)
wI = AII f I − AIB gI , for 1 ≤ i ≤ 2.

Remark 1.37. General results on the stability and convergence properties of
such discretizations are not known. However, when both local grids match on

Ω12 , projection P1 = I and the global discretization will be equivalent to a
traditional discretization of (1.33) on the global triangulation.

1.5.4 Heterogeneous Approximations

The least square-control formulation (1.35) provides a flexible framework for
constructing heterogeneous approximations of general systems of partial dif-
ferential equations of heterogeneous character [AT, GL13]. We illustrate here
how an elliptic-hyperbolic approximation can be constructed for the following
singularly perturbed elliptic equation:
 
L u ≡ − ∆u + b(x) · ∇u + c(x) u = f, on Ω
(1.41)
u = 0, on ∂Ω,

where 0 <   1 is a perturbation parameter. Suppose Ω1∗ and Ω2∗ form an
overlapping covering of Ω such that:

| ∆u|  |b(x) · ∇u + c(x) u| , in Ω1∗ .

We may then heuristically approximate L u = f in Ω1∗ by L0 u = f where
L0 u ≡ b(x) · ∇u + c(x) u. To construct an elliptic-hyperbolic approximation
of (1.41), replace the elliptic problem L v1 = f on Ω1∗ by the hyperbolic prob-
lem L0 v1 = f within the least squares-control formulation (1.35) of (1.41).
The resulting heterogeneous problem will seek (w1 , w2 ) which minimizes:
ˆ 1 , w2 ) =
J(w min ˆ 1 , v2 ),
J(v
(v1 ,v2 )∈K
ˆ

where
1
Jˆ (v1 , v2 ) ≡ v1 − v2 2L2 (Ω12
∗ ),
2
ˆ consists of (v1 , v2 ) which satisfy the constraints:
and K
⎧ ∗
⎧  ∗

⎨ L0 v1 = f, on Ω1 ⎪
⎨ L v2 = f, on Ω2
(1)
v1 = g1 , on Bin and v2 = g2 , on B (2) (1.42)

⎩ ⎪

v1 = 0, on B[1],in v2 = 0, on B[2] .

46 1 Decomposition Frameworks

Here the inflow boundary segments of B (1) and B[1] are defined by:
 
(1)
Bin ≡ x ∈ B (1) : n1 (x) · b(x) < 0
 
B[1],in ≡ x ∈ B[1] : n1 (x) · b(x) < 0 ,

where n1 (x) is the unit outward normal to B1 at x.

Remark 1.38. The admissible set Kˆ may be parameterized in terms of the local
boundary data. An equivalent unconstrained minimization problem may then
be obtained analogous to (1.37) and (1.38). See also Chap. 12.

Remark 1.39. The solution (w1 , w2 ) to the above heterogeneous model may
∗ ˆ within the class Kˆ may no
not match on Ω12 and the minimum value of J(·)
longer be zero. A continuous global solution, however, may be obtained by
employing a partition of unity χ1 (x) and χ2 (x) subordinate to the cover Ω1∗
and Ω2∗ and by defining:

w(x) ≡ χ1 (x) w1 (x) + χ2 (x) w2 (x).

Remark 1.40. Rigorous results are not known on the well posedness of the
above heterogeneous model. The above procedure has been generalized and
employed to construct heterogeneous approximations to the Boltzmann,
Navier-Stokes and Euler equations [AT, GL13].

2
Schwarz Iterative Algorithms

In this chapter, we describe the family of Schwarz iterative algorithms. It
consists of the classical Schwarz alternating method [SC5] and several of its
parallel extensions, such as the additive, hybrid and restricted Schwarz meth-
ods. Schwarz methods are based on an overlapping decomposition of the do-
main, and we describe its formulation to iteratively solve a discretization of a
self adjoint and coercive elliptic equation. In contrast with iterative algorithms
formulated on non-overlapping subdomains, as in Chap. 3, the computational
cost per Schwarz iteration can exceed analogous costs per iteration on non-
overlapping subdomains, by a factor proportional to the overlap between the
subdomains. However, Schwarz algorithms are relatively simpler to formulate
and to implement, and when there is sufficient overlap between the subdo-
mains, these algorithms can be rapidly convergent for a few subdomains, or
as the size of the subdomains decreases, provided a coarse space residual cor-
rection term is employed [DR11, KU6, XU3, MA15, CA19, CA17].
Our focus in this chapter will be on describing the matrix version of
Schwarz algorithms for iteratively solving the linear system Au = f obtained
by the discretization of an elliptic equation. The matrix versions correspond
to generalizations of traditional block Gauss-Seidel and block Jacobi iterative
methods. Chap. 2.1 presents background and matrix notation, restriction and
extension matrices. Chap. 2.2 describes the continuous version of the classi-
cal Schwarz alternating method [MO2, BA2, LI6] and derives its projection
version, which involves projection operators onto subspaces associated with
the subdomains. The projection version of the Schwarz alternating method
suggests various parallel generalizations such as the additive Schwarz, hybrid
Schwarz and restricted Schwarz methods. Chap. 2.3 describes the matrix ver-
sion of Schwarz algorithms, which we refer to as Schwarz subspace algorithms
[XU3]. Chap. 2.4 discusses implementational issues for applications to finite
element or finite difference discretizations of elliptic equations. Specific choices
of coarse spaces are also described. Chap. 2.5 describes theoretical results on
the convergence of Schwarz algorithms in an energy norm.

48 2 Schwarz Iterative Algorithms

2.1 Background
In this section, we introduce notation on the elliptic equation and its weak
formulation and discretization, subdomain decompositions and block matrix
partitioning of the resulting linear system, restriction and extension maps.

2.1.1 Elliptic Equation
We consider the following self adjoint and coercive elliptic equation:

⎨ Lu ≡ −∇ · (a(x)∇u) + c(x) u = f, in Ω

u = gD , on BD (2.1)


n · (a∇u) + γ u = gN , on BN ,
on a domain Ω ⊂ IRd for d = 2, 3, with unit exterior normal n(x) at x ∈ ∂Ω,
Dirichlet boundary BD ⊂ ∂Ω, and natural (Neumann or Robin) boundary
BN ⊂ ∂Ω where B D ∪ BN = ∂Ω and BD ∩ BN = ∅. We shall assume that the
diffusion coefficient a(x) is piecewise smooth and for 0 < a0 ≤ a1 satisfies:
a0 |ξ|2 ≤ ξ T a(x) ξ, ≤ a1 |ξ|2 , ∀x ∈ Ω, ξ ∈ IRd .
To ensure the coercivity of (2.1), we shall assume that c(x) ≥ 0 and γ(x) ≥ 0.
In most applications, we shall assume BD = ∂Ω and BN = ∅.
Remark 2.1. When BD = ∅, γ(x) ≡ 0 and c(x) ≡ 0, functions f (x) and gN (x)
will be required to satisfy compatibility conditions for solvability of (2.1):

    ∫Ω f(x) dx + ∫∂Ω gN(x) dsx = 0.

In this case, the general solution u(·) to the Neumann boundary value problem
will not be unique, and will satisfy u(x) ≡ u∗ (x) + α where u∗ (x) is any
particular non-homogeneous solution and α is a constant.

2.1.2 Weak Formulation
The weak formulation of (2.1) is obtained by multiplying it by a test function v(·) with zero boundary value on BD, and integrating the resulting expression by parts over Ω. The weak problem will seek u ∈ H^1(Ω) which satisfies u(·) = gD(·) on BD such that:

    A(u, v) = F(v),   ∀v ∈ H^1_D(Ω),        (2.2)

where A(·, ·), F(·) and H^1_D(Ω) are defined by:

    ⎧ A(u, v) ≡ ∫Ω (∇u · a∇v + c u v) dx + ∫BN γ u v dsx,
    ⎨ F(v) ≡ ∫Ω f v dx + ∫BN gN v dsx,                        (2.3)
    ⎩ H^1_D(Ω) ≡ { v ∈ H^1(Ω) : v = 0 on BD }.

Here H^1_D(Ω) denotes the subspace of H^1(Ω) whose elements satisfy zero Dirichlet boundary conditions on BD.


2.1.3 Finite Element Discretization

Let Th(Ω) denote a quasiuniform triangulation of Ω ⊂ IR^d with elements of size h. For simplicity, we assume that the elements are simplices (triangles when d = 2 or tetrahedra when d = 3) and that Vh ⊂ H^1(Ω) is the space of continuous piecewise linear finite element functions on Th(Ω). Homogeneous essential boundary conditions can be imposed in Vh by choosing Vh ∩ H^1_D(Ω). The finite element discretization of (2.1), see [ST14, CI2, JO2, BR28, BR], will seek uh ∈ Vh with uh = Ih gD on BD and satisfying:

    A(uh, vh) = F(vh),   ∀vh ∈ Vh ∩ H^1_D(Ω).        (2.4)

Here Ih denotes the nodal interpolation onto Vh, restricted to BD. This will yield a linear system Ah uh = fh. We shall often omit the subscript h.
Let nI, nBN and nBD denote the number of nodes of the triangulation Th(Ω) in the interior of Ω, on the boundary segment BN, and on BD, respectively. Denote by xi for 1 ≤ i ≤ (nI + nBN + nBD) all the nodes of Th(Ω). We assume that these nodes are so ordered that:

    ⎧ xi ∈ Ω,    for 1 ≤ i ≤ nI
    ⎨ xi ∈ BN,   for (nI + 1) ≤ i ≤ (nI + nBN)
    ⎩ xi ∈ BD,   for (nI + nBN + 1) ≤ i ≤ (nI + nBN + nBD).

Corresponding to each node 1 ≤ i ≤ (nI + nBN + nBD), let φi(x) denote the continuous piecewise linear finite element nodal basis function in Vh, satisfying:

    φi(xj) = δij,   for 1 ≤ i, j ≤ (nI + nBN + nBD),

where δij denotes the Kronecker delta. Given uh(x) ∈ Vh, we expand it as:

    uh(x) = Σ_{i=1}^{nI} (uI)i φi(x) + Σ_{i=1}^{nBN} (uBN)i φ_{nI+i}(x) + Σ_{i=1}^{nBD} (uBD)i φ_{nI+nBN+i}(x),

where uI, uBN and uBD denote subvectors defined by:

    ⎧ (uI)i ≡ uh(xi),                1 ≤ i ≤ nI,
    ⎨ (uBN)i ≡ uh(x_{nI+i}),         1 ≤ i ≤ nBN,
    ⎩ (uBD)i ≡ uh(x_{nI+nBN+i}),     1 ≤ i ≤ nBD.

This block partitions the vector of nodal values associated with uh as:

    uh = (uI^T, uBN^T, uBD^T)^T,

corresponding to the ordering of nodes in Ω, BN and BD, respectively.


Employing the above block partition, the finite element discretization (2.4) of (2.1) is easily seen to have the following block structure:

    ⎧ AII uI + AIBN uBN + AIBD uBD = fI
    ⎨ AIBN^T uI + ABNBN uBN + ABNBD uBD = fBN
    ⎩ uBD = Ih gD,

where the block submatrices and vectors above are defined by:

    (AII)ij   = A(φi, φj),                   1 ≤ i, j ≤ nI
    (AIBN)ij  = A(φi, φ_{nI+j}),             1 ≤ i ≤ nI, 1 ≤ j ≤ nBN
    (AIBD)ij  = A(φi, φ_{nI+nBN+j}),         1 ≤ i ≤ nI, 1 ≤ j ≤ nBD
    (ABNBN)ij = A(φ_{nI+i}, φ_{nI+j}),       1 ≤ i, j ≤ nBN
    (ABNBD)ij = A(φ_{nI+i}, φ_{nI+nBN+j}),   1 ≤ i ≤ nBN, 1 ≤ j ≤ nBD
    (fI)i     = F(φi),                       1 ≤ i ≤ nI
    (fBN)i    = F(φ_{nI+i}),                 1 ≤ i ≤ nBN
    (Ih gD)i  = gD(x_{nI+nBN+i}),            1 ≤ i ≤ nBD.

Eliminating uBD in the above linear system yields:

    AII uI + AIBN uBN = fI − AIBD Ih gD
    AIBN^T uI + ABNBN uBN = fBN − ABNBD Ih gD.        (2.5)

In matrix notation, this yields the block partitioned linear system:

    [ AII      AIBN  ] [ uI  ]   [ f̃I  ]
    [ AIBN^T   ABNBN ] [ uBN ] = [ f̃BN ],

where

    f̃I  ≡ fI − AIBD Ih gD,
    f̃BN ≡ fBN − ABNBD Ih gD.
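The elimination of the Dirichlet unknowns above is simple to carry out in code. The following is a minimal numpy sketch for a 1D analogue of (2.1) with a ≡ 1, c ≡ 0 and BD = ∂Ω, assembling the stiffness matrix on a uniform grid and forming the reduced right hand side as in (2.5); the grid size, load and boundary values are illustrative assumptions, not data from the text.

    import numpy as np

    # 1D model problem: -u'' = f on (0,1), u(0) = g0, u(1) = g1,
    # discretized by piecewise linear finite elements on a uniform grid.
    m = 10
    h = 1.0 / m
    A = (1.0 / h) * (2.0 * np.eye(m + 1)
                     - np.eye(m + 1, k=1) - np.eye(m + 1, k=-1))
    A[0, 0] = A[m, m] = 1.0 / h        # boundary nodes touch only one element
    f = h * np.ones(m + 1)             # (lumped) load vector for f(x) = 1

    I = np.arange(1, m)                # interior (unknown) node indices
    B = np.array([0, m])               # Dirichlet boundary node indices
    gB = np.array([0.0, 0.5])          # boundary values I_h g_D (illustrative)

    # Reduced system (2.5): A_II u_I = f_I - A_IB (I_h g_D).
    uI = np.linalg.solve(A[np.ix_(I, I)], f[I] - A[np.ix_(I, B)] @ gB)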

Remark 2.2. If BN = ∅, then problem (2.1) will be a Dirichlet problem with
∂Ω = BD . In this case, the discretization reduces to:

Ah uh = f h , (2.6)

with Ah ≡ AII and f h ≡ f I − AIB Ih gB , where we have denoted B ≡ BD .
Remark 2.3. If BD = ∅, then (2.1) will be a Robin problem if γ(x) ≠ 0, or a Neumann problem if γ(x) ≡ 0. In this case ∂Ω = BN and we shall use the notation B ≡ BN. The discretization of (2.1) will then have the form:

    Ah uh = fh,  with  Ah ≡ [ AII     AIB ],  uh ≡ [ uI ],  fh ≡ [ f̃I ].        (2.7)
                            [ AIB^T   ABB ]        [ uB ]        [ f̃B ]


If γ(x) ≡ 0 and c(x) ≡ 0, then matrix Ah will be singular, satisfying Ah 1 = 0, where 1 and 0 denote vectors of appropriate size having all entries identically 1 or 0, respectively. In this case, the forcing fh in (2.7) will be required to satisfy the compatibility condition 1^T fh = 0 for the linear system to be solvable. The solution space will then have the form uh = uh∗ + α 1 for α ∈ IR, where uh∗ is any particular solution.

2.1.4 Multisubdomain Decompositions

We employ the following notation for multidomain decompositions, see Fig. 2.1.

Definition 2.4. A collection of open subregions Ωi ⊂ Ω for 1 ≤ i ≤ p will be referred to as a nonoverlapping decomposition of Ω if the following hold:

    ∪_{i=1}^p Ω̄i = Ω̄,    Ωi ∩ Ωj = ∅  if i ≠ j.

Boundaries of the subdomains will be denoted Bi ≡ ∂Ωi, and their interior and exterior segments by B(i) ≡ ∂Ωi ∩ Ω and B[i] ≡ ∂Ωi ∩ ∂Ω, respectively. We denote common interfaces by Bij ≡ Bi ∩ Bj and B ≡ ∪i B(i).

When the subdomains Ωi are shape regular, we let h0 denote their diameter. For additional notation on non-overlapping subdomains, see Chap. 3.

Definition 2.5. A collection of open subregions Ωi∗ ⊂ Ω for 1 ≤ i ≤ p will be referred to as an overlapping decomposition of Ω if the following holds:

    ∪_{i=1}^p Ωi∗ = Ω.

If {Ωl}_{l=1}^p forms a non-overlapping decomposition of Ω with subdomains of diameter h0 and each Ωi ⊂ Ωi∗, then {Ωl∗}_{l=1}^p will be said to form an overlapping decomposition of Ω obtained by extension of {Ωl}_{l=1}^p. Most commonly:

    Ωi∗ ≡ Ωi^{β h0} ≡ {x ∈ Ω : dist(x, Ωi) < β h0},        (2.8)

where 0 < β < 1 is called the overlap factor. Boundaries will be denoted ∂Ωi∗ and, with abuse of notation, B(i) ≡ ∂Ωi∗ ∩ Ω and B[i] ≡ ∂Ωi∗ ∩ ∂Ω, respectively.

Fig. 2.1. Multidomain overlapping and non-overlapping decompositions (left: a 4 × 4 array of non-overlapping subdomains Ω1, . . . , Ω16; right: selected extended subdomains Ω1∗ and Ω11∗)


2.1.5 Restriction and Extension Maps
Restriction and extension maps are rectangular matrices used for representing
domain decomposition preconditioners. A restriction map will restrict a vector
of nodal values to a subvector corresponding to indices in some index set S.
An extension map will extend a subvector of nodal values in S to a full vector,
whose entries will be zero outside S. Formally, given any subregion S ⊂ (Ω ∪
BN ), order the nodes of Th (Ω) in S in some local ordering. Let n ≡ (nI + nBN )
denote the total number of finite element unknowns, and nS the number of
nodes of Th (Ω) in S. We shall associate an index function index(S, i) to denote
the global index of the i'th local node in S for 1 ≤ i ≤ nS. We then define an nS × n restriction matrix RS which will map a vector in IR^n of nodal values on the grid Th(Ω) into a subvector in IR^{nS} of nodal values associated with the nodes in S in the local ordering:

    (RS)ij = { 1  if index(S, i) = j
             { 0  if index(S, i) ≠ j.        (2.9)

The transpose RS^T of the restriction matrix RS is referred to as an extension matrix. It will be an n × nS matrix which extends a vector in IR^{nS} to a vector in IR^n with zero entries corresponding to indices not in S.
Remark 2.6. Given a vector v ∈ IR^n of nodal values on Th(Ω), the vector RS v ∈ IR^{nS} will denote its subvector corresponding to indices of nodes in S (using the local ordering of nodes in S). Given a nodal vector vS ∈ IR^{nS} of nodal values in S, the vector RS^T vS ∈ IR^n will denote a nodal vector on Th(Ω) which extends vS to have zero nodal values at all nodes not in S. To implement such maps, their action on vectors should be computed algorithmically, employing suitable data structures and scatter-gather operations.
Remark 2.7. Given the global stiffness matrix Ah of size n, its submatrix ASS of size nS corresponding to the nodes in S may be expressed formally as ASS = RS Ah RS^T. In implementations, the action of ASS on vectors should be computed algorithmically, employing scatter-gather operations and sparse data structures.
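As an illustration of the preceding two remarks, the following is a minimal numpy/scipy sketch (the matrix and index set are illustrative assumptions) in which RS and RS^T are never formed explicitly; their action is implemented by indexing (gather) and indexed assignment (scatter):

    import numpy as np
    import scipy.sparse as sp

    n = 8
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
    idx = np.array([2, 3, 4])           # index(S, .): global indices of nodes in S

    v = np.arange(n, dtype=float)
    vS = v[idx]                         # R_S v   : gather
    w = np.zeros(n)
    w[idx] = vS                         # R_S^T vS: scatter (zero outside S)

    A_SS = A[np.ix_(idx, idx)]          # A_SS = R_S A_h R_S^T as a sparse submatrix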
Remark 2.8. Typical choices of S in Schwarz algorithms will be indices of nodes in Ωi∗ ∪ (BN ∩ ∂Ωi∗). (In Schur complement algorithms, see Chap. 3, the set S will correspond to indices of nodes on segments, called globs, of the subdomain boundaries B(i). The notation RS and RS^T will be used.)

2.1.6 Partition of Unity
Given an overlapping decomposition Ω1∗ , . . . , Ωp∗ of Ω, we shall often employ a
smooth partition of unity χ1 (x), . . . , χp (x) subordinate to these subdomains.
The partition of unity functions must satisfy the following requirements:

    ⎧ χi(x) ≥ 0,                      in Ω̄i∗
    ⎨ χi(x) = 0,                      in Ω \ Ω̄i∗        (2.10)
    ⎩ χ1(x) + · · · + χp(x) = 1,      in Ω.

As in Chap. 1.1, a continuous partition of unity may be constructed based on the distance functions di(x) ≡ dist(x, ∂Ωi∗ ∩ Ω) ≥ 0 as follows:

    χi(x) ≡ di(x) / (d1(x) + · · · + dp(x)),   for 1 ≤ i ≤ p.

Smoother χi(x) may be obtained by using mollified di(x), see [ST9].
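A discrete analogue of this construction is straightforward: evaluate the distance functions at the nodes and normalize. The sketch below, with illustrative 1D overlapping intervals as an assumed decomposition, builds nodal values of χ1, . . . , χp:

    import numpy as np

    x = np.linspace(0.0, 1.0, 101)                      # nodes of a 1D grid on Ω = (0, 1)
    subdomains = [(0.0, 0.4), (0.3, 0.7), (0.6, 1.0)]   # overlapping intervals Ω_i*
    p = len(subdomains)

    # d_i(x) = dist(x, ∂Ω_i* ∩ Ω) inside Ω_i*, and 0 outside; the endpoints of
    # Ω itself are not internal boundaries, so they are ignored via np.inf.
    d = np.zeros((p, x.size))
    for i, (a, b) in enumerate(subdomains):
        inside = (x >= a) & (x <= b)
        left = x[inside] - a if a > 0.0 else np.inf
        right = b - x[inside] if b < 1.0 else np.inf
        d[i, inside] = np.minimum(left, right)

    chi = d / d.sum(axis=0)             # χ_i = d_i / (d_1 + ... + d_p)
    assert np.allclose(chi.sum(axis=0), 1.0)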

2.1.7 Coarse Spaces
The convergence rate of one-level domain decomposition algorithms (namely, algorithms involving only subdomain problems) will typically deteriorate as
the number p of subdomains increases. This may be understood heuristically
as follows. Consider a rectangular domain Ω divided into p vertical strips. Each
iteration, say of a Schwarz alternating method, will only transfer information
between adjacent subdomains. Thus, if the forcing term is nonzero only in
the first strip and the starting iterate is zero, then it will take p iterations for
the local solution to be nonzero in the p’th subdomain. For elliptic equations
(which have a global domain of dependence on the solution, due to the Green’s
function representation), the solution will typically be nonzero globally even
when the forcing term is nonzero only in a small subregion. Thus, an algo-
rithm such as the classical Schwarz alternating method (and other one-level
methods) will impose limits on the speed at which information is transferred
globally across the entire domain.
The preceding limitation in the rate of convergence of one-level domain de-
composition iterative algorithms can be handled if a mechanism is included for
the global transfer of information across the subdomains. Motivated by multi-
grid methodology [BR22, HA2, MC2] and its generalizations [DR11, XU3],
such a global transfer of information can be incorporated by solving a subprob-
lem on an appropriately chosen subspace of the finite element space, whose
support covers the entire domain. Such subspaces are referred to as coarse
spaces, provided they satisfy specified assumptions. A simple example would
be the space of coarse grid finite element functions defined on a coarse trian-
gulation Th0 (Ω) of Ω, as in two-level multigrid methods. In the following, we
list the approximation property desired in such coarse spaces, where 0 < h0
represents a small parameter (typically denoting the subdomain size).
Definition 2.9. A subspace V0 ⊂ Vh ∩ H^1_D(Ω) will be referred to as a coarse space having approximation of order O(h0) if the following hold:

    ‖Q0 uh‖_{H^1(Ω)} ≤ C ‖uh‖_{H^1(Ω)},             ∀uh ∈ Vh ∩ H^1_D(Ω),
    ‖uh − Q0 uh‖_{L^2(Ω)} ≤ C h0 ‖uh‖_{H^1(Ω)},     ∀uh ∈ Vh ∩ H^1_D(Ω),

where Q0 denotes the L^2-orthogonal projection onto the subspace V0.


Using a coarse space V0 ⊂ Vh , information may be transferred globally
across many subdomains, by solving a finite dimensional global problem, using
residual correction as follows. Suppose wh denotes an approximate solution of
discrete problem (2.4) in Vh ∩ H^1_D(Ω). An improved approximation wh + w0 of uh may be sought by selecting w0 ∈ V0 so that it satisfies the following residual equation:

A(w0 , v) = F (v) − A(wh , v), ∀v ∈ V0 . (2.11)

It is easily verified that w0 is the A(., .)-orthogonal projection of uh − wh onto
the subspace V0 . Once w0 is determined, wh + w0 will provide an improved
approximation of the desired solution uh .
The preceding coarse space residual problem (2.11) can be represented in matrix terms as follows. Let n0 denote the dimension of V0 ⊂ Vh ∩ H^1_D(Ω) and let ψ1^(0)(·), · · · , ψ_{n0}^(0)(·) denote a basis for V0. If n = (nI + nBN) is the dimension of Vh ∩ H^1_D(Ω), let x1, · · · , xn denote the nodes in (Ω ∪ BN). Define an n × n0 matrix R0^T whose entries are defined as follows:

    R0^T = [ ψ1^(0)(x1)   · · ·   ψ_{n0}^(0)(x1) ]
           [     ...                  ...        ]
           [ ψ1^(0)(xn)   · · ·   ψ_{n0}^(0)(xn) ].

Let w0 = R0^T α and v = R0^T β denote nodal vectors representing w0 and v above, for suitable coefficient vectors α, β ∈ IR^{n0}. Then (2.11) becomes:

    β^T (R0 Ah R0^T) α = β^T R0 (fh − Ah wh),   ∀β ∈ IR^{n0}.

This yields the linear system A0 α = R0 (fh − Ah wh), where A0 = (R0 Ah R0^T). The vector update to the approximate solution wh will then be wh + R0^T α, which may also be expressed as wh + R0^T A0^{-1} R0 (fh − Ah wh). Four specific coarse spaces V0 are described in the following. Additional spaces are described in [BR15, SM2, CO8, SA7, WI6, MA17].
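In matrix form, the correction just derived is one restriction, one coarse solve and one extension. A minimal numpy sketch (the matrix A, the coarse basis R0 and the vectors are illustrative assumptions) reads:

    import numpy as np

    def coarse_correction(A, R0, f, w):
        # One coarse residual correction: w + R0^T A0^{-1} R0 (f - A w).
        A0 = R0 @ A @ R0.T                          # coarse matrix (n0 x n0)
        alpha = np.linalg.solve(A0, R0 @ (f - A @ w))
        return w + R0.T @ alpha

    # Example with a crude two-vector "aggregation" coarse space on 8 unknowns:
    n = 8
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    R0 = np.zeros((2, n)); R0[0, :4] = 1.0; R0[1, 4:] = 1.0
    w = coarse_correction(A, R0, f=np.ones(n), w=np.zeros(n))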
Coarse Triangulation Space. If domain Ω can be triangulated by a quasi-
uniform triangulation Th0 (Ω) with elements of size h0 > h, such that Th (Ω)
is obtained by successive refinement of Th0 (Ω), then a coarse space V0 can be
defined as the space of continuous, piecewise linear finite element functions on
triangulation Th0(Ω). To enforce homogeneous essential boundary conditions so that V0 ⊂ Vh ∩ H^1_D(Ω), the Dirichlet boundary segment BD must be the union of boundary segments of elements of Th0(Ω). Such coarse spaces are motivated
by multigrid methodology.
Interpolation of a Coarse Triangulation Space. If the geometry of Ω is complex or the triangulation Th(Ω) is unstructured, then it may be computationally difficult, if not impossible, to construct a coarse triangulation Th0(Ω) of Ω from which to obtain Th(Ω) by successive refinement. In such cases, an alternative coarse space [CA4, CH17] can be constructed as follows, when BN = ∅. Let Ω∗ ⊃ Ω denote an extension of Ω having simpler geometry (such as a polygon). Let Th0(Ω∗) denote a coarse triangulation of Ω∗ having elements of size h0 > h. The elements of Th0(Ω∗) will in general not be the union of elements in Th(Ω). Despite this, a coarse subspace of Vh can be defined as follows. Let Vh0(Ω∗) ⊂ H^1_0(Ω∗) denote a finite element space on the triangulation Th0(Ω∗) of Ω∗ with zero boundary values. Define V0 as:

    V0 ≡ {πh w∗_{h0} : w∗_{h0} ∈ Vh0(Ω∗)},

where πh denotes the standard nodal interpolation onto all grid points of Th(Ω) excluding nodes on BD. By construction V0 ⊂ Vh ∩ H^1_D(Ω).
Interpolation of a Polynomial Space. If, as in the preceding case, the geometry of Ω is complex or the triangulation Th(Ω) is unstructured, and BD = ∅, then a coarse space may be defined as follows. Let Pd(Ω) denote the space of all polynomials of degree d or less on Ω. Generally Pd(Ω) ⊄ Vh. However, we may interpolate such polynomials onto the finite element space Vh ∩ H^1_D(Ω) as follows:

    V0 ≡ {πh wd(x) : wd(x) ∈ Pd(Ω)},

where πh denotes the standard nodal interpolant onto the finite element space Vh ∩ H^1_D(Ω). By construction V0 ⊂ Vh ∩ H^1_D(Ω).
Piecewise Constant Space. A more general coarse space, referred to as the piecewise constant coarse space [CO8, SA7, MA17, WA6], can be constructed given any nonoverlapping decomposition Ω1, . . . , Ωp of Ω as follows. Let h0 denote the size of the subdomains and define Ωi∗ as the extension of Ωi containing all points of Ω within a distance β h0 of Ωi. Let χ1(·), . . . , χp(·) denote a partition of unity based on Ω1∗, . . . , Ωp∗. This partition of unity should be constructed so that its sum is zero on BD and unity on BN. Denote the union of subdomain interfaces as B ≡ (∪_{i=1}^p ∂Ωi) \ BD.
Define a restriction map RB which restricts any function w(x) onto B:

    RB w(x) ≡ w(x),   for x ∈ B.

Given a function v(x) defined on B, denote its piecewise harmonic extension E v(x) into the interior of each subdomain Ωi for 1 ≤ i ≤ p as the solution of:

    L (E v) = 0,   in Ωi
    E v = v,       on ∂Ωi,

where L(E v) denotes the elliptic operator applied to E v. The continuous version of the piecewise constant coarse space V0 is now defined as:

    V0 ≡ span [E RB χ1, . . . , E RB χp].


A finite element version of V0 can be constructed analogously, see Chap. 2.5, using restriction onto nodal values on B and discrete harmonic extensions into the subdomains. If the coefficient a(·) in (2.1) is discontinuous of the form:

    a(x) ≡ ai   for x ∈ Ωi, 1 ≤ i ≤ p,

then it will be advantageous to rescale the original partition of unity to account for large variation in a(·). A new partition of unity χ̂1(·), . . . , χ̂p(·) will be:

    χ̂i(x) ≡ ai χi(x) / (a1 χ1(x) + · · · + ap χp(x)),   for 1 ≤ i ≤ p.

An alternative coarse space V̂0 can be constructed based on this.
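The rescaling is a pointwise operation on nodal values. A small numpy sketch, with illustrative nodal values chi of shape (p, n) and assumed subdomain coefficients ai, is:

    import numpy as np

    def rescale_partition(chi, a):
        # Coefficient-weighted partition: a_i chi_i / (a_1 chi_1 + ... + a_p chi_p).
        a = np.asarray(a)[:, None]
        weighted = a * chi                   # shape (p, n)
        return weighted / weighted.sum(axis=0)

    chi = np.array([[1.0, 0.5, 0.0],
                    [0.0, 0.5, 1.0]])        # illustrative nodal values, sums to 1
    chi_hat = rescale_partition(chi, a=[1.0, 1e4])
    assert np.allclose(chi_hat.sum(axis=0), 1.0)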

2.2 Projection Formulation of Schwarz Algorithms
In this section, we describe the classical Schwarz alternating method for iter-
atively solving the following coercive elliptic equation:

    ⎧ Lu ≡ −∇ · (a(x)∇u) + c(x) u = f,   in Ω
    ⎨ n · (a∇u) + γ u = gN,              on BN        (2.12)
    ⎩ u = 0,                             on BD,

where c(x) ≥ 0, γ(x) ≥ 0, and BD and BN denote Dirichlet and natural boundary segments of ∂Ω. The weak formulation of (2.12) seeks u ∈ H^1_D(Ω):

    A(u, v) = F(v),   ∀v ∈ H^1_D(Ω),        (2.13)

where

    ⎧ A(u, v) ≡ ∫Ω (a(x)∇u · ∇v + c(x) u v) dx + ∫BN γ(x) u v ds(x),   for u, v ∈ H^1_D(Ω),
    ⎨ F(v) ≡ ∫Ω f(x) v(x) dx + ∫BN gN(x) v ds(x),                      for v ∈ H^1_D(Ω),      (2.14)
    ⎩ H^1_D(Ω) ≡ { v ∈ H^1(Ω) : v = 0 on BD }.

Applying integration by parts to the continuous version of the multidomain Schwarz alternating method, we shall derive a formal expression for the updates in the iterates as involving orthogonal projections onto certain subspaces of H^1_D(Ω). Employing these projections, we shall derive various parallel extensions of the classical Schwarz alternating method, including the additive Schwarz, hybrid Schwarz and restricted Schwarz methods. Let Ω1∗, · · · , Ωp∗ denote an overlapping decomposition of Ω, and let B(i) ≡ ∂Ωi∗ ∩ Ω and B[i] ≡ ∂Ωi∗ ∩ ∂Ω denote the interior and exterior boundary segments of Ωi∗.


2.2.1 Classical Schwarz Alternating Method

Let w(0) denote a starting iterate satisfying w(0) = 0 on BD . Then, the
multidomain Schwarz alternating method will iteratively seek the solution
to (2.12) by sequentially updating the iterate on each subdomain Ωi∗ in some
prescribed order. Each iteration (or sweep) will consist of p fractional steps
and we shall denote the iterate in the i'th fractional step of the k'th sweep as w^(k+i/p). Given w^(k+(i−1)/p), the next iterate w^(k+i/p) is computed as follows:

    ⎧ −∇ · (a(x)∇w^(k+i/p)) + c(x) w^(k+i/p) = f(x),   in Ωi∗
    ⎪ n · (a∇w^(k+i/p)) + γ w^(k+i/p) = gN,            on B[i] ∩ BN        (2.15)
    ⎨ w^(k+i/p) = w^(k+(i−1)/p),                       on B(i)
    ⎩ w^(k+i/p) = 0,                                   on B[i] ∩ BD.

The local solution w^(k+i/p) is then extended outside Ωi∗ as follows:

    w^(k+i/p) ≡ w^(k+(i−1)/p),   on Ω \ Ωi∗.        (2.16)

The resulting iterates will thus be continuous on Ω by construction.
Algorithm 2.2.1 (Continuous Schwarz Alternating Method)
Input: w(0) starting iterate.
1. For k = 0, 1, · · · until convergence do:
2. For i = 1, · · · , p solve:

    ⎧ −∇ · (a(x)∇v^(k+i/p)) + c(x) v^(k+i/p) = f(x),   in Ωi∗
    ⎪ n · (a∇v^(k+i/p)) + γ v^(k+i/p) = gN,            on B[i] ∩ BN
    ⎨ v^(k+i/p) = w^(k+(i−1)/p),                       on B(i)
    ⎩ v^(k+i/p) = 0,                                   on B[i] ∩ BD.

   Update:

    w^(k+i/p) ≡ { v^(k+i/p),       on Ω̄i∗
                { w^(k+(i−1)/p),   on Ω \ Ω̄i∗.

3. Endfor
4. Endfor
Output: w^(k)

The iterates w^(k)(·) will converge geometrically to the solution u(·) with:

    ‖u − w^(k)‖_{H^1(Ω)} ≤ δ^k ‖u − w^(0)‖_{H^1(Ω)}.

The convergence factor 0 < δ < 1 will generally depend on the overlap β between the subdomains, the diameters diam(Ωi∗) of the subdomains, and the coefficients in (2.12), see Chap. 2.5.

As the number p of subdomains increases, the convergence rate typically deteriorates, yielding δ → 1. This is because the true solution to (2.12) has a global domain of dependence on f(·), while if w^(0) = 0 and f(·) has support in only one subdomain, then since information is transferred only between adjacent subdomains during each sweep of the Schwarz iteration, it may generally take p sweeps before this information is transferred globally. Such a deterioration in the convergence, however, can often be remedied by using coarse space residual correction (described later).

The Schwarz alternating Alg. 2.2.1 is also known as the multiplicative or sequential Schwarz algorithm, since it is sequential in nature. However, the parallelizability of this algorithm can be significantly improved by grouping the subdomains into colors, so that distinct subdomains of the same color do not intersect. Then, all subproblems on subdomains of the same color can be solved concurrently.

Definition 2.10. Given subdomains Ω1∗, · · · , Ωp∗, a partition C1, · · · , Cd of the index set {1, · · · , p} is said to yield a d-coloring of the subdomains if:

    i, j ∈ Ck with i ≠ j  =⇒  Ωi∗ ∩ Ωj∗ = ∅,

so that subdomains of the same color Ck do not intersect. The following is the multicolor Schwarz algorithm with starting iterate w(·).

Algorithm 2.2.2 (Multicolor Schwarz Alternating Algorithm)
Input: w(·)
1. For k = 0, 1, · · · until convergence do:
2. For l = 1, · · · , d do:
3. For each i ∈ Cl solve in parallel:

    ⎧ −∇ · (a(x)∇v^(k+i/p)) + c(x) v^(k+i/p) = f(x),   in Ωi∗
    ⎪ n · (a∇v^(k+i/p)) + γ v^(k+i/p) = gN,            on B[i] ∩ BN
    ⎨ v^(k+i/p) = w,                                   on B(i)
    ⎩ v^(k+i/p) = 0,                                   on B[i] ∩ BD.

4. Update: w ← v^(k+i/p), on Ω̄i∗
5. Endfor
6. w^(k+1) ← w
7. Endfor
Output: w(·)
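The matrix form of these sweeps is derived in Chap. 2.3; as a preview, the following is a minimal numpy sketch of repeated multiplicative Schwarz sweeps on a 1D Laplacian with two overlapping index blocks (the matrix, blocks and sweep count are illustrative assumptions):

    import numpy as np

    n = 20
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Dirichlet Laplacian
    f = np.ones(n)
    blocks = [np.arange(0, 12), np.arange(8, 20)]            # overlapping subdomains

    w = np.zeros(n)
    for sweep in range(30):
        for idx in blocks:                                   # sequential (multiplicative)
            r = f - A @ w                                    # global residual
            w[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])

    assert np.allclose(A @ w, f, atol=1e-8)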

As an example of coloring, the subdomains Ω1∗, . . . , Ω16∗ in Fig. 2.2 may be grouped into four colors:

    C1 = {1, 3, 9, 11},  C2 = {2, 4, 10, 12},  C3 = {5, 7, 13, 15},  C4 = {6, 8, 14, 16}.

Remark 2.11. To minimize the number d of sequential steps, the number of colors d should be chosen to be as small as possible; a small number of colors will typically suffice provided the overlap β is not too large. If q processors are available and the subdomains can be colored into d colors with approximately (p/d) subdomains of the same color, and further if (p/d) is a multiple of q, then subdomains of the same color may be partitioned into q groups and each group assigned to one of the processors, since such subdomains do not intersect. Some communication will be necessary between the different subdomains. To ensure that the loads assigned to each processor are balanced, there should be approximately the same number of subdomains of each color, and each subdomain should be approximately of the same diameter.

Fig. 2.2. Multisubdomain overlapping decomposition (a non-overlapping 4 × 4 decomposition Ω1, . . . , Ω16, with selected extended subdomains Ω1∗ and Ω11∗)

Remark 2.12. The updates w^(k+i/p) in the continuous Schwarz alternating method can be expressed in terms of certain projection operators onto subspaces of H^1_D(Ω). On each Ωi∗ define a subspace Vi of H^1_D(Ω) as:

    Vi ≡ { v ∈ H^1_D(Ω) : v = 0 in Ω \ Ωi∗ }.        (2.17)

We will employ the property that the bilinear form A(·, ·) in (2.14) defines an inner product on H^1_D(Ω) when BD ≠ ∅, see [CI2]. We define an A(·, ·)-orthogonal projection operator Pi onto the subspace Vi of H^1_D(Ω) as follows.

Definition 2.13. Given w ∈ H^1_D(Ω), define Pi w ∈ Vi as the solution of:

    A(Pi w, v) = A(w, v),   for v ∈ Vi.

The existence and uniqueness of Pi w is guaranteed by the Lax-Milgram lemma, see [CI2, JO2].

Remark 2.14. If u denotes the solution of the weak formulation (2.13), then Pi u can be computed without explicit knowledge of u, using that A(u, v) = F(v), since F(·) is given for all v ∈ Vi.

The following result shows that the projection maps Pi can represent the updates in the continuous version of the Schwarz alternating method.

Lemma 2.15. Suppose the following assumptions hold.
1. Let u satisfy (2.13) and let gN(x) ≡ n · (a(x)∇u) + γ(x) u on BN.
2. Given w ∈ H^1_D(Ω), let wi satisfy:

    ⎧ −∇ · (a(x)∇wi) + c(x) wi = f(x),   on Ωi∗
    ⎪ n · (a∇wi) + γ wi = gN,            on B[i] ∩ BN        (2.18)
    ⎨ wi = w,                            on B(i)
    ⎩ wi = 0,                            on B[i] ∩ BD,

with wi ≡ w on Ω \ Ωi∗. Then wi = w + Pi(u − w).

Proof. Multiplying (2.18) by v ∈ Vi ⊂ H^1_D(Ω) (which is zero outside Ωi∗) and integrating the resulting term by parts yields:

    ∫_{Ωi∗} (L wi) v dx = ∫_Ω (L wi) v dx = A(wi, v),   ∀v ∈ Vi.

Thus, A(wi, v) = F(v) = A(u, v), ∀v ∈ Vi. Employing the above yields:

    A(wi − w, v) = A(u − w, v),   ∀v ∈ Vi.

Since (wi − w) = 0 in Ω \ Ωi∗, it yields wi − w ∈ Vi and wi − w = Pi(u − w). Thus, we obtain wi = w + Pi(u − w).

The continuous version of the Schwarz alternating method may now be reformulated in terms of the projection operators Pi onto Vi ⊂ H^1_D(Ω). An application of Lemma 2.15 with wi ≡ w^(k+i/p) and w ≡ w^(k+(i−1)/p) yields:

    w^(k+i/p) = w^(k+(i−1)/p) + Pi (u − w^(k+(i−1)/p)).        (2.19)

Substituting this representation into the Schwarz alternating method yields its projection formulation.

Algorithm 2.2.3 (Projection Version of the Classical Schwarz Method)
Input: w^(0) starting iterate.
1. For k = 0, 1, · · · until convergence do:
2. For i = 1, · · · , p do:

    w^(k+i/p) = w^(k+(i−1)/p) + Pi (u − w^(k+(i−1)/p)).

3. Endfor
4. Endfor
Output: w^(k)

Remark 2.16. The preceding projection version of the Schwarz alternating method will also be applicable for more general subspaces Vi ⊂ H^1_D(Ω); however, the projections Pi may no longer involve the solution of partial differential equations on subdomains. To ensure convergence, the subspaces Vi of H^1_D(Ω) must satisfy:

    H^1_D(Ω) = V1 + · · · + Vp,

see Chap. 2.5.

Subtracting the iterates in (2.19) from u and recursively applying the expression yields the following equation for the error u − w^(k+1):

    u − w^(k+1) = (I − Pp) (u − w^(k+(p−1)/p))
                = (I − Pp)(I − Pp−1) (u − w^(k+(p−2)/p))
                  · · ·
                = (I − Pp) · · · (I − P1) (u − w^(k)).        (2.20)

Define the error amplification map by T = (I − Pp) · · · (I − P1). This map T will be a contraction (in an appropriate norm, see Chap. 2.5). Formally, an equivalent problem for determining u is:

    (I − T) u = w∗,

where w∗ ≡ (I − T) u can be computed without explicit knowledge of u, since the terms Pi u ∈ Vi can be computed by solving:

    A(Pi u, v) = F(v),   ∀v ∈ Vi.

For instance, when p = 2 we obtain that (I − T) = P1 + P2 − P2 P1 and w∗ = P1 u + P2 u − P2 P1 u. Since (I − T) involves only sums (or differences) of products of projections Pi, we may compute w∗ ≡ (I − T) u without explicit knowledge of u. This problem will be well posed since T is a contraction.

2.2.2 Additive Schwarz Method

The additive Schwarz method to solve (2.13) is a highly parallel algorithm in the Schwarz family [DR11]. It reformulates (2.13) using a sum of projections P ≡ P1 + · · · + Pp, where each Pi is the A(·, ·)-orthogonal projection onto Vi defined by (2.17). Formally, the solution u of (2.13) will also solve:

    P u = w∗,        (2.21)

where w∗ ≡ P1 u + · · · + Pp u can be computed without explicit knowledge of u. It is shown in Chap. 2.5 that the operator P is self adjoint and coercive in the Sobolev space H^1_D(Ω) equipped with the inner product A(·, ·). Consequently, upper and lower bounds can be calculated for the spectra of P, ensuring the well posedness of problem (2.21).

The additive Schwarz formulation of (2.13) is based on the solution of (2.21). In the discrete case, it is typically employed as a preconditioner; however, for illustrative purposes we indicate a Richardson iteration to solve (2.21). Given an iterate w^(k), a new iterate w^(k+1) is constructed as follows [TA5]. For 1 ≤ i ≤ p solve in parallel:

    ⎧ −∇ · (a(x)∇vi^(k+1)) + c(x) vi^(k+1) = f(x),   in Ωi∗
    ⎪ n · (a∇vi^(k+1)) + γ vi^(k+1) = gN,            on B[i] ∩ BN
    ⎨ vi^(k+1) = w^(k),                              on B(i)
    ⎩ vi^(k+1) = 0,                                  on B[i] ∩ BD,

and extend vi^(k+1) ≡ w^(k) on Ω \ Ωi∗. Then update:

    w^(k+1) ≡ (1 − τ p) w^(k) + τ (v1^(k+1) + · · · + vp^(k+1)),

where 0 < t1 < τ < t2 < 1/p is the step size parameter in Richardson's iteration. The resulting algorithm is summarized below in terms of projections.

Algorithm 2.2.4 (Additive Schwarz-Richardson Iteration)
Input: w^(0) (starting iterate) and 0 < t1 < τ < t2 < 1/p
1. For k = 0, 1, · · · until convergence do:
2. Compute in parallel:

    w^(k+1) ≡ w^(k) + τ (P1 (u − w^(k)) + · · · + Pp (u − w^(k))).

3. Endfor

The additive Schwarz-Richardson iterates w^(k) will converge geometrically to u for appropriately chosen τ. If a coarse space V0 ⊂ H^1_D(Ω) is employed, then P = (P0 + · · · + Pp) must be employed. However, the multiplicative Schwarz iterates will generally converge more rapidly [XU3]. The matrix version of the additive Schwarz preconditioner is described in Chap. 2.3.

2.2.3 Hybrid Schwarz Method

The hybrid Schwarz method is a variant of the additive Schwarz method obtained by incorporating sequential steps from the multiplicative Schwarz method [MA15]. As in the additive Schwarz method, the subspaces Vi are defined by (2.17), with associated A(·, ·)-orthogonal projections Pi for 1 ≤ i ≤ p. Additionally, a coarse space V0 ⊂ H^1_D(Ω) is employed, with A(·, ·)-orthogonal projection P0. The resulting method yields improved convergence over the additive Schwarz method, but the algorithm is less parallelizable due to the extra sequential steps.

The hybrid Schwarz formulation decomposes the solution to (2.13) as u = P0 u + (I − P0) u, which is an A(·, ·)-orthogonal decomposition. The component P0 u ∈ V0 can be formally determined by solving the subproblem:

    A(P0 u, v0) = F(v0),   ∀v0 ∈ V0.

The component (I − P0) u ∈ V0⊥ can be sought, in principle, by applying an additive Schwarz method in V0⊥, where V0⊥ denotes the orthogonal complement of V0 in the inner product A(·, ·):

    (I − P0)(P1 + · · · + Pp)(I − P0) u = g∗.

Here g∗ = (I − P0)(P1 + · · · + Pp)(I − P0) u can be computed without explicit knowledge of u. The preceding observations may be combined to formally construct the following problem, equivalent to (2.13):

    P̂ u = f∗,        (2.22)

where

    P̂ ≡ P0 + (I − P0)(P1 + · · · + Pp)(I − P0),

and f∗ ≡ P̂ u can be computed explicitly, without explicit knowledge of u. The operator P̂ can be shown to be self adjoint and coercive in A(·, ·), and will generally have improved spectral properties over the additive Schwarz operator P = (P1 + · · · + Pp).

Remark 2.17. The forcing f∗ in (2.22) can be computed explicitly as follows.
1. Determine u0 ∈ V0 satisfying:

    A(u0, v0) = F(v0),   ∀v0 ∈ V0.

2. For 1 ≤ i ≤ p determine wi ∈ Vi satisfying:

    A(wi, vi) = F(vi) − A(u0, vi),   ∀vi ∈ Vi.

3. Define w ≡ w1 + · · · + wp and determine ũ0 ∈ V0 satisfying:

    A(ũ0, v0) = A(w, v0),   ∀v0 ∈ V0.

Then f∗ = P̂ u = u0 + (w − ũ0).

In the following, we illustrate a Richardson iteration to solve (2.22).

Algorithm 2.2.5 (Hybrid Schwarz-Richardson Iteration)
Input: w^(0) starting iterate and 0 < t1 < τ < t2 < 1/p
1. For k = 0, 1, · · · until convergence do:
2. Compute in parallel:

    w^(k+1) ≡ w^(k) + τ P̂ (u − w^(k)).

3. Endfor

Remark 2.18. In practice, the algorithm can be applied either as an unaccelerated iteration or as a preconditioner. The balancing domain decomposition preconditioner for Schur complement matrices (in Chap. 3) is based on this principle [MA14, MA17]. In it, the exact projections Pi are replaced by approximations which require the solution of Neumann boundary value problems on non-overlapping subdomains Ωi. For each subdomain Neumann problem to be solvable, certain compatibility conditions must be satisfied locally. In such applications, the coarse space V0 may be constructed so that all the subdomain compatibility conditions are simultaneously enforced in the orthogonal complement of V0. See Chap. 3.

2.2.4 Restricted Schwarz Algorithm

The restricted Schwarz method is a variant of the additive Schwarz method employing a partition of unity [CA19, KU6, CA17]. Formally, it yields a non-symmetric preconditioner even for self adjoint problems. It can also be motivated by a multisubdomain hybrid formulation of (2.12) based on a partition of unity χ1(x), · · · , χp(x) subordinate to Ω1∗, · · · , Ωp∗.

Theorem 2.19. Suppose the following assumptions hold.
1. Let c(x) ≥ c0 > 0 in (2.12).
2. Let u(x) denote a solution to (2.12).
3. Given the partition of unity {χi(·)}_{i=1}^p, let w1(·), · · · , wp(·) solve the hybrid formulation for 1 ≤ i ≤ p:

    ⎧ −∇ · (a(x)∇wi) + c(x) wi = f(x),   in Ωi∗
    ⎪ n · (a∇wi) + γ wi = gN,            on B[i] ∩ BN        (2.23)
    ⎨ wi = Σ_{j≠i} χj wj,                on B(i)
    ⎩ wi = 0,                            on B[i] ∩ BD.

Then, the following result will hold:

    u(x) = wi(x)   on Ω̄i∗,  for 1 ≤ i ≤ p.

Proof. Since χi(x) = 0 on B(i), we note that Σ_{j≠i} χj(x) = 1 on B(i) for 1 ≤ i ≤ p. Using this, we obtain the hybrid formulation.

The hybrid formulation (2.23) corresponds to a fixed point equation for the following linear mapping T. Given local approximations (v1, · · · , vp), with vi = 0 on B[i] ∩ BD and n · (a∇vi) + γ vi = gN on B[i] ∩ BN for 1 ≤ i ≤ p, define a global approximation v ≡ Σ_{j=1}^p χj vj, and define:

    T (v1, · · · , vp) = (w1, · · · , wp),

where the outputs wi satisfy:

    ⎧ −∇ · (a(x)∇wi) + c(x) wi = f(x),   on Ωi∗
    ⎪ n · (a∇wi) + γ wi = gN,            on B[i] ∩ BN        (2.24)
    ⎨ wi = Σ_{j≠i} χj vj,                on B(i)
    ⎩ wi = 0,                            on B[i] ∩ BD.

Under the assumptions c(x) ≥ c0 > 0 and BN = ∅, the mapping T will be a contraction, and the Picard iterates of T will converge to its fixed point (u1, · · · , up), where ui ≡ u on each subdomain Ω̄i∗ and u solves (2.13). Since χi(x) = 0 for x ∈ B(i), the global approximation v(x) will satisfy, on each B(i):

    v(x) = Σ_{j≠i} χj(x) vj(x).

Substitute this into (2.24) and apply Lemma 2.15 to wi with w ≡ v to obtain:

    wi = v + Pi (u − v).

At the fixed point of T, where v = w, this yields:

    w = Σ_i χi wi = Σ_i χi (w + Pi (u − w)) = w + Σ_i χi Pi (u − w).        (2.25)

The following algorithm corresponds to a Picard iteration of the map T.

Algorithm 2.2.6 (Restricted Schwarz Method in Projection Form)
Input: (w1^(0), · · · , wp^(0)) and w^(0)(x) ≡ Σ_{j=1}^p χj(x) wj^(0)(x)
1. For k = 0, 1, · · · until convergence do:
2. For i = 1, · · · , p in parallel compute:

    wi^(k+1) ≡ Pi (u − w^(k)).

3. Endfor
4. Define: w^(k+1)(x) ≡ w^(k)(x) + Σ_{i=1}^p χi(x) wi^(k+1)(x).
5. Endfor

Under appropriate assumptions, T will be a contraction and the iterates w^(k) will converge geometrically to the solution u of (2.12):

    ‖w^(k) − u‖_{∞,Ω} ≤ δ^k ‖w^(0) − u‖_{∞,Ω},

where ‖ · ‖_{∞,Ω} denotes the maximum norm, see [MI, DR11, BR18, TA8, XU3, MA37, GR4], and Chap. 15 for the case BN = ∅.

Remark 2.20. The preceding restricted Schwarz algorithm did not employ coarse space residual correction. Consequently, as the number of subdomains is increased and their diameters decrease in size, the rate of convergence of the algorithm can deteriorate. The convergence of the preconditioner associated with the preceding algorithm can be improved significantly if a coarse space projection term is employed additively. The matrix form of the preconditioner associated with the restricted Schwarz method is described in Chap. 2.3.

2.3 Matrix Form of Schwarz Subspace Algorithms

In this section, we describe the matrix version of Schwarz algorithms for problems of the form (2.28) below. Our formulation will employ the finite dimensional linear space V = IR^n, endowed with a self adjoint and coercive bilinear form A(·, ·), which also defines an inner product on V. We shall further assume that we are given subspaces Vi ⊂ V for 0 ≤ i ≤ p satisfying:

    V = V0 + V1 + · · · + Vp.        (2.26)

In this case, matrix expressions can be derived for the projection version of the Schwarz algorithms described in the preceding section. Given a linear functional F(·), we shall seek u ∈ V such that:

    ⎧ A(u, v) = F(v),      ∀v ∈ V,  where
    ⎨ A(v, w) ≡ v^T A w,   for v, w ∈ V,        (2.27)
    ⎩ F(v) ≡ v^T f,        for v ∈ V,

where A is an n × n symmetric and positive definite matrix and f ∈ IR^n. In matrix terms, problem (2.27) will correspond to the linear system:

    A u = f.        (2.28)

We shall formulate matrix Schwarz algorithms to solve this system by analogy with the projection algorithms described in Chap. 2.2. We shall assume that each Vi ⊂ IR^n is of dimension ni, and that it is the column space (Range) of an n × ni matrix Ri^T of full rank:

    Vi ≡ Range(Ri^T),   for 0 ≤ i ≤ p.

Thus, given v ∈ V, there must exist vi ∈ Vi satisfying:

    v = v0 + v1 + · · · + vp.

Remark 2.21. An elementary rank argument will show that (n0 + n1 + · · · + np) ≥ n. Since Vi is the column space of Ri^T, the columns of Ri^T must form a basis for Vi.

Definition 2.22. The matrices Ri will be referred to as restriction maps, while their transposes Ri^T will be referred to as extension maps.

Remark 2.23. If the rows and columns of matrix Ri are elementary vectors, corresponding to selected columns or rows of some identity matrix of appropriate size, then matrix Ai = Ri A Ri^T will correspond to a principal submatrix of A. In particular, if (n0 + · · · + np) = n and Rl corresponds to the rows of an identity matrix of size n with indices in Il:

    Il = {(n0 + · · · + nl−1) + 1, . . . , (n0 + · · · + nl)},

then Al will correspond to the diagonal block of A with indices in Il.

Matrix versions of Schwarz algorithms to solve (2.28) based on the subspaces Vi can be obtained by transcribing the projection algorithms in terms of matrices. This will require a matrix representation of the projections Pi. Given v ∈ V, we define Pi v ∈ Vi as the A(·, ·)-orthogonal projection of v onto Vi:

    A(Pi v, wi) = A(v, wi),   ∀wi ∈ Vi.        (2.29)

A matrix representation of Pi can be derived as follows. Since Vi is the column space of Ri^T, represent Pi v = Ri^T xi and wi = Ri^T yi for xi, yi ∈ IR^{ni}. Substitute these representations into (2.29) to obtain:

    yi^T (Ri A Ri^T) xi = yi^T Ri A v,   ∀yi ∈ IR^{ni}.

Since this must hold for all yi ∈ IR^{ni}, we obtain that Ai xi = Ri A v, where Ai ≡ (Ri A Ri^T). Solving this linear system yields xi = Ai^{−1} Ri A v, and substituting Pi v = Ri^T xi results in the expression:

    Pi = Ri^T Ai^{−1} Ri A        (2.30)

for the matrix representation of Pi.

Remark 2.24. Matrix Ai^{−1} should not be assembled. Instead, an expression wi = Ai^{−1} ri should be computed by solving the linear system Ai wi = ri.

Multiplicative Schwarz Algorithm. The matrix version of Alg. 2.2.3 to solve system (2.28) can be obtained by replacing each update Pi (u − w^(k+(i−1)/p)) by its discrete counterpart, where u is the solution to (2.28). Substituting the matrix form of the projection Pi and using that A u = f yields:

    Pi (u − w^(k+(i−1)/p)) = Ri^T Ai^{−1} Ri A (u − w^(k+(i−1)/p))
                           = Ri^T Ai^{−1} Ri (f − A w^(k+(i−1)/p)).

Thus, the matrix form of:

    w^(k+i/p) = w^(k+(i−1)/p) + Pi (u − w^(k+(i−1)/p))

becomes:

    w^(k+i/p) = w^(k+(i−1)/p) + Ri^T Ai^{−1} Ri (f − A w^(k+(i−1)/p)).

The resulting multiplicative or sequential Schwarz algorithm is listed next, with the coarse space included, so that each sweep consists of (p + 1) fractional steps.

Algorithm 2.3.1 (Multiplicative Schwarz Method to Solve (2.28))
Input: w^(0) = 0 (starting guess), f
1. For k = 0, 1, · · · until convergence do:
2. For i = 0, 1, · · · , p do:

    w^(k+(i+1)/(p+1)) = w^(k+i/(p+1)) + Ri^T Ai^{−1} Ri (f − A w^(k+i/(p+1))).

3. Endfor
4. Endfor
Output: w^(k)

The iterates w^(k) in this algorithm will converge to the solution of (2.28) without acceleration. If CG acceleration is employed to solve A u = f, then a symmetric positive definite preconditioner would be necessary [GO4]. The inverse of the symmetrized Schwarz preconditioner M is described below.

Algorithm 2.3.2 (Symmetrized Schwarz Preconditioner for (2.28))
Input: w ≡ 0 and r
1. For i = p, p−1, · · · , 1, 0, 1, · · · , p−1, p do:

    w ← w + Ri^T Ai^{−1} Ri (r − A w).

2. Endfor
Output: M^{−1} r ≡ w

Remark 2.25. In practice, Ai^{−1} should not be assembled; its action on a vector should be computed by solution of the associated linear system. For instance, the computation of Ri^T Ai^{−1} Ri f should first involve the computation of Ri f, followed by the solution of the linear system Ai vi = Ri f, followed by the computation Ri^T vi. Scatter-gather operations can be used to implement Ri^T and Ri. The notation Ai^{−1} was only employed for convenience in the preceding algorithms.

Remark 2.26. In both of the preceding algorithms, the matrices Ai = Ri A Ri^T can be replaced by appropriately chosen preconditioners Ãi = Ãi^T > 0. As an example, a sparse preconditioner Ãi for Ai can be obtained by an ILU factorization of Ai, see [BE, AX, SA2]. If approximations Ãi are employed in the multiplicative Schwarz algorithm, then the condition λmax(Ãi^{−1} Ai) < 2 must be satisfied to ensure convergence without acceleration.

Additive Schwarz Algorithm. The matrix version of the additive Schwarz equation P u = f∗ for the solution of (2.28) has the form:

    (Σ_{i=0}^p Ri^T Ai^{−1} Ri A) u = w∗,        (2.31)

where

    w∗ ≡ Σ_{i=0}^p Ri^T Ai^{−1} Ri f.

The system (2.31) for u corresponds to a preconditioned system of the form M^{−1} A u = M^{−1} f. This yields the additive Schwarz preconditioner as:

    M^{−1} = Σ_{i=0}^p Ri^T Ai^{−1} Ri.

This is summarized below.

Algorithm 2.3.3 (Additive Schwarz Preconditioner for (2.28))
Input: r
1. For i = 0, · · · , p in parallel do:

    wi = Ri^T Ai^{−1} Ri r.

2. Endfor
3. Sum: w ≡ w0 + · · · + wp.
Output: M^{−1} r ≡ w

Remark 2.27. In step 1 of the preceding algorithm, the matrices Ai = Ri A Ri^T can be replaced by appropriately chosen preconditioners Ãi, see [XU3]. If a preconditioner is employed for A0, residual corrections can be implemented for i = p, p−1, · · · , 1, 0, 1, · · · , p−1, p to ensure convergence without acceleration. Both versions will be equivalent if an exact solver is employed for A0.

Remark 2.28. When (n0 + n1 + · · · + np) = n and the columns of Rl correspond to selected columns of an identity matrix, then it is easily seen that the matrix version of the additive Schwarz preconditioner corresponds to a block Jacobi preconditioner, and the matrix version of the multiplicative Schwarz method corresponds to the block Gauss-Seidel method. When (n0 + · · · + np) > n, or when the columns of Rl are not columns of an identity matrix, the additive and multiplicative Schwarz algorithms generalize the block Jacobi and block Gauss-Seidel algorithms.

Hybrid Schwarz Method. The matrix version of the hybrid Schwarz preconditioner can be derived from the hybrid Schwarz problem P̂ u = f∗, where:

    P̂ = P0 + (I − P0)(P1 + · · · + Pp)(I − P0).

As this problem represents the preconditioned system M^{−1} A u = M^{−1} f, the action M^{−1} of the inverse of the preconditioner M can easily be deduced to be the following.

Algorithm 2.3.4 (Hybrid Schwarz Preconditioner for (2.28))
Input: r
1. Compute: w0 = R0^T A0^{−1} R0 r.
2. For i = 1, · · · , p in parallel do:

    vi = Ri^T Ai^{−1} Ri (r − A w0).

3. Endfor
4. Sum: v = v1 + · · · + vp.
5. Compute: v0 = R0^T A0^{−1} R0 A v.
6. Compute: w = w0 + v − v0.
Output: M^{−1} r ≡ w

Remark 2.29. If the input residual r satisfies R0 r = 0, then step 1 in the hybrid Schwarz preconditioner can be skipped, yielding w0 = 0. This suggests choosing a starting iterate u0 ∈ IR^n in the conjugate gradient method so that the initial residual r = f − A u0 satisfies R0 (f − A u0) = 0. Then, all subsequent residuals in the conjugate gradient method with the hybrid Schwarz preconditioner will satisfy this constraint, as will be shown below. Note that to construct a starting iterate u0 ∈ IR^n so that R0 (f − A u0) = 0, we may seek it in the form u0 = R0^T α0 for some unknown coefficient vector α0 ∈ IR^{n0}. Imposing the preceding constraint will yield:

    R0 (f − A u0) = 0  ⇔  R0 f − R0 A R0^T α0 = 0  ⇔  α0 = A0^{−1} R0 f,

where A0 = R0 A R0^T, so that u0 = R0^T A0^{−1} R0 f. To verify that M^{−1} r will satisfy R0 A M^{−1} r = 0 whenever r ∈ IR^n satisfies R0 r = 0, apply R0 A to step 6 in the hybrid Schwarz preconditioner with w0 = 0 to obtain:

    R0 A M^{−1} r = R0 A v − R0 A R0^T A0^{−1} R0 A v = 0.

Thus, the computational costs in a conjugate gradient method to solve A u = f can be reduced by splitting the solution as u = u0 + v with u0 = R0^T A0^{−1} R0 f. To determine v, solve the linear system A v = f − A u0 by a conjugate gradient method with a hybrid Schwarz preconditioner in which step 1 is skipped.

Remark 2.30. The submatrices Ai = Ri A Ri^T in the hybrid Schwarz preconditioner may be replaced by approximations Ãi for 1 ≤ i ≤ p. In certain applications, it may even be advantageous to employ singular matrices Ãi whose null spaces are known. In this case, linear systems of the form Ãi vi = ri will be solvable only if a compatibility condition is satisfied: if αi is an ni × di matrix whose columns form a basis for the null space of Ãi, then αi^T ri = 0 must hold for solvability, and the solution vi will not be unique, but will involve an arbitrary additive term from the null space. However, a careful choice of the coarse space V0 in the hybrid Schwarz method can ensure the solvability of all such local problems, and also effectively handle the arbitrariness of the local solutions. Define a coarse space V0 ⊂ IR^n as:

    V0 ≡ Range(R0^T),   where R0^T ≡ [R1^T α1, · · · , Rp^T αp].

By construction of the term w0 in step 1 of the hybrid Schwarz preconditioner, it will hold that R0 (r − A w0) = 0. Substituting the definition of R0 yields that αi^T Ri (r − A w0) = 0 for 1 ≤ i ≤ p, so that the subproblems in step 2 of the hybrid Schwarz preconditioner are well defined when Ai is replaced by Ãi. Each vi in step 2 can then have an arbitrary additive term of the form Ri^T αi βi with βi ∈ IR^{di}. The projection term v − R0^T A0^{−1} R0 A v in step 6 modifies these arbitrary terms so that R0 A M^{−1} r = 0 holds. This is the principle underlying the balancing domain decomposition preconditioner [MA14].

Remark 2.31. In Chap. 2.5, it is shown that the hybrid Schwarz preconditioned matrix P̂ is better conditioned than its associated additive Schwarz preconditioned matrix P.

Restricted Schwarz Algorithm. Since the restricted Schwarz algorithm in Chap. 2.2 is based on a partition of unity, its general matrix version will require an algebraic partition of unity, if such can be found.

Definition 2.32. Let Vi = Range(Ri^T) be subspaces of V = IR^n for 1 ≤ i ≤ p. We say that matrices E1, · · · , Ep, where each Ei is an n × ni matrix for 1 ≤ i ≤ p, form a discrete partition of unity relative to R1, · · · , Rp if:

    E1 R1 + · · · + Ep Rp = I.

The action M^{−1} of the inverse of the restricted Schwarz preconditioner to solve (2.28) is motivated by (2.25) when the iterate w = 0. In the version given below, a coarse space correction term is included, with E0 ≡ R0^T.

Algorithm 2.3.5 (Restricted Schwarz Preconditioner for (2.28))
Input: r, 0 < α < 1
1. For i = 0, 1, · · · , p in parallel compute:

    wi = Ei Ai^{−1} Ri r.

2. Endfor
Output: M^{−1} r ≡ α w0 + (1 − α) (w1 + · · · + wp)

Remark 2.33. Since the above preconditioner is not symmetric, it cannot be employed in a conjugate gradient method [CA19].

2.4 Implementational Issues

In this section, we remark on applying the matrix Schwarz algorithms from Chap. 2.3 to solve a discretization of (2.1). For simplicity, we only consider a finite element discretization, though the methodology (with the exception of a coarse space V0) will typically carry over to a finite difference discretization. We shall also remark on local solvers and parallel software libraries.

2.4.1 Choice of Subdomains and Subdomain Spaces

Various factors may influence the choice of an overlapping decomposition Ω1∗, . . . , Ωp∗ of Ω. These include the geometry of the domain, regularity of the solution, availability of fast solvers for subdomain problems, and heterogeneity in the coefficients. Ideally, the number of subdomains p also depends on the number of processors. When a natural decomposition is not obvious, an automated strategy may be employed, using graph partitioning algorithms, so that the decomposition yields approximately balanced loads, see [BE14, FO2, SI2, PO3, BA20, FA9, PO2].

Once an overlapping decomposition {Ωl∗}_{l=1}^p has been chosen, and given the finite element space Vh ⊂ H^1_D(Ω), we define the local spaces as:

    Vi ≡ Vh ∩ { v ∈ H^1(Ω) : v = 0 on Ω \ Ω̄i∗ }   for 1 ≤ i ≤ p.

Let ni = dim(Vi) and let index(Ωi∗, j) denote the global index of the j'th local node in Ωi∗ ∪ (BN ∩ B[i]). Then, define Ri as an ni × n restriction matrix:

    (Ri)kj = { 1  if index(Ωi∗, k) = j
             { 0  if index(Ωi∗, k) ≠ j,   for 1 ≤ i ≤ p.

For 1 ≤ i ≤ p these matrices will have zero or one entries, with at most one nonzero entry per row or column. The action of Ri and Ri^T for 1 ≤ i ≤ p


may be implemented using scatter-gather operations and the data structure
of index(Ωi∗ , ·). The subdomain submatrices Ai of size ni × ni defined by:

    Ai = Ri Ah Ri^T,   for 1 ≤ i ≤ p,

will be principal submatrices of A corresponding to the subdomain indices.

2.4.2 Choice of Coarse Spaces

A coarse space V0 ⊂ (Vh ∩ H^1_D(Ω)) may be employed as described in Chap. 2.1. If ψ1^(0)(·), · · · , ψ_{n0}^(0)(·) forms a finite element basis for V0, then an extension matrix R0^T of size n × n0 will have the following entries:

    (R0^T)ij = ψj^(0)(xi),   for 1 ≤ i ≤ n, 1 ≤ j ≤ n0.

Matrix R0 will not be a zero-one matrix, unlike Ri for 1 ≤ i ≤ p. Furthermore, A0 = R0 Ah R0^T will not be a submatrix of A. In some applications, the coarse space may be omitted without adversely affecting the rate of convergence of Schwarz algorithms: for instance, if c(x) ≥ c0 > 0 and the coefficient a(x) is anisotropic with a sufficiently small parameter and aligned subdomains, or for a time stepped problem with sufficiently small time step and large overlap.
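For instance, on a 1D uniform fine grid whose coarse space consists of piecewise linear hat functions on a coarser grid, R0^T can be tabulated by evaluating the coarse basis at the fine nodes. A small numpy sketch (both grids are illustrative assumptions):

    import numpy as np

    def coarse_extension_matrix(x_fine, x_coarse):
        # R0^T with entries psi_j(x_i): piecewise linear hat functions on
        # x_coarse evaluated at the fine nodes x_fine (via linear interpolation).
        n, n0 = x_fine.size, x_coarse.size
        R0T = np.zeros((n, n0))
        for j in range(n0):
            e = np.zeros(n0); e[j] = 1.0          # nodal values of hat psi_j
            R0T[:, j] = np.interp(x_fine, x_coarse, e)
        return R0T

    x_fine = np.linspace(0.0, 1.0, 17)
    x_coarse = np.linspace(0.0, 1.0, 5)
    R0T = coarse_extension_matrix(x_fine, x_coarse)
    assert np.allclose(R0T.sum(axis=1), 1.0)      # hats form a partition of unity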

Remark 2.34. When the boundary segment BD ≠ ∅, equation (2.12) will have a unique solution, and matrix A will be symmetric positive definite. However, when BD = ∅ and c(x) = 0 and γ(x) = 0, then (2.12) will be a Neumann problem. In this case, a compatibility condition must be imposed for the solvability of (2.1), and its solution will be unique only up to a constant. By construction, all the subdomain matrices Ai will be nonsingular for 1 ≤ i ≤ p, since Dirichlet boundary conditions will be imposed on B(i) ≠ ∅. However, matrix A0 will be singular, with 1 spanning its null space. To ensure that each coarse problem of the form A0 v0 = R0 r is solvable, it must hold that 1^T R0 r = 0. Then, the coarse solution will be nonunique, but a specific solution may be selected so that either 1^T v0 = 0, or 1^T v = 0 for the global solution.

2.4.3 Discrete Partition of Unity

For the restricted Schwarz algorithm, an algebraic partition of unity consisting of matrices Ei can be constructed as follows. Let χ1(·), · · · , χp(·) denote a continuous partition of unity subordinate to Ω1∗, · · · , Ωp∗. If x1, · · · , xn denote the nodes of Th(Ω) in Ω ∪ BN, define:

    (Ei)lj = { χi(xl)  if index(Ωi∗, j) = l
             { 0       if index(Ωi∗, j) ≠ l,

for 1 ≤ i ≤ p, 1 ≤ l ≤ n and 1 ≤ j ≤ ni. Then, by construction:

    Σ_{i=1}^p Ei Ri = I.

Similar discrete partitions of unity are employed in [MA17]. For the coarse space, we formally define E0 ≡ R0^T.
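A direct way to realize these Ei in code is to store, for each subdomain, its global index array and the nodal values of χi on it; the identity Σ Ei Ri = I can then be checked numerically. A minimal numpy sketch (the index sets and nodal χ values are illustrative assumptions):

    import numpy as np

    n = 6
    idx = [np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5])]   # index(Ω_i*, .)
    chi = np.array([[1.0, 1.0, 0.5, 0.25, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.75, 1.0, 1.0]])        # nodal partition of unity

    S = np.zeros((n, n))
    for i, ix in enumerate(idx):
        Ri = np.zeros((ix.size, n)); Ri[np.arange(ix.size), ix] = 1.0
        Ei = Ri.T * chi[i, ix]            # (E_i)_{lj} = chi_i(x_l) when l = index(i, j)
        S += Ei @ Ri
    assert np.allclose(S, np.eye(n))      # E_1 R_1 + ... + E_p R_p = I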

2.4.4 Convergence Rates

For discretizations of self adjoint and coercive elliptic equations, Schwarz al-
gorithms typically converge at a rate independent of (or mildly dependent
on) the mesh size h and the subdomain size h0 , provided the overlap between
subdomains is sufficiently large, and a coarse space V0 is employed with an
O(h0 ) approximation property. This is verified by both computational tests
and theoretical analysis. The latter typically assumes that the overlap between
subdomains is β h0 > 0 and shows that the rate of convergence can depend
on the coefficient a(.), and mildly on the parameter β, see Chap. 2.5.

2.4.5 Local Solvers

The implementation of Schwarz algorithms requires computing terms of the form wi = Ai^{-1} Ri r for multiple choices of Ri r. In practice, wi is obtained by solving the associated system Ai wi = Ri r, using a direct or iterative
solver. Direct solvers are commonly employed, since they are robust and do
not involve double iteration. Furthermore, efficient sparse direct solvers are
available in software packages. In the following, we list several solvers.
Direct Solvers. Since Ai = Ai^T > 0 is sparse, a direct solver based on Cholesky factorization can be employed [GO4, GE5, DU]. Matrix Ai and its Cholesky factorization Ai = Li Li^T should be stored using a sparse format. Systems of the form Ai wi = Ri r can then be solved using back substitution, solving Li zi = Ri r and Li^T wi = zi, see [GO4]. Such algorithms are available in LAPACK, SPARSPAK and SPARSKIT, see [GE5, DU, GO4, SA2, AN].
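In scipy, the analogous pattern is to factor each Ai once and reuse the factors across Schwarz iterations. The sketch below uses the sparse LU factorization scipy.sparse.linalg.splu as a stand-in for a sparse Cholesky solver (for an SPD matrix the two play the same role here); the matrix and index set are illustrative assumptions:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    n = 50
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
    idx = np.arange(10, 30)                      # nodes of one subdomain

    lu_i = splu(A[np.ix_(idx, idx)].tocsc())     # factor A_i = R_i A R_i^T once
    for r in (np.ones(n), np.arange(n, dtype=float)):
        w_i = lu_i.solve(r[idx])                 # two triangular solves per residual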
Remark 2.35. The cost of employing a direct solver to solve Ai wi = Ri r de-
pends on the cost of computing its Cholesky factors Li and LTi , and the cost
for solving Li zi = Ri r and LTi wi = zi . When multiple systems of the form
Ai wi = Ri r need to be solved, the Cholesky factors of Ai need to be deter-
mined only once and stored. The cost of computing the Cholesky factorization
of Ai will depend on the sparsity of Ai , while the cost of solving Li zi = Ri r
and LTi wi = zi will depend on the sparsity of Li . These costs can be sig-
nificantly reduced by reordering (permuting) the unknowns. For instance, if
subdomain Ωi∗ is a thin strip, then a band solver can be efficient, provided
the unknowns are reordered within the strip so that the band size is mini-
mized. Other common orderings include the nested dissection ordering, and


the Cuthill-McKee and reverse Cuthill-McKee orderings, see [GE5, DU, SA2].
Sparse software packages such as SPARSPAK and SPARSKIT, typically em-
ploy graph theoretic methods to automate the choice of a reordering so that
the amount of fill in is approximately minimized, to reduce the cost of em-
ploying a direct solver [GE5, DU]. Such solvers typically have a complexity of
O(ni^α) for 1 < α < 3.

FFT Based Solvers. Fast direct solvers based on Fast Fourier Transforms
(FFT’s) may be available for special geometries, coefficients, triangulations
and boundary conditions, see [VA4]. Such solvers will apply when the eigenvalue decomposition Ai = Fi Λi Fi^T of Ai is known, where Λi is a diagonal matrix of eigenvalues of Ai, and Fi is a discrete Fourier (or sine or cosine) transform. Such solvers will typically have a complexity of O(ni log(ni)).
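For example, the 1D Laplacian with Dirichlet conditions, Ai = tridiag(−1, 2, −1), is diagonalized by the discrete sine transform, with eigenvalues λk = 2 − 2 cos(kπ/(ni + 1)). A minimal scipy sketch of the resulting O(ni log ni) solve (the size and right hand side are illustrative assumptions):

    import numpy as np
    from scipy.fft import dst

    ni = 63
    k = np.arange(1, ni + 1)
    lam = 2.0 - 2.0 * np.cos(k * np.pi / (ni + 1))   # eigenvalues of tridiag(-1,2,-1)

    r = np.random.rand(ni)
    # With norm='ortho', DST-I is orthogonal and equals its own inverse,
    # so w = F (F r / lam) solves A_i w = r.
    w = dst(dst(r, type=1, norm='ortho') / lam, type=1, norm='ortho')

    A = 2.0 * np.eye(ni) - np.eye(ni, k=1) - np.eye(ni, k=-1)
    assert np.allclose(A @ w, r)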
Iterative Solvers. Each subdomain problem Ai wi = ri may also be solved
iteratively using a CG algorithm with a preconditioner Mi (such as ILU,
Gauss-Seidel, Jacobi) in an inner loop. This will introduce double iteration.
To ensure convergence, the iterates from the fixed number of local iterations must be accurate to within the discretization error. If the number of iterations varies with each application of the local solver, then the Schwarz preconditioner may vary with each iteration, see [GO4, SA2, AX, SI3].
Remark 2.36. If an iterative local solver is employed, with a fixed number of
iterations and zero starting guess, this will yield a preconditioner Ã_i for
A_i, see [GO4, BE2, NO2, AX, MA8]. To ensure the convergence of Schwarz
algorithms when approximate solvers are employed, the matrices Ã_i must
satisfy certain assumptions. For instance, the condition number of the
additive Schwarz preconditioner with inexact solvers will increase at most by
the factor

    γ ≡ max_i λ_max(Ã_i^{-1} A_i) / min_i λ_min(Ã_i^{-1} A_i).

If inexact solvers Ã_i are employed in the multiplicative Schwarz algorithm,
then the spectral radius must satisfy ρ(Ã_i^{-1} A_i) < 2 to ensure
convergence. In the hybrid Schwarz algorithm (as in balancing domain
decomposition [MA15]) the coarse problem must be solved exactly.
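The factor γ is directly computable for a given inexact solver. A small sketch
(with a single subdomain, a model matrix, and a Jacobi-type inexact solver
Ã_i = diag(A_i), all purely illustrative assumptions):

    import numpy as np

    n = 50
    A_i = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # model subdomain matrix
    A_t = np.diag(np.diag(A_i))                             # Jacobi-type inexact solver
    ev = np.linalg.eigvals(np.linalg.solve(A_t, A_i)).real
    print("lambda_min, lambda_max:", ev.min(), ev.max())    # here in (0, 2), so the
    print("gamma =", ev.max() / ev.min())                   # multiplicative bound holds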

2.4.6 Parallelization and Software Libraries

With the exception of the sequential Schwarz algorithm without coloring, the
computations on different subdomains in a Schwarz algorithm can typically
be implemented concurrently. From the viewpoint of parallelization, Schwarz
algorithms thus have “coarse granularity”, i.e., a significant portion of the
computations can be performed in parallel, with the remaining portion re-
quiring more intensive communication between processors. As an example,
consider the additive Schwarz preconditioner:

    M^{-1} r = Σ_{l=0}^{p} R_l^T A_l^{-1} R_l r.

Suppose there are (p + 1) processors available, and that we assign one
processor to each subproblem and distribute the data amongst the processors.
Then, the action of M^{-1} r can be computed as follows. First, given r,
synchronize all the processors and communicate relevant data between the
processors, so that processor l receives the data necessary to assemble R_l r
from other processors. Second, let each processor solve its assigned problem
A_l w_l = R_l r in parallel. Third, synchronize and communicate the local
solution w_l to other processors, as needed (processor 0 should transfer
R_l w_0 to each processor l, while processor l should transfer R_j R_l^T w_l
to processor j if Ω_j^* ∩ Ω_l^* ≠ ∅). Fourth, let each processor sum the
relevant components and store the result locally (processor l can sum
R_l (R_0^T w_0 + R_1^T w_1 + · · · + R_p^T w_p)). For simplicity, processor 0
may be kept idle in this step. Other Schwarz algorithms may be parallelized
similarly. The PETSc library contains parallelized codes in C, C++ and
Fortran for implementing most Schwarz solvers, see [BA15, BA14, BA13]. These
codes employ MPI and LAPACK.
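A hedged sketch of the four steps above, using the mpi4py Python bindings with
one subproblem per MPI rank (the model matrix, the overlapping index ranges,
and the use of a global all-reduce in place of the neighbor-to-neighbor
exchange described above are all simplifying assumptions):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    l, nproc = comm.Get_rank(), comm.Get_size()  # rank l owns subproblem l

    n = 64                                       # global unknowns (illustrative)
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # model matrix, replicated
    lo = max(0, l * (n // nproc) - 2)            # overlapping range for Omega_l^*
    hi = min(n, (l + 1) * (n // nproc) + 2)
    idx = np.arange(lo, hi)

    def apply_M_inv(r):
        r_l = r[idx]                                        # step 1: R_l r
        w_l = np.linalg.solve(A[np.ix_(idx, idx)], r_l)     # step 2: A_l w_l = R_l r
        v = np.zeros(n); v[idx] = w_l                       # step 3: R_l^T w_l
        z = np.empty(n)
        comm.Allreduce(v, z, op=MPI.SUM)                    # step 4: sum contributions
        return z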
MPI. The message passing interface (MPI) is a library of routines for imple-
menting parallel tasks in C, C++ and Fortran, see [PA, GR15]. It is based on
the “message passing model”, which assumes that different processors have
separate memory address spaces, and that data can be moved from one memory
address to another. Using MPI, a parallel computer architecture can be simu-
lated given a cluster of workstations connected by high speed communication
lines. Once the MPI library has been installed, the same executable code of
a parallel program employing the MPI library is stored and executed on each
processor. Each processor is assigned a label (or rank); if there are p proces-
sors, then processor l is assigned rank l. Since the same executable code is
run on each processor, parallelization is obtained by branching the program
based on the rank. The library provides protocols for synchronizing and
communicating data between the different processors. Readers are referred to
[PA, GR15] for details on the syntax, and for instructions on downloading and
installing MPI. In many domain decomposition applications, however, details
of MPI syntax may not be required if the PETSc parallel library is employed.
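A minimal rank-branching program, shown here via the mpi4py Python bindings
rather than C purely for uniformity with the other sketches:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()       # label of this processor
    size = comm.Get_size()       # total number of processors

    if rank == 0:
        print(f"rank 0 of {size}: could, e.g., handle the coarse problem")
    else:
        print(f"rank {rank}: handles subdomain {rank}")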
PETSc. The suite of routines called PETSc (Portable, Extensible Toolkit
for Scientific Computing) is a library for implementing domain decomposition
iterative methods, optimization algorithms, and other algorithms used in
scientific computing. The PETSc library is available in C, C++ and Fortran,
but requires installation of the MPI and LAPACK libraries. Most Schwarz and
Schur complement solvers are implemented in PETSc, and are coded to run on
parallel computers. We refer to [BA14] for a tutorial on the syntax of this
library.
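As a hedged illustration via the petsc4py Python bindings (the assembled model
matrix is an assumption for the sketch), CG preconditioned by PETSc's
overlapping additive Schwarz method can be selected as follows; the same
choices are available at runtime through the options -ksp_type cg -pc_type asm
-sub_pc_type ilu:

    from petsc4py import PETSc

    n = 100
    A = PETSc.Mat().createAIJ([n, n]); A.setUp()
    for i in range(n):                        # assemble a model tridiagonal matrix
        A[i, i] = 2.0
        if i > 0: A[i, i - 1] = -1.0
        if i < n - 1: A[i, i + 1] = -1.0
    A.assemble()
    b = A.createVecLeft(); b.set(1.0)
    x = A.createVecRight()

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.CG)            # outer Krylov method
    ksp.getPC().setType(PETSc.PC.Type.ASM)    # overlapping additive Schwarz
    ksp.setFromOptions()                      # allow command-line overrides
    ksp.solve(b, x)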


2.5 Theoretical Results
In this section, we describe theoretical results on the convergence of multi-
plicative, additive and hybrid Schwarz algorithms in a Hilbert space norm,
see [MA37, DR11, LI6, LI7, WI4, BR18, XU3]. We formulate an abstract
convergence theory for Schwarz projection algorithms on a finite dimensional
Hilbert space, where the convergence rate of the algorithms can be reduced
to two key parameters, which depend on the properties of the subspaces under-
lying the projections. The theoretical framework admits replacement of exact
projections by approximations, in which case two additional parameters will
arise in the convergence bounds. We focus first on the abstract theory before
estimating the key parameters in applications to finite element discretizations
of self adjoint and coercive elliptic equations. Additional analysis of Schwarz
algorithms is presented in [ZH2, WA2, GR4, DR17, MA15].
Our discussion will be organized as follows. In Chap. 2.5.1 we present
background and notation. Chap. 2.5.2 presents the abstract Schwarz conver-
gence theory. Applications to finite element discretizations of elliptic equa-
tions are considered in Chap. 2.5.3. Our discussion follows [XU3, CH11]
where additional results may be found. Selected results on the convergence of
Schwarz algorithms in the maximum norm are presented in Chap. 15, see also
[FR7, FR8].

2.5.1 Background

Let V denote a Hilbert space equipped with inner product A(·, ·) and norm:

    ‖w‖_V ≡ A(w, w)^{1/2},   ∀w ∈ V.

We consider the following problem. Find u ∈ V satisfying:

A(u, v) = F (v), ∀v ∈ V, (2.32)

where F (·) is a bounded linear functional on V . The solution to (2.32) will be
sought by Schwarz algorithms based on (p + 1) subspaces V0 , · · · , Vp of V :

V = V0 + V1 + · · · + Vp ,

i.e., for each v ∈ V we can find vi ∈ Vi such that

v = v0 + · · · + vp .

On each Vk , let Ak : Vk × Vk → IR be a symmetric, bilinear form defined as:

Ak (v, w) ≡ A(v, w), ∀v, w ∈ Vk .

If inexact projections (or solvers) are employed in the Schwarz algorithms, we
let A˜k : Vk × Vk → IR denote a symmetric, bilinear form corresponding to the
inexact solver for the projection onto Vk .


Remark 2.37. We assume there exist parameters 0 < ω_0 ≤ ω_1 such that:

    ω_0 ≤ A_k(v, v) / Ã_k(v, v) ≤ ω_1,   ∀v ∈ V_k\{0},   (2.33)

for 0 ≤ k ≤ p. If Ã_k(·, ·) ≡ A_k(·, ·) for 0 ≤ k ≤ p we obtain ω_0 = ω_1 = 1.
Remark 2.38. If V is finite dimensional, by employing basis vectors for V and
V_k, we may represent the bilinear forms A(·, ·), A_k(·, ·) and Ã_k(·, ·) in
terms of matrices A, A_k and Ã_k, respectively. Indeed, suppose n and n_k
denote the dimensions of V and V_k, respectively, and let φ_1, . . . , φ_n be
a basis for V and ψ_1^{(k)}, · · · , ψ_{n_k}^{(k)} a basis for V_k. Define an
n × n matrix A and n_k × n_k matrices A_k and Ã_k with entries
(A)_{ij} = A(φ_i, φ_j) for 1 ≤ i, j ≤ n, and (A_k)_{ij} = A_k(ψ_i^{(k)}, ψ_j^{(k)})
and (Ã_k)_{ij} = Ã_k(ψ_i^{(k)}, ψ_j^{(k)}) for 1 ≤ i, j ≤ n_k.

Matrix A_k may be obtained from matrix A as follows. Denote by R_k^T an
n × n_k extension matrix whose i'th column consists of the coefficients
obtained when expanding ψ_i^{(k)} in the basis φ_1, · · · , φ_n for V:

    ψ_i^{(k)} = Σ_{j=1}^{n} (R_k^T)_{ji} φ_j,   for 0 ≤ k ≤ p.

Substituting this into the definition of A_k above yields:

    (A_k)_{ij} = A_k(ψ_i^{(k)}, ψ_j^{(k)})
               = A(Σ_{l=1}^{n} (R_k^T)_{li} φ_l, Σ_{q=1}^{n} (R_k^T)_{qj} φ_q)
               = (R_k A R_k^T)_{ij}.

Thus A_k = R_k A R_k^T. Substituting v = Σ_{j=1}^{n_k} (v)_j ψ_j^{(k)} into
(2.33) yields:

    ω_0 ≤ (v^T A_k v) / (v^T Ã_k v) ≤ ω_1,   ∀v ∈ IR^{n_k}\{0}.

This yields:

    ω_0 = min_k λ_min(Ã_k^{-1} A_k) ≤ max_k λ_max(Ã_k^{-1} A_k) = ω_1,

corresponding to uniform lower and upper bounds for the spectra of Ã_k^{-1} A_k.

Remark 2.39. In applications to elliptic equation (2.12) with B_N = ∅, the
Hilbert space V = H_0^1(Ω) and V_k = H_0^1(Ω_k^*) for 1 ≤ k ≤ p, and the
forms are:

    A(u, v) ≡ ∫_Ω (a(x) ∇u · ∇v + c(x) u v) dx,   for u, v ∈ V
    A_k(u, v) ≡ ∫_{Ω_k^*} (a(x) ∇u · ∇v + c(x) u v) dx,   for u, v ∈ V_k.

A simple approximation Ã_k(·, ·) of A_k(·, ·) can be obtained by replacing the
variable coefficients a(·) and c(·) by their values at an interior point
x_k ∈ Ω_k^*. This can be particularly useful if Ω_k^* is a rectangular domain
with a uniform grid, in which case fast solvers can be formulated for Ã_k:

    Ã_k(u, v) ≡ ∫_{Ω_k^*} (a(x_k) ∇u · ∇v + c(x_k) u v) dx,   for u, v ∈ V_k.

Provided a(·) and c(·) do not have large variation in Ω_k^*, the parameters
ω_0 and ω_1 will correspond to uniform lower and upper bounds for a(x)/a(x_k)
and c(x)/c(x_k) in Ω_k^*. In applications, Ã_k can be any scaled
preconditioner for A_k, such as ILU.

We now define a projection map P_k : V → V_k and its approximation
P̃_k : V → V_k for 0 ≤ k ≤ p as follows.

Definition 2.40. Given u, w ∈ V, we define P_k u and P̃_k w as the unique
elements of V_k satisfying:

    A_k(P_k u, v) = A(u, v),   for all v ∈ V_k
    Ã_k(P̃_k w, v) = A(w, v),   for all v ∈ V_k.

The existence of P_k and P̃_k follows by the Lax-Milgram lemma, see [CI2].
The following properties of P_k and P̃_k will be employed in this section.

Lemma 2.41. Let P_k and P̃_k be as defined above. The following hold.
1. The matrix representations P_k of P_k and P̃_k of P̃_k are given by:

    P_k = R_k^T A_k^{-1} R_k A   and   P̃_k = R_k^T Ã_k^{-1} R_k A.

2. The mappings P_k and P̃_k are symmetric, positive semidefinite in A(·, ·):

    A(P_k v, w) = A(v, P_k w),   for v, w ∈ V
    A(P̃_k v, w) = A(v, P̃_k w),   for v, w ∈ V,

with A(P_k v, v) ≥ 0 and A(P̃_k v, v) ≥ 0 for v ∈ V. In matrix terms, this
corresponds to A P_k = P_k^T A, A P̃_k = P̃_k^T A, v^T A P_k v ≥ 0,
v^T A P̃_k v ≥ 0.
3. The projections P_k satisfy:

    P_k P_k = P_k,   P_k (I − P_k) = 0   and   ‖P_k‖_V ≤ 1.

4. The map P̃_k satisfies ‖P̃_k‖_V ≤ ω_1 and also:

    ω_0 A(P_k u, u) ≤ A(P̃_k u, u),   for all u ∈ V
    A(P̃_k u, P̃_k u) ≤ ω_1 A(P̃_k u, u),   for all u ∈ V.


Proof. Properties of orthogonal projections P_k are standard, see [ST13, LA10].
The symmetry of P̃_k in A(·, ·) may be verified by employing the definition of
P̃_k and using that P̃_k u, P̃_k v ∈ V_k for all u, v ∈ V:

    A(P̃_k u, v) = A(v, P̃_k u) = Ã_k(P̃_k v, P̃_k u) = Ã_k(P̃_k u, P̃_k v) = A(u, P̃_k v).

The positive semi-definiteness of P̃_k in A(·, ·) follows since:

    0 ≤ Ã_k(P̃_k v, P̃_k v) = A(v, P̃_k v),   ∀v ∈ V.

To obtain ‖P̃_k‖_V ≤ ω_1, apply the definition of P̃_k and employ (2.33):

    ‖P̃_k u‖_V² = A(P̃_k u, P̃_k u) = A_k(P̃_k u, P̃_k u)
               ≤ ω_1 Ã_k(P̃_k u, P̃_k u)
               = ω_1 A(u, P̃_k u)
               ≤ ω_1 ‖u‖_V ‖P̃_k u‖_V.

The desired bound follows. To verify the bound on A(P̃_k u, u), employ the
matrix equivalents P_k u and P̃_k u of P_k u and P̃_k u, respectively, to
obtain:

    A(P_k u, u) = u^T A P_k u = u^T A R_k^T A_k^{-1} R_k A u
                ≤ (1/ω_0) u^T A R_k^T Ã_k^{-1} R_k A u
                = (1/ω_0) A(P̃_k u, u).

Here, we have employed the following property of symmetric positive definite
matrices:

    ω_0 ≤ (v^T A_k v)/(v^T Ã_k v) ≤ ω_1  ∀v ≠ 0   ⇔
    (1/ω_0) ≥ (v^T A_k^{-1} v)/(v^T Ã_k^{-1} v) ≥ (1/ω_1)  ∀v ≠ 0.

To verify that A(P̃_k u, P̃_k u) ≤ ω_1 A(P̃_k u, u), consider:

    A(P̃_k u, u) = A(u, P̃_k u) = Ã_k(P̃_k u, P̃_k u)
                ≥ (1/ω_1) A_k(P̃_k u, P̃_k u)
                = (1/ω_1) A(P̃_k u, P̃_k u),

where we have employed the definition of P̃_k u and property (2.33). This
yields the desired result. 
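These identities are straightforward to examine numerically. A small sketch
(with a model tridiagonal matrix and a restriction R_k that simply selects a
contiguous block of unknowns, both assumptions made for illustration) checks
that P_k is idempotent and A-self-adjoint:

    import numpy as np

    n = 20
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD model matrix
    R = np.eye(n)[4:12]                        # restriction to a "subdomain"
    A_k = R @ A @ R.T                          # subdomain matrix R_k A R_k^T
    P = R.T @ np.linalg.solve(A_k, R @ A)      # P_k = R_k^T A_k^{-1} R_k A

    assert np.allclose(P @ P, P)               # P_k P_k = P_k
    assert np.allclose(A @ P, (A @ P).T)       # A P_k = P_k^T A
    S = (A @ P + (A @ P).T) / 2                # A(P_k v, v) >= 0 for all v
    assert np.linalg.eigvalsh(S).min() > -1e-10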

In the following, we shall derive properties of different Schwarz algorithms
in terms of the mappings P_k or P̃_k, which will be used later.
Classical (Multiplicative) Schwarz Algorithm. Each sweep of the clas-
sical Schwarz algorithm to solve (2.32) based on subspaces V_0, · · · , V_p has
the following representation in terms of projections (or their approximations):

    For i = 0, · · · , p do
        u^{(k + (i+1)/(p+1))} = u^{(k + i/(p+1))} + P̃_i (u − u^{(k + i/(p+1))})
    Endfor
Since the solution u trivially satisfies u = u + P̃_i (u − u) for 0 ≤ i ≤ p,
subtracting this from the above yields:

    For i = 0, · · · , p do
        u − u^{(k + (i+1)/(p+1))} = (u − u^{(k + i/(p+1))}) − P̃_i (u − u^{(k + i/(p+1))})
                                  = (I − P̃_i) (u − u^{(k + i/(p+1))})
    Endfor

Recursive application of the above yields the following expression:

    (u − u^{(k+1)}) = (I − P̃_p) · · · (I − P̃_0) (u − u^{(k)}),   (2.34)

which expresses the error u − u^{(k+1)} in terms of the error u − u^{(k)}.
This is referred to as the error propagation map or the error amplification
map. The iterates u^{(k)} of the multiplicative Schwarz algorithm will
converge to the desired solution u in the energy norm ‖·‖_V if
‖(I − P̃_p) · · · (I − P̃_0)‖_V < 1. By (2.34), it would follow that the
Schwarz iterates u^{(k)} will converge to u if
‖(I − P̃_p) · · · (I − P̃_0)‖_V ≤ δ for some 0 ≤ δ < 1. This will be
demonstrated later in this section.

Remark 2.42. If M^{-1} denotes the matrix action corresponding to one sweep
of the unsymmetrized Schwarz Alg. 2.3.1 to solve (2.28), then we obtain:

    I − M^{-1} A = (I − P̃_p) · · · (I − P̃_0),

where an inexact solver Ã_k was assumed.

Additive, Hybrid and Symmetrized Schwarz Preconditioners. Next,
we express the preconditioned matrices M^{-1} A corresponding to the additive,
hybrid and symmetrized multiplicative Schwarz preconditioners with inexact
local solvers Ã_k in terms of the matrices P̃_k.

• The inverse M^{-1} of the additive Schwarz preconditioner satisfies:

    M^{-1} A = Σ_{l=0}^{p} R_l^T Ã_l^{-1} R_l A.

This may also be expressed as P̃ ≡ Σ_{i=0}^{p} P̃_i = M^{-1} A in matrix
form, and as P̃ ≡ Σ_{i=0}^{p} P̃_i in operator form, where P̃ is self adjoint
in the A(·, ·) inner product, and will be shown to be coercive. Its condition
number satisfies:

    cond(M, A) ≡ λ_max(M^{-1} A) / λ_min(M^{-1} A) = λ_max(P̃) / λ_min(P̃),

and it will be estimated later in this section.

• The inverse M^{-1} of the hybrid Schwarz preconditioner satisfies:

    M^{-1} A ≡ R_0^T A_0^{-1} R_0 A
               + (I − R_0^T A_0^{-1} R_0 A)(Σ_{i=1}^{p} R_i^T Ã_i^{-1} R_i)(I − A R_0^T A_0^{-1} R_0) A
             = P_0 + (I − P_0)(P_0 + Σ_{i=1}^{p} P̃_i)(I − P_0)
             = P_0 + (I − P_0) P̃ (I − P_0),

where P̃ ≡ P_0 + P̃_1 + · · · + P̃_p. Here, the local matrices A_i were
replaced by approximations Ã_i for 1 ≤ i ≤ p. However, the coarse matrix A_0
should not be approximated, to ensure that all iterates lie in V_0^⊥. Its
condition number satisfies:

    cond(M, A) ≡ λ_max(M^{-1} A) / λ_min(M^{-1} A)
               = λ_max(P_0 + (I − P_0) P̃ (I − P_0)) / λ_min(P_0 + (I − P_0) P̃ (I − P_0)),

where P̃ = P_0 + P̃_1 + · · · + P̃_p represents the additive Schwarz operator.
This will be shown to be better conditioned than P̃.

• The symmetrized multiplicative Schwarz preconditioner M satisfies:

    M^{-1} A ≡ I − (I − P̃_p) · · · (I − P̃_1)(I − P_0)(I − P̃_1) · · · (I − P̃_p).

If an approximate coarse space projection P̃_0 is employed, then the following
alternative symmetrization Ã may also be employed:

    Ã^{-1} A ≡ I − (I − P̃_p) · · · (I − P̃_1)(I − P̃_0)(I − P̃_0)(I − P̃_1) · · · (I − P̃_p),

though the latter involves an extra residual correction on V_0. Both
symmetrizations will be equivalent if P̃_0 = P_0. We will analyze the latter,
where P̃_0 = P_0 is employed.

2.5.2 Convergence of Abstract Schwarz Algorithms

Our study of the convergence of Schwarz algorithms will involve the study of
the operator P̃ associated with the additive Schwarz method, and of E_p, the
error propagation map of the multiplicative Schwarz method:

    P̃ ≡ P̃_0 + · · · + P̃_p,   E_p ≡ (I − P̃_p) · · · (I − P̃_0),

where each P̃_i, as defined earlier, denotes an approximation of the
projection P_i onto the subspace V_i. Schwarz convergence analysis will be
based on bounds for the preceding: the spectra λ_min(P̃) and λ_max(P̃) of the
A(·, ·)-self adjoint operator P̃, and the norm ‖E_p‖_V of the error
propagation map E_p, will be estimated. These quantities will generally
depend on two parameters, K_0 and K_1, introduced below.

The parameters K_0 and K_1 are associated with the subspaces V_0, · · · , V_p
and the approximate solvers Ã_i for 0 ≤ i ≤ p. Estimates of K_0 and K_1 will
be described later in this section for a finite element discretization of a
self adjoint and coercive elliptic equation, and will also depend on the
parameters ω_0 and ω_1.

Definition 2.43. We associate a parameter K_0 > 0 with the spaces
V_0, · · · , V_p and the forms Ã_0(·, ·), · · · , Ã_p(·, ·) if for each w ∈ V
there exists w_i ∈ V_i with w = w_0 + · · · + w_p satisfying the bound:

    Σ_{i=0}^{p} Ã_i(w_i, w_i) ≤ K_0 A(w, w).

Remark 2.44. In matrix form, the above may be stated as follows: given
w ∈ IR^n, there exist w_i ∈ IR^{n_i} for 0 ≤ i ≤ p such that:

    w = R_0^T w_0 + · · · + R_p^T w_p,   and   Σ_{i=0}^{p} w_i^T Ã_i w_i ≤ K_0 w^T A w.

The following result reduces the estimation of K_0 to a parameter C_0 in [LI6].

Lemma 2.45. Suppose the following assumptions hold.
1. Let ω_0 > 0 be defined by (2.33).
2. Let C_0 > 0 be a parameter such that for each w ∈ V there exists w_i ∈ V_i
for 0 ≤ i ≤ p satisfying w = w_0 + · · · + w_p and:

    Σ_{i=0}^{p} A_i(w_i, w_i) ≤ C_0 A(w, w).

Then, the following estimate will hold:

    K_0 ≤ C_0 / ω_0.

Proof. By assumption, Σ_{i=0}^{p} A_i(w_i, w_i) ≤ C_0 A(w, w). Substituting
ω_0 Ã_i(w_i, w_i) ≤ A_i(w_i, w_i), which holds by (2.33), yields the desired
result. 

Definition 2.46. Let K_1 > 0 be a parameter such that for all choices of
v_0, · · · , v_p, w_0, · · · , w_p ∈ V and for any collection I of subindices:

    I ⊂ {(i, j) : 0 ≤ i ≤ p, 0 ≤ j ≤ p},

the following holds:

    Σ_{(i,j)∈I} A(P̃_i v_i, P̃_j w_j)
        ≤ K_1 (Σ_{i=0}^{p} A(P̃_i v_i, v_i))^{1/2} (Σ_{j=0}^{p} A(P̃_j w_j, w_j))^{1/2}.

Remark 2.47. In matrix terms, the preceding requires that for all choices of
v_0, · · · , v_p, w_0, · · · , w_p and indices I the following holds:

    Σ_{(i,j)∈I} v_i^T A R_i^T Ã_i^{-1} R_i A R_j^T Ã_j^{-1} R_j A w_j
        ≤ K_1 (Σ_{i=0}^{p} ‖R_i A v_i‖²_{Ã_i^{-1}})^{1/2} (Σ_{j=0}^{p} ‖R_j A w_j‖²_{Ã_j^{-1}})^{1/2}.

Here we denote the norm ‖x_i‖²_{Ã_i^{-1}} = x_i^T Ã_i^{-1} x_i for
Ã_i = Ã_i^T > 0. The parameter K_1 can be estimated in terms of ω_1 and the
spectral radius ρ(E) of a matrix E = (ε_{ij}), whose entries ε_{ij} are
strengthened Cauchy-Schwartz inequality parameters associated with each pair
of subspaces V_i and V_j.

Definition 2.48. For each index pair i, j ∈ {0, · · · , p} define the
parameter 0 ≤ ε_{ij} ≤ 1 as the smallest possible coefficient satisfying:

    A(w_i, w_j) ≤ ε_{ij} A(w_i, w_i)^{1/2} A(w_j, w_j)^{1/2},   ∀w_i ∈ V_i, w_j ∈ V_j.

If ε_{ij} < 1 the above is called a strengthened Cauchy-Schwartz inequality.
Matrix E ≡ (ε_{ij}) for 0 ≤ i, j ≤ p.

Remark 2.49. Parameter ε_{ij} represents the maximum modulus of the cosine of
the angle between all pairs of vectors in subspaces V_i and V_j. In
particular, if the subspaces are orthogonal, i.e., each vector in V_i is
orthogonal to each vector in V_j, then ε_{ij} = 0, while if V_i and V_j share
at least one nontrivial vector in common, then ε_{ij} = 1.

Lemma 2.50. Suppose the following assumptions hold.
1. Let parameter ω_1 be as defined earlier.
2. Let matrix E be as defined earlier.
Then the following estimate will hold:

    K_1 ≤ ω_1 ρ(E).

Proof. Applying the strengthened Cauchy-Schwartz inequalities pairwise yields:

    Σ_{(i,j)∈I} A(P̃_i v_i, P̃_j w_j)
        ≤ Σ_{i,j} ε_{ij} A(P̃_i v_i, P̃_i v_i)^{1/2} A(P̃_j w_j, P̃_j w_j)^{1/2}
        ≤ Σ_{i,j} ε_{ij} ω_1 A(P̃_i v_i, v_i)^{1/2} A(P̃_j w_j, w_j)^{1/2}
        ≤ ω_1 ρ(E) (Σ_{i=0}^{p} A(P̃_i v_i, v_i))^{1/2} (Σ_{j=0}^{p} A(P̃_j w_j, w_j))^{1/2}.

For additional details, see [XU3, TO10]. 

The following result describes alternative bounds for K_1.

Lemma 2.51. Suppose the following assumptions hold.
1. Let V_0, · · · , V_p denote subspaces of V.
2. Let E = (ε_{ij}) denote the strengthened Cauchy-Schwartz parameters which
are associated with the subspaces V_i and V_j for 0 ≤ i, j ≤ p.
3. Denote by l_0:

    l_0 ≡ max_{1≤i≤p} (Σ_{j=1}^{p} ε_{ij}).   (2.35)

Then the following estimate will hold:

    K_1 ≤ ω_1 l_0, if V_0 is not employed;   K_1 ≤ ω_1 (l_0 + 1), if V_0 is employed.

Proof. If a coarse space V_0 is not employed, let Ẽ be defined by
Ẽ_{ij} ≡ ε_{ij} for 1 ≤ i, j ≤ p. Since ρ(Ẽ) ≤ ‖Ẽ‖_∞ = l_0, we apply Lemma
2.50 to estimate K_1 ≤ ω_1 l_0. If a coarse space V_0 is employed, we
estimate K_1 as follows. Given an index set I ⊂ {(i, j) : 0 ≤ i, j ≤ p},
define I_{00}, I_{01}, I_{10}, I_{11} as follows:

    I_{00} ≡ {(i, j) ∈ I : i = 0, j = 0}
    I_{01} ≡ {(i, j) ∈ I : i = 0, 1 ≤ j ≤ p}
    I_{10} ≡ {(i, j) ∈ I : 1 ≤ i ≤ p, j = 0}
    I_{11} ≡ {(i, j) ∈ I : 1 ≤ i ≤ p, 1 ≤ j ≤ p}.

Let v_i, w_i ∈ V_i for 0 ≤ i ≤ p. Applying the argument of Lemma 2.50 on
I_{11} yields:

    (Σ_{(i,j)∈I_{11}} A(P̃_i v_i, P̃_j w_j))²
        ≤ ω_1² l_0² (Σ_{i=1}^{p} A(P̃_i v_i, v_i)) (Σ_{j=1}^{p} A(P̃_j w_j, w_j)),

and the sum over I_{00} satisfies the analogous bound with l_0² replaced by 1.
Next, consider the sum over the index set I_{01}:

    (Σ_{(0,j)∈I_{01}} A(P̃_0 v_0, P̃_j w_j))²
        = (A(P̃_0 v_0, Σ_{j:(0,j)∈I_{01}} P̃_j w_j))²
        ≤ A(P̃_0 v_0, P̃_0 v_0) A(Σ_{j:(0,j)∈I_{01}} P̃_j w_j, Σ_{j:(0,j)∈I_{01}} P̃_j w_j)
        ≤ ω_1² l_0 A(P̃_0 v_0, v_0) (Σ_{j=1}^{p} A(P̃_j w_j, w_j)),

and similarly for the sum over I_{10}. Combining the preceding results using
that I = I_{00} ∪ I_{01} ∪ I_{10} ∪ I_{11} yields:

    (Σ_{(i,j)∈I} A(P̃_i v_i, P̃_j w_j))²
        ≤ ω_1² (1 + 2 l_0 + l_0²) (Σ_{i=0}^{p} A(P̃_i v_i, v_i)) (Σ_{j=0}^{p} A(P̃_j w_j, w_j))
        = ω_1² (1 + l_0)² (Σ_{i=0}^{p} A(P̃_i v_i, v_i)) (Σ_{j=0}^{p} A(P̃_j w_j, w_j)).

This yields the desired bound K_1 ≤ ω_1 (l_0 + 1). 

We now estimate the condition number of the additive Schwarz operator
M^{-1} A = P̃ = Σ_{i=0}^{p} P̃_i. Since each P̃_i is symmetric in the A(·, ·)
inner product, its eigenvalues will be real, as also the eigenvalues of P̃.
The condition number of P̃ will be a quotient of the maximal and minimal
eigenvalues of P̃, and will satisfy the following Rayleigh quotient bounds:

    K_0^{-1} ≤ A(P̃ u, u) / A(u, u) ≤ K_1,   u ≠ 0.

Theorem 2.52. The following bounds will hold for the spectra of P̃:

    K_0^{-1} ≤ λ_min(P̃) ≤ λ_max(P̃) ≤ K_1.

Proof. For an upper bound, expand ‖P̃ v‖_V² and apply the definition of K_1:

    ‖P̃ v‖_V² = A(P̃ v, P̃ v) = Σ_{i=0}^{p} Σ_{j=0}^{p} A(P̃_i v, P̃_j v)
             ≤ K_1 (Σ_{i=0}^{p} A(P̃_i v, v))^{1/2} (Σ_{j=0}^{p} A(P̃_j v, v))^{1/2}
             = K_1 A(P̃ v, v) ≤ K_1 ‖P̃ v‖_V ‖v‖_V.

The upper bound ‖P̃ v‖_V ≤ K_1 ‖v‖_V thus follows immediately. For a lower
bound, choose v ∈ V and expand v = v_0 + · · · + v_p employing the
decomposition guaranteed by the definition of K_0. Substitute this into
A(v, v) and simplify using the definition of P̃_i and the Cauchy-Schwartz
inequality:

    A(v, v) = Σ_{i=0}^{p} A(v, v_i) = Σ_{i=0}^{p} Ã_i(P̃_i v, v_i)
            ≤ Σ_{i=0}^{p} Ã_i(P̃_i v, P̃_i v)^{1/2} Ã_i(v_i, v_i)^{1/2}
            = Σ_{i=0}^{p} A(v, P̃_i v)^{1/2} Ã_i(v_i, v_i)^{1/2}
            ≤ (Σ_{i=0}^{p} A(v, P̃_i v))^{1/2} (Σ_{i=0}^{p} Ã_i(v_i, v_i))^{1/2}
            ≤ A(P̃ v, v)^{1/2} K_0^{1/2} ‖v‖_V.

We thus obtain ‖v‖_V ≤ K_0^{1/2} A(P̃ v, v)^{1/2}. Squaring both sides yields:

    ‖v‖_V² = A(v, v) ≤ K_0 A(P̃ v, v),

which is a lower bound for the spectrum of P̃. Combining the upper and lower
bounds together yields:

    cond(M, A) = λ_max(P̃) / λ_min(P̃) ≤ K_0 K_1,

which is a bound for the condition number of M^{-1} A = P̃. See [XU3, TO10]. 

Remark 2.53. If subspaces V_0, · · · , V_p form an orthogonal decomposition of
V and exact solvers are employed (i.e., Ã_k = A_k for all k), then it is
easily verified that K_0 = K_1 = 1. In this case the additive Schwarz
preconditioned system will have condition number 1 and the conjugate gradient
method will converge in a single iteration.

Remark 2.54. See [XU3, TO10] for additional details on these bounds.

The following result concerns the optimal choice of parameter K_0.

Lemma 2.55. If K̂_0 is the smallest admissible choice of parameter K_0, then:

    K̂_0^{-1} = λ_min(P̃).

Proof. For any choice of admissible parameter K_0, Thm. 2.52 yields:

    0 < K_0^{-1} ≤ λ_min(P̃).

Thus, it suffices to show that K_0 = 1/λ_min(P̃) is itself admissible. Since
λ_min(P̃) > 0, the operator P̃ is invertible, and given v ∈ V we may construct
an optimal partition. For 0 ≤ i ≤ p define:

    v_i ≡ P̃_i P̃^{-1} v.

By construction:

    Σ_{i=0}^{p} v_i = Σ_{i=0}^{p} P̃_i P̃^{-1} v = P̃ P̃^{-1} v = v.

For this decomposition, employing the definition of P̃_i and that
P̃_i P̃^{-1} v ∈ V_i yields:

    Σ_{i=0}^{p} Ã_i(v_i, v_i) = Σ_{i=0}^{p} Ã_i(P̃_i P̃^{-1} v, P̃_i P̃^{-1} v)
        = Σ_{i=0}^{p} A(P̃^{-1} v, P̃_i P̃^{-1} v)
        = A(P̃^{-1} v, Σ_{i=0}^{p} P̃_i P̃^{-1} v)
        = A(P̃^{-1} v, P̃ P̃^{-1} v)
        = A(P̃^{-1} v, v) ≤ (1/λ_min(P̃)) A(v, v).

Thus, K_0 = 1/λ_min(P̃) is an admissible parameter. 

The following result shows that the hybrid Schwarz preconditioner P̃_* is
better conditioned than the associated additive Schwarz preconditioner.

Lemma 2.56. Let K_0 and K_1 be as defined above. Define:

    P̃ ≡ P_0 + P̃_1 + · · · + P̃_p,   P̃_* ≡ P_0 + (I − P_0) P̃ (I − P_0).

Then, the spectra of P̃_* will satisfy:

    K_0^{-1} ≤ λ_min(P̃) ≤ λ_min(P̃_*) ≤ λ_max(P̃_*) ≤ λ_max(P̃) ≤ K_1.

In particular, κ_2(P̃_*) ≤ κ_2(P̃).

Proof. Expand the terms in the Rayleigh quotient associated with P̃_* as:

    A(P̃_* u, u) / A(u, u)
        = (A(P_0 u, P_0 u) + A(P̃ (I − P_0) u, (I − P_0) u))
          / (A(P_0 u, P_0 u) + A((I − P_0) u, (I − P_0) u)),

employing the A(·, ·)-orthogonality of the decomposition
u = P_0 u + (I − P_0) u. Since the range of (I − P_0) is V_0^⊥, a subspace of
V, the Rayleigh quotient associated with the self adjoint operator
(I − P_0) P̃ (I − P_0) will satisfy:

    λ_min(P̃) ≤ min_{u∈V_0^⊥\{0}} A(P̃ (I − P_0) u, (I − P_0) u) / A((I − P_0) u, (I − P_0) u)
    max_{u∈V_0^⊥\{0}} A(P̃ (I − P_0) u, (I − P_0) u) / A((I − P_0) u, (I − P_0) u) ≤ λ_max(P̃),

since the extrema are considered on a subspace of V. Substituting these
observations in the Rayleigh quotient yields the desired result. 

Remark 2.57. If inexact solvers are employed, there are two alternative
possibilities for symmetrizing Schwarz sweeps. Both define w = 0 initially
and define M^{-1} f ≡ w at the end of the sweeps. The first symmetrization is:

    For k = p, p−1, · · · , 1, 0, 1, · · · , p−1, p do
        w ← w + R_k^T Ã_k^{-1} R_k (f − A w)
    Endfor

An alternative symmetrization has an additional fractional step for k = 0:

    For k = p, p−1, · · · , 1, 0, 0, 1, · · · , p−1, p do
        w ← w + R_k^T Ã_k^{-1} R_k (f − A w)
    Endfor

If an exact solver is used for k = 0, then both sweeps will be mathematically
equivalent. In our analysis, we consider the latter sweep.

We next consider norm bounds for E_p = (I − P̃_p) · · · (I − P̃_0). Bounds for
‖E_p‖_V directly yield convergence rates for the multiplicative Schwarz
method and condition number estimates for the symmetrized Schwarz
preconditioner.

Lemma 2.58. Suppose the following assumptions hold.
1. For some 0 ≤ δ < 1 let E_p = (I − P̃_p) · · · (I − P̃_0) satisfy:

    ‖E_p‖_V ≤ δ.

2. Let M be the symmetrized multiplicative Schwarz preconditioner with:

    I − M^{-1} A = E_p^T E_p,

where E_p = (I − P̃_p) · · · (I − P̃_0) is the matrix equivalent of E_p.

Then the following results will hold.
1. The maximum eigenvalue of M^{-1} A will satisfy: λ_max(M^{-1} A) ≤ 1.
2. The minimum eigenvalue of M^{-1} A will satisfy: 1 − δ² ≤ λ_min(M^{-1} A).
3. The condition number of the preconditioned matrix will satisfy:

    cond(M, A) ≡ λ_max(M^{-1} A) / λ_min(M^{-1} A) ≤ 1 / (1 − δ²).

Proof. The assumption that ‖E_p‖_V ≤ δ is equivalent to:

    A(E_p v, E_p v) ≤ δ² A(v, v),   ∀v ∈ V.

Since M^{-1} A = I − E_p^T E_p, we may substitute the above into the following
Rayleigh quotient, with v denoting the vector representation of v:

    (v^T A M^{-1} A v) / (v^T A v) = (A(v, v) − A(E_p v, E_p v)) / A(v, v).

Since 0 ≤ A(E_p v, E_p v) ≤ δ² A(v, v), the desired results follow. 

We next derive an estimate for ‖E_p‖_V. We derive two preliminary results.
We employ the notation:

    E_{−1} ≡ I
    E_0 ≡ (I − P̃_0)
    E_1 ≡ (I − P̃_1)(I − P̃_0)
    ...                                                   (2.36)
    E_p ≡ (I − P̃_p) · · · (I − P̃_0).

Lemma 2.59. The following algebraic relations will hold for E_i defined by
(2.36):

    E_{k−1} − E_k = P̃_k E_{k−1},   for 0 ≤ k ≤ p
    I − E_i = Σ_{k=0}^{i} P̃_k E_{k−1},   for 0 ≤ i ≤ p.

Proof. Employing the definition of E_k and substituting
E_k = (I − P̃_k) E_{k−1} for 0 ≤ k ≤ p yields the first identity. The second
identity is obtained by summing the first identity and collapsing the sum.
See [XU3, TO10]. 

Lemma 2.60. Let the parameters ω_1, K_0 and K_1 be as defined earlier. Then,
for v ∈ V, the following bound will hold:

    ‖v‖_V² − ‖E_p v‖_V² ≥ (2 − ω_1) Σ_{j=0}^{p} A(P̃_j E_{j−1} v, E_{j−1} v).

Proof. Consider the identity E_{k−1} v − E_k v = P̃_k E_{k−1} v from Lemma
2.59, take A(·, ·) inner products of both sides with E_{k−1} v + E_k v, and
simplify:

    ‖E_{k−1} v‖_V² − ‖E_k v‖_V²
        = A(P̃_k E_{k−1} v, E_{k−1} v) + A(P̃_k E_{k−1} v, E_k v)
        = A(P̃_k E_{k−1} v, E_{k−1} v) + A(P̃_k E_{k−1} v, (I − P̃_k) E_{k−1} v)
        = 2 A(P̃_k E_{k−1} v, E_{k−1} v) − A(P̃_k E_{k−1} v, P̃_k E_{k−1} v).

By Lemma 2.41, the map P̃_k is symmetric and positive semidefinite in the
A(·, ·) inner product and satisfies:

    A(P̃_k E_{k−1} v, P̃_k E_{k−1} v) ≤ ω_1 A(P̃_k E_{k−1} v, E_{k−1} v).

Substituting this yields:

    ‖E_{k−1} v‖_V² − ‖E_k v‖_V² ≥ (2 − ω_1) A(P̃_k E_{k−1} v, E_{k−1} v).

Summing for k = 0, · · · , p and collapsing the sum yields the desired result:

    ‖v‖_V² − ‖E_p v‖_V² ≥ (2 − ω_1) Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v).

See [XU3, TO10] for additional details. 

We are now able to derive the main result on norm bounds for E_p.

Theorem 2.61. Let parameters ω_1, K_0 and K_1 be as defined earlier. Then for
v ∈ V, the following bound will hold:

    ‖E_p v‖_V² ≤ (1 − (2 − ω_1) / (K_0 (1 + K_1)²)) ‖v‖_V²   (2.37)

for the error propagation map E_p of the multiplicative Schwarz method.

Proof. Expand A(P̃ v, v) and substitute v = E_{i−1} v + (I − E_{i−1}) v to
obtain:

    A(P̃ v, v) = Σ_{i=0}^{p} A(P̃_i v, v)
        = Σ_{i=0}^{p} A(P̃_i v, E_{i−1} v) + Σ_{i=0}^{p} A(P̃_i v, (I − E_{i−1}) v)
        = Σ_{i=0}^{p} A(P̃_i v, E_{i−1} v) + Σ_{i=0}^{p} Σ_{k=1}^{i} A(P̃_i v, P̃_k E_{k−1} v).

The last line was obtained by an application of Lemma 2.59. By Lemma 2.41,
the mappings P̃_i are symmetric and positive semidefinite in A(·, ·).
Consequently, the Cauchy-Schwartz inequality may be generalized to yield:

    A(P̃_i v, E_{i−1} v) ≤ A(P̃_i v, v)^{1/2} A(P̃_i E_{i−1} v, E_{i−1} v)^{1/2}.

Summing the above for i = 0, · · · , p yields:

    Σ_{i=0}^{p} A(P̃_i v, E_{i−1} v)
        ≤ Σ_{i=0}^{p} A(P̃_i v, v)^{1/2} A(P̃_i E_{i−1} v, E_{i−1} v)^{1/2}
        ≤ (Σ_{i=0}^{p} A(P̃_i v, v))^{1/2} (Σ_{i=0}^{p} A(P̃_i E_{i−1} v, E_{i−1} v))^{1/2}
        = A(P̃ v, v)^{1/2} (Σ_{i=0}^{p} A(P̃_i E_{i−1} v, E_{i−1} v))^{1/2}.

Applying the definition of K_1 yields:

    Σ_{i=0}^{p} Σ_{k=1}^{i} A(P̃_i v, P̃_k E_{k−1} v)
        ≤ K_1 (Σ_{i=0}^{p} A(P̃_i v, v))^{1/2} (Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v))^{1/2}
        = K_1 A(P̃ v, v)^{1/2} (Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v))^{1/2}.

Combining both these results yields:

    A(P̃ v, v) ≤ (1 + K_1) A(P̃ v, v)^{1/2} (Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v))^{1/2}.

Canceling common terms yields:

    A(P̃ v, v)^{1/2} ≤ (1 + K_1) (Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v))^{1/2},

so that:

    A(P̃ v, v) ≤ (1 + K_1)² Σ_{k=0}^{p} A(P̃_k E_{k−1} v, E_{k−1} v).

Applying Lemma 2.60 yields:

    ((2 − ω_1) / (1 + K_1)²) A(P̃ v, v) ≤ ‖v‖_V² − ‖E_p v‖_V².

Finally, applying the lower bound K_0^{-1} ‖v‖_V² ≤ A(P̃ v, v) for the
eigenvalue of P̃ yields:

    ((2 − ω_1) / (K_0 (1 + K_1)²)) ‖v‖_V² ≤ ‖v‖_V² − ‖E_p v‖_V².

This immediately yields the desired inequality:

    ‖E_p v‖_V² ≤ (1 − (2 − ω_1) / (K_0 (1 + K_1)²)) ‖v‖_V².

See [XU3, TO10] for additional details. 
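These bounds are easy to examine numerically. The following sketch (a
one-level additive Schwarz decomposition of a 1D model Laplacian with exact
local solves; the sizes and overlaps are illustrative assumptions) computes
the spectrum of P̃ and the energy norm of E_p:

    import numpy as np
    from scipy.linalg import sqrtm

    n = 60
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model matrix
    I = np.eye(n)
    blocks = [np.arange(max(0, s - 2), min(n, s + 22)) for s in (0, 20, 40)]

    P = np.zeros((n, n)); E = I.copy()
    for idx in blocks:                                     # exact local solvers
        R = I[idx]
        Pi = R.T @ np.linalg.solve(R @ A @ R.T, R @ A)     # projection P_i
        P += Pi                                            # additive operator P~
        E = (I - Pi) @ E                                   # E_p = (I-P_p)...(I-P_0)

    ev = np.sort(np.linalg.eigvals(P).real)                # real: P~ is A-self-adjoint
    print("lambda_min =", ev[0], " lambda_max =", ev[-1])  # no coarse space here
    Ah = np.real(sqrtm(A))                                 # ||E_p||_V via similarity
    print("||E_p||_V =", np.linalg.norm(Ah @ E @ np.linalg.inv(Ah), 2))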

Remark 2.62. To ensure convergence of multiplicative Schwarz iterates, the
parameter ω_1 must satisfy ω_1 < 2. The bound (2.37) for ‖E_p‖_V imposes
restrictions on the choice of inexact solvers. We will henceforth assume that
inexact solvers Ã_k are suitably scaled so that λ_max(Ã_k^{-1} A_k) = ω_1 < 2.

Remark 2.63. The bound (2.37) for ‖E_p‖_V is not optimal. Indeed, suppose
V_0, · · · , V_p are mutually orthogonal subspaces which form an orthogonal
decomposition of V. Then, the multiplicative Schwarz algorithm based on exact
solvers will converge in one iteration, yielding ‖E_p‖_V = 0. However,
theoretical estimates yield K_0 = K_1 = 1 and ω_0 = ω_1 = 1, so that:

    ‖E_p‖_V ≤ (3/4)^{1/2},

which is not optimal. Readers are referred to [XU3, TO10] for additional
details.

2.5.3 Applications to Finite Element Discretizations

We shall now apply the preceding abstract Schwarz convergence theory to
analyze the convergence of overlapping Schwarz algorithms for solving the
finite element discretization (2.13) of elliptic equation (2.12) with
B_D = ∂Ω. We shall make several simplifying assumptions and estimate the
dependence of the convergence rate on the underlying mesh size h, the
subdomain size h_0, the overlap factor β h_0 and the variation in the
coefficient a(·). We shall assume that c(x) ≡ 0 and that exact solvers are
employed in all projections, so that Ã_k = A_k for 0 ≤ k ≤ p and
ω_0 = ω_1 = 1. Since the rate of convergence of the multiplicative, additive
and hybrid Schwarz algorithms depends only on the parameters K_0 and K_1, we
shall estimate how these parameters depend on h, h_0, a(·) and β for the
finite element local spaces V_i and forms A_i(·, ·). We will show that K_1 is
independent of h, h_0 and a(·), so our efforts will focus primarily on
estimating how K_0 depends on h, h_0 and a(·).

Assumption 1. We assume that the coefficient a(·) is piecewise constant on
subregions S_1, · · · , S_q of Ω which form a nonoverlapping decomposition:

    a(x) = a_k > 0,   for x ∈ S_k,   for 1 ≤ k ≤ q.

In weak formulation (2.12), the terms A(·, ·) and F(·) will have the form:

    A(u, v) ≡ Σ_{i=1}^{q} a_i ∫_{S_i} ∇u · ∇v dx,   for u, v ∈ H_0^1(Ω)
    F(v) ≡ ∫_Ω f v dx,   for v ∈ H_0^1(Ω).

The notation |a| will denote the variation in a(x):

    |a| ≡ max_k a_k / min_l a_l.

We next state our assumptions on the overlapping subdomains {Ω_i^*}_{i=1}^{p}.

Assumption 2. We assume that the overlapping subdomains {Ω_i^*}_{i=1}^{p} are
constructed from a non-overlapping decomposition {Ω_i}_{i=1}^{p}, where each
subdomain Ω_i^* is an extension of Ω_i of diameter h_0, with overlap β h_0:

    Ω_i^* ≡ Ω_i^{β h_0} ≡ {x ∈ Ω : dist(x, Ω_i) < β h_0},   for 1 ≤ i ≤ p,

where 0 ≤ β denotes an overlap parameter.

Assumption 3. We assume a quasiuniform triangulation T_h(Ω) of Ω, whose
elements align with the subdomains {S_i}_{i=1}^{q}, {Ω_i}_{i=1}^{p} and
{Ω_i^*}_{i=1}^{p}. We let V_h denote the space of continuous, piecewise
linear finite element functions defined on T_h(Ω). The Hilbert space
V ≡ V_h ∩ H_0^1(Ω), while the subspaces V_i for 1 ≤ i ≤ p are defined as
V_i ≡ V_h ∩ H_0^1(Ω_i^*). If a coarse space V_0 is employed, it will be
assumed to satisfy V_0 ⊂ V_h ∩ H_0^1(Ω).

Definition 2.64. We associate a p × p adjacency matrix G with the subdomains
{Ω_i^*}_{i=1}^{p}. Given Ω_1^*, · · · , Ω_p^*, we define its adjacency matrix
G by:

    G_{ij} = 1 if Ω_i^* ∩ Ω_j^* ≠ ∅,  G_{ij} = 0 if Ω_i^* ∩ Ω_j^* = ∅,
    and  g_0 ≡ max_i (Σ_{j≠i} G_{ij}),                              (2.38)

where g_0 denotes the maximum number of neighbors intersecting a subdomain.

We will employ a partition of unity satisfying the following assumptions.

Assumption 4. We assume there exists a smooth partition of unity
{χ_i(x)}_{i=1}^{p} subordinate to the cover {Ω_i^*}_{i=1}^{p} satisfying the
following conditions:

    0 ≤ χ_i(x) ≤ 1,   for 1 ≤ i ≤ p
    χ_i(x) = 0,   for x ∈ Ω\Ω_i^*,   1 ≤ i ≤ p
    χ_1(x) + · · · + χ_p(x) = 1,   for x ∈ Ω                        (2.39)
    ‖∇χ_i‖_{L^∞(Ω)} ≤ C β^{-1} h_0^{-1},   for 1 ≤ i ≤ p.

If a coarse space V_0 is employed, we consider several operators which map
onto this subspace. We let Q_0 denote the L²(Ω)-orthogonal projection onto
V_0, π_0 a traditional interpolation map onto V_0, and, when applicable, I_0
a weighted interpolation map onto V_0. The following properties will be
assumed about these operators.

Assumption 5. Let the L²(Ω)-orthogonal projection Q_0 onto V_0 satisfy:

    |Q_0 v|²_{H¹(Ω)} ≤ c_1(Q_0, h, h_0) |v|²_{H¹(Ω)},   for v ∈ C(Ω̄) ∩ H¹(Ω)
    ‖v − Q_0 v‖²_{L²(Ω)} ≤ c_2(Q_0, h, h_0) h_0² |v|²_{H¹(Ω)},   for v ∈ C(Ω̄) ∩ H¹(Ω),   (2.40)

where c_1(Q_0, h, h_0) > 0 and c_2(Q_0, h, h_0) > 0 denote parameters which
may depend on h, h_0 and the operator Q_0, but not on the coefficients {a_l}.
When applicable, we assume that π_0 : C(Ω̄) ∩ H¹(Ω) → V_0 (the traditional
interpolation map) satisfies the following local bounds on each Ω_i:

    |π_0 v|²_{H¹(Ω_i)} ≤ c_1(π_0, h, h_0) |v|²_{H¹(Ω_i)},   for v ∈ C(Ω̄_i) ∩ H¹(Ω_i)
    ‖v − π_0 v‖²_{L²(Ω_i)} ≤ c_2(π_0, h, h_0) h_0² |v|²_{H¹(Ω_i)},   for v ∈ C(Ω̄_i) ∩ H¹(Ω_i),   (2.41)

where c_1(π_0, h, h_0) and c_2(π_0, h, h_0) denote parameters which may depend
on h, h_0 and π_0, but not on the coefficients {a_l}. If a weighted
interpolation map can be defined, we assume that I_0 : C(Ω̄) ∩ H¹(Ω) → V_0
satisfies the following bounds on each subdomain Ω_i for v ∈ H¹(Ω):

    |I_0 v|²_{H¹(Ω_i)} ≤ c_1(I_0, h, h_0) Σ_{j:G_{ij}≠0} d_{ij}² |v|²_{H¹(Ω_j)}
    ‖v − I_0 v‖²_{L²(Ω_i)} ≤ c_2(I_0, h, h_0) h_0² Σ_{j:G_{ij}≠0} d_{ij}² |v|²_{H¹(Ω_j)},   (2.42)

where c_1(I_0, h, h_0) and c_2(I_0, h, h_0) denote parameters which can depend
on h, h_0 and I_0, but not on the coefficients {a_l}. The weights d_{ij} ≥ 0
depend on the coefficients {a_l} and satisfy d_{ij} ≤ a_j/(a_i + a_j), so
that a_i d_{ij}²/a_j ≤ 1.

Remark 2.65. If, as in multigrid methods, the triangulation T_h(Ω) is
obtained by the refinement of some coarse quasiuniform triangulation
T_{h_0}(Ω) whose elements {Ω_i}_{i=1}^{p} have diameter h_0, then a coarse
subspace V_0 ⊂ V_h can be defined as the continuous, piecewise linear finite
element functions on T_{h_0}(Ω). For such a coarse space, explicit bounds are
known for c_i(Q_0, h, h_0) and c_i(π_0, h, h_0) in Assumption 5, see [BR22,
DR11, XU3, BR21]. The L²(Ω)-orthogonal projection Q_0 onto V_0 will satisfy:

    |Q_0 v|²_{H¹(Ω)} ≤ c |v|²_{H¹(Ω)},   for v ∈ H¹(Ω)
    ‖v − Q_0 v‖²_{L²(Ω)} ≤ c h_0² |v|²_{H¹(Ω)},   for v ∈ H¹(Ω),   (2.43)

where c is independent of h, h_0 and a(·), see [BR22, BR21, JO2]. The
standard nodal interpolation map π_0 onto V_0 will satisfy the following
bounds on each element Ω_i of Ω for v ∈ C(Ω̄) ∩ H¹(Ω):

    |π_0 v|²_{H¹(Ω_i)} ≤ c (1 + log(h_0/h)) |v|²_{H¹(Ω_i)},   for Ω ⊂ IR²
    |π_0 v|²_{H¹(Ω_i)} ≤ c (1 + (h_0/h)) |v|²_{H¹(Ω_i)},   for Ω ⊂ IR³      (2.44)
    ‖v − π_0 v‖²_{L²(Ω_i)} ≤ c h_0² |v|²_{H¹(Ω_i)},   for Ω ⊂ IR^d, d = 2, 3,

where c is independent of h, h_0 and a(·), see [CI2, DR11].

Remark 2.66. The L²(Ω)-orthogonal projection Q_0 will typically be global, in
the sense that (Q_0 w)(x) for x ∈ Ω_j may depend on w(·) in Ω\Ω_j. In
contrast, the interpolation map π_0 is required to be local on the subregions
Ω_j, since (π_0 w)(x) for x ∈ Ω_j depends only on the values of w(·) in Ω_j.
In applications, we assume S_l = Ω_l for 1 ≤ l ≤ p with p = q. A piecewise
constant weighted interpolation map I_0 onto V_0 can be defined satisfying
the following bounds on each element Ω_i of Ω for v ∈ C(Ω̄) ∩ H¹(Ω):

    |I_0 v|²_{H¹(Ω_i)} ≤ c (1 + log²(h_0/h)) Σ_{j:G_{ij}≠0} d_{ij} |v|²_{H¹(Ω_j)}
    ‖v − I_0 v‖²_{L²(Ω_i)} ≤ c h_0² Σ_{j:G_{ij}≠0} d_{ij} |v|²_{H¹(Ω_j)},   (2.45)

where c is independent of h and h_0 (and a(x)), and d_{ij} ≤ a_j/(a_i + a_j),
see [CO8, GI3, SA7, MA17, CA18].

Remark 2.67. In applications, alternative coarse spaces may be employed, see
[WI6, DR10, SA7, MA17, JO2]. In particular, the piecewise constant coarse
space [CO8, MA17] applies to general grids and yields robust convergence. We
refer the reader to [CO8, SA7, MA17, CA18].

Assumption 6. We assume that the following inverse inequality holds with a
parameter C (independent of h) on each element κ of T_h(Ω):

    |v|_{H¹(κ)} ≤ C h^{-1} ‖v‖_{L²(κ)},   ∀v ∈ V_h.   (2.46)

See [ST14].

Estimation of K_1

Lemma 2.68. Let g_0 denote the maximum number of neighboring subdomains which
intersect a subdomain. Then, the following will hold for the subspaces V_i
defined as V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p.
1. The parameter l_0 defined by (2.35) will satisfy: l_0 ≤ g_0.
2. The parameter K_1 will satisfy:

    K_1 ≤ ω_1 g_0, if V_0 is not employed;   K_1 ≤ ω_1 (g_0 + 1), if V_0 is employed,

where ω_1 = max_i λ_max(Ã_i^{-1} A_i).

Proof. Consider the matrix E = (ε_{ij})_{i,j=0}^{p} of strengthened
Cauchy-Schwartz parameters associated with subspaces V_0, V_1, . . . , V_p.
The following observation relates the entries of E to the entries of the
adjacency matrix G, representing subdomain adjacencies:

    G_{ij} = 0  ⟹  Ω_i^* ∩ Ω_j^* = ∅  ⟹  H_0^1(Ω_i^*) ⊥ H_0^1(Ω_j^*).

Thus, G_{ij} = 0 will yield ε_{ij} = 0 for 1 ≤ i, j ≤ p. Similarly, when
G_{ij} = 1, the parameter ε_{ij} ≤ 1 for 1 ≤ i, j ≤ p. An application of
Lemma 2.51 now yields the desired result. 

Remark 2.69. For a typical overlapping decomposition {Ω_i^*}_{i=1}^{p} of Ω
and for sufficiently small β, the number g_0 of adjacent subdomains is
independent of h, h_0, |a| and β. Thus K_1 is typically independent of these
parameters, and the rate of convergence of a traditional two-level
overlapping Schwarz algorithm, with or without a coarse space, depends
primarily only on the parameter K_0 (or equivalently C_0).

Estimation of K_0

In the following, we shall estimate the parameter K_0, or equivalently the
partition parameter C_0 (since we assume ω_0 = ω_1 = 1), for different
Schwarz algorithms, with or without a coarse space. For convenience, C will
denote a generic constant independent of h, h_0, |a| and β, whose value may
differ from one line to the next. The next preliminary result will be
employed later in this section in estimating the parameter C_0.

Lemma 2.70. Suppose the following conditions hold.
1. Let the assumptions 1 through 6 hold.
2. Let V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p be local finite element spaces.
3. Given w ∈ V_h ∩ H_0^1(Ω), define w_i ≡ π_h(χ_i w) ∈ V_i for 1 ≤ i ≤ p.

Then, the following results will hold.
1. We obtain w = w_1 + · · · + w_p.
2. For each 1 ≤ i ≤ p and 1 ≤ j ≤ q the following bound will hold:

    a_j ∫_{S_j} |∇w_i|² dx ≤ 2 a_j (∫_{S_j} |∇w|² dx + C β^{-2} h_0^{-2} ‖w‖²_{L²(S_j)}),

where C > 0 is independent of h, h_0, |a| and β.

Proof. By construction, w_1 + · · · + w_p = π_h(χ_1 + · · · + χ_p) w
= π_h w = w. Consider an element κ ∈ S_j and let x_κ be its geometric
centroid. With some abuse of notation, we express:

    w_i(x) = π_h(χ_i(x) w(x)) = I_h(χ_i(x_κ) w(x)) + π_h((χ_i(x) − χ_i(x_κ)) w(x)),   x ∈ κ.

Application of the triangle and arithmetic-geometric mean inequalities yields:

    |w_i|²_{H¹(κ)} ≤ 2 |π_h(χ_i(x_κ) w)|²_{H¹(κ)} + 2 |π_h((χ_i(·) − χ_i(x_κ)) w)|²_{H¹(κ)}.

Substituting π_h(χ_i(x_κ) w) = χ_i(x_κ) w on κ and the inverse inequality
yields:

    |w_i|²_{H¹(κ)} ≤ 2 χ_i(x_κ)² |w|²_{H¹(κ)} + C h^{-2} ‖π_h((χ_i(·) − χ_i(x_κ)) w)‖²_{L²(κ)}
                  ≤ 2 |w|²_{H¹(κ)} + 2 C h^{-2} ‖π_h((χ_i(·) − χ_i(x_κ)) w)‖²_{L²(κ)}.

Here, we employed that 0 ≤ χ_i(x_κ) ≤ 1. By Taylor expansion, we obtain:

    |χ_i(x) − χ_i(x_κ)| = |∇χ_i(x̃) · (x − x_κ)| ≤ C β^{-1} h_0^{-1} h,

for some point x̃ on the line segment (x, x_κ). Substituting the above in the
expression preceding it yields:

    |w_i|²_{H¹(κ)} ≤ 2 |w|²_{H¹(κ)} + 2 C h^{-2} β^{-2} h_0^{-2} h² ‖w‖²_{L²(κ)}
                  = 2 |w|²_{H¹(κ)} + 2 C β^{-2} h_0^{-2} ‖w‖²_{L²(κ)}.

Summing over all the elements κ ∈ S_j and multiplying both sides by a_j
yields the result. 

Remark 2.71. Without loss of generality, we may assume that the subregions
{S_i}_{i=1}^{q} are obtained by refinement of {Ω_j}_{j=1}^{p} (if needed by
intersecting the S_i with Ω_j). If m_0 denotes the maximum number of
subdomains Ω_j^* intersecting a subregion S_i, then it immediately follows
that m_0 ≤ g_0, where g_0 denotes the maximum number of overlapping
subdomains intersecting any Ω_i^*.

In the following result, we estimate C_0 when a coarse space V_0 is not
employed.

Lemma 2.72. Suppose the following conditions hold.
1. Let the assumptions 1 through 6 hold.
2. Let V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p.
3. Given w ∈ V_h ∩ H_0^1(Ω), define w_i ≡ π_h(χ_i w) ∈ V_i for 1 ≤ i ≤ p.

Then, the decomposition will satisfy w = w_1 + · · · + w_p with:

    Σ_{i=1}^{p} A(w_i, w_i) ≤ 2 g_0 A(w, w) + 2 g_0 C β^{-2} h_0^{-2} Σ_{j=1}^{q} a_j ‖w‖²_{L²(S_j)}
                            ≤ 2 g_0 (1 + C β^{-2} h_0^{-2} |a|) A(w, w),   (2.47)

yielding that parameter C_0 ≤ 2 g_0 (1 + C β^{-2} h_0^{-2} |a|), for C
independent of h, h_0, |a| and β.

Proof. By construction w_1 + · · · + w_p = w. Apply Lemma 2.70 to obtain:

    a_j ∫_{S_j} |∇w_i|² dx ≤ 2 a_j ∫_{S_j} |∇w|² dx + 2 C a_j β^{-2} h_0^{-2} ‖w‖²_{L²(S_j)}.

Since the terms on the left hand side above are zero when S_j ∩ Ω_i^* = ∅, we
only need sum the above for i such that G_{ij} ≠ 0 to obtain:

    Σ_{i=1}^{p} a_j ∫_{S_j} |∇w_i|² dx ≤ 2 g_0 (a_j ∫_{S_j} |∇w|² dx + C β^{-2} h_0^{-2} a_j ‖w‖²_{L²(S_j)}).

Summing the above for j = 1, · · · , q yields:

    Σ_{i=1}^{p} A(w_i, w_i) = Σ_{i=1}^{p} Σ_{j=1}^{q} a_j ∫_{S_j} |∇w_i|² dx
        ≤ 2 g_0 Σ_{j=1}^{q} a_j (∫_{S_j} |∇w|² dx + C β^{-2} h_0^{-2} ‖w‖²_{L²(S_j)})
        ≤ 2 g_0 A(w, w) + 2 g_0 C β^{-2} h_0^{-2} ‖a‖_∞ ‖w‖²_{L²(Ω)}
        ≤ 2 g_0 A(w, w) + 2 g_0 C β^{-2} h_0^{-2} ‖a‖_∞ |w|²_{H¹(Ω)}
        ≤ 2 g_0 A(w, w) + 2 g_0 C β^{-2} h_0^{-2} ‖a‖_∞ ‖a^{-1}‖_∞ A(w, w)
        = 2 g_0 (1 + C β^{-2} h_0^{-2} |a|) A(w, w).

Here, we employed the Poincaré-Friedrichs inequality to bound ‖w‖²_{L²(Ω)}
in terms of |w|²_{H¹(Ω)}. 

The preceding bound for C_0 deteriorates as h_0 → 0. This deterioration is
observed in Schwarz algorithms in which information is only exchanged between
adjacent subdomains each iteration. Inclusion of a coarse space can remedy
such deterioration, as it enables transfer of some information globally each
iteration. The following result estimates C_0 when a coarse subspace V_0 is
employed.

Theorem 2.73. Suppose the following conditions hold.
1. Let assumptions 1 to 6 hold with V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p.
2. Let V_0 ⊂ V_h ∩ H_0^1(Ω) be a coarse space for which Q_0 satisfies (2.40).
3. Given v ∈ V_h ∩ H_0^1(Ω), define v_0 = Q_0 v and v_i ≡ π_h(χ_i(v − v_0))
for 1 ≤ i ≤ p.

Then, the following will hold for v_0, v_1, . . . , v_p:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C_0 A(v, v),

with C_0 ≤ C (g_0 + 1)(1 + c_1(Q_0, h, h_0) |a| + c_2(Q_0, h, h_0) β^{-2} |a|),
where C is independent of h, h_0, |a| and β, and c_i(Q_0, h, h_0) has known
dependence on h and h_0 for i = 1, 2.

Proof. By construction, it is easily verified that v_0 + v_1 + · · · + v_p = v.
Since the projection Q_0 satisfies (2.40), we obtain:

    A(Q_0 v, Q_0 v) = Σ_{j=1}^{q} a_j ∫_{S_j} |∇Q_0 v|² dx ≤ ‖a‖_∞ |Q_0 v|²_{H¹(Ω)}
        ≤ ‖a‖_∞ c_1(Q_0, h, h_0) |v|²_{H¹(Ω)}
        ≤ c_1(Q_0, h, h_0) ‖a‖_∞ ‖a^{-1}‖_∞ Σ_{j=1}^{q} a_j ∫_{S_j} |∇v|² dx
        = c_1(Q_0, h, h_0) |a| A(v, v).

Here, we used equation (2.40) from Assumption 5. Now apply equation (2.47)
from Lemma 2.72, using w = v − v_0 and w_i ≡ v_i = π_h(χ_i w), to obtain:

    Σ_{j=1}^{p} A(v_j, v_j) ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} Σ_{j=1}^{q} a_j ‖w‖²_{L²(S_j)})
        ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} ‖a‖_∞ ‖v − Q_0 v‖²_{L²(Ω)})
        ≤ 2 g_0 (A(w, w) + C c_2(Q_0, h, h_0) β^{-2} h_0^{-2} ‖a‖_∞ h_0² |v|²_{H¹(Ω)})
        ≤ 2 g_0 (A(w, w) + C c_2(Q_0, h, h_0) β^{-2} |a| A(v, v)).

Since w = v − v_0, applying the triangle inequality yields:

    A(w, w) ≤ 2 (1 + c_1(Q_0, h, h_0) |a|) A(v, v).

Substituting the above and combining the sums for i = 0, . . . , p yields:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C (g_0 + 1)(1 + c_1(Q_0, h, h_0) |a| + c_2(Q_0, h, h_0) β^{-2} |a|) A(v, v),

where C is a generic constant independent of h, h_0, |a| and β. 

These bounds, derived using the projection Q_0, are independent of h_0, but
not optimal with respect to the coefficient variation |a|. When V_0 is the
traditional coarse space of continuous, piecewise linear finite element
functions defined on a coarse triangulation T_{h_0}(Ω), whose successive
refinement yields T_h(Ω), then c_1(Q_0, ·) and c_2(Q_0, ·) are independent of
h, h_0, |a| and β, see equation (2.43), yielding:

    C_0 ≤ C (g_0 + 1) |a| (1 + β^{-2}).

This result shows that a Schwarz algorithm employing traditional coarse space
residual correction is robust when the variation |a| in the coefficients is
not large. The next result considers alternative bounds for C_0.

Theorem 2.74. Suppose the following assumptions hold.
1. Let assumptions 1 to 6 hold with V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p.
2. Let p = q and S_j = Ω_j for 1 ≤ j ≤ p.
3. Let V_0 ⊂ V_h ∩ H_0^1(Ω) be a coarse space of continuous, piecewise linear
finite element functions defined on a coarse triangulation T_{h_0}(Ω) of Ω
from which T_h(Ω) is obtained by successive refinement.
4. Let π_0 satisfy equation (2.41) of Assumption 5.
5. For v ∈ V_h ∩ H_0^1(Ω), define v_0 = π_0 v and v_i ≡ π_h(χ_i(v − v_0)) for
1 ≤ i ≤ p.

Then, the following estimate will hold:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C (g_0 + 1)(c_1(π_0, h, h_0) + c_2(π_0, h, h_0) β^{-2}) A(v, v),

where C is independent of h, h_0, |a| and β, and c_i(π_0, h, h_0) are defined
in equation (2.41) of Assumption 5.

Proof. By construction v_0 + · · · + v_p = v. Apply equation (2.41) to obtain:

    A(π_0 v, π_0 v) = Σ_{j=1}^{q} a_j ∫_{S_j} |∇π_0 v|² dx
        ≤ c_1(π_0, h, h_0) Σ_{j=1}^{q} a_j |v|²_{H¹(S_j)}
        = c_1(π_0, h, h_0) A(v, v).

Apply equation (2.47) from Lemma 2.72 with w = v − v_0 and
w_i ≡ v_i = π_h(χ_i w):

    Σ_{j=1}^{p} A(v_j, v_j) ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} Σ_{j=1}^{q} a_j ‖w‖²_{L²(S_j)})
        = 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} Σ_{j=1}^{q} a_j ‖v − π_0 v‖²_{L²(S_j)})
        ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} c_2(π_0, h, h_0) h_0² Σ_{j=1}^{q} a_j |v|²_{H¹(S_j)})
        = 2 g_0 (A(w, w) + C c_2(π_0, h, h_0) β^{-2} A(v, v)).

Since w = v − v_0, the triangle inequality yields:

    A(w, w) ≤ 2 (1 + c_1(π_0, h, h_0)) A(v, v).

Substituting this and combining the terms yields:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C (g_0 + 1)(c_1(π_0, h, h_0) + c_2(π_0, h, h_0) β^{-2}) A(v, v),

where C is independent of h, h_0, |a| and β. 

Remark 2.75. This result indicates that Schwarz algorithms employing
traditional coarse spaces have reasonably robust theoretical bounds
independent of |a|.

Remark 2.76. When V_0 is a traditional finite element coarse space defined on
a coarse triangulation T_{h_0}(Ω) of Ω, whose successive refinement yields
T_h(Ω), then the bounds for c_1(π_0, ·) and c_2(π_0, ·) in equation (2.44)
yield:

    C_0 ≤ C (g_0 + 1)(log(h_0/h) + β^{-2}),   if Ω ⊂ IR²
    C_0 ≤ C (g_0 + 1)((h_0/h) + β^{-2}),   if Ω ⊂ IR³.

While these bounds deteriorate in three dimensions, computational tests
indicate almost optimal convergence in both two and three dimensions.
Improved bounds result if the I_0-interpolation (2.42) onto V_0 is used.

Theorem 2.77. Suppose the following assumptions hold.
1. Let assumptions 1 to 6 hold with V_i ≡ V_h ∩ H_0^1(Ω_i^*) for 1 ≤ i ≤ p.
2. Let p = q and S_j = Ω_j for 1 ≤ j ≤ p.
3. Let V_0 ⊂ V_h ∩ H_0^1(Ω) be a coarse space.
4. Let I_0 satisfy equation (2.42) of Assumption 5.
5. For v ∈ V_h ∩ H_0^1(Ω), define v_0 = I_0 v and v_i ≡ π_h(χ_i(v − v_0)) for
1 ≤ i ≤ p.

Then, the following estimate will hold:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C (g_0 + 1)(c_1(I_0, h, h_0) + c_2(I_0, h, h_0) β^{-2}) A(v, v),

where C is independent of h, h_0, |a| and β, while c_i(I_0, h, h_0) is
defined in equation (2.42).

Proof. By construction v_0 + · · · + v_p = v. Apply (2.42) to obtain:

    A(I_0 v, I_0 v) = Σ_{i=1}^{p} a_i ∫_{Ω_i} |∇I_0 v|² dx
        ≤ c_1(I_0, h, h_0) Σ_{i=1}^{p} a_i Σ_{j:G_{ij}≠0} (d_{ij}²/a_j) a_j |v|²_{H¹(Ω_j)}
        = c_1(I_0, h, h_0) Σ_{i=1}^{p} Σ_{j:G_{ij}≠0} (a_i d_{ij}²/a_j) a_j |v|²_{H¹(Ω_j)}
        ≤ g_0 c_1(I_0, h, h_0) A(v, v),

since a_i d_{ij}²/a_j ≤ 1. Apply equation (2.47) from Lemma 2.72 with
w = v − v_0 and w_l ≡ v_l = π_h(χ_l w):

    Σ_{l=1}^{p} A(v_l, v_l) ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} Σ_{i=1}^{p} a_i ‖w‖²_{L²(Ω_i)})
        = 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} Σ_{i=1}^{p} a_i ‖v − I_0 v‖²_{L²(Ω_i)})
        ≤ 2 g_0 (A(w, w) + C β^{-2} h_0^{-2} c_2(I_0, h, h_0) h_0² Σ_{i=1}^{p} Σ_{j:G_{ij}≠0} (a_i d_{ij}²/a_j) a_j |v|²_{H¹(Ω_j)})
        ≤ 2 g_0 (A(w, w) + C g_0 c_2(I_0, h, h_0) β^{-2} A(v, v)).

Since w = v − v_0, applying the triangle inequality yields:

    A(w, w) ≤ 2 (1 + c_1(I_0, h, h_0) g_0) A(v, v).

Substituting this and combining the terms yields:

    Σ_{i=0}^{p} A(v_i, v_i) ≤ C (g_0 + 1)(c_1(I_0, h, h_0) + c_2(I_0, h, h_0) β^{-2}) A(v, v),

where C is independent of h, h_0, |a| and β. 

Remark 2.78. When V_0 is the piecewise constant coarse space defined on the
subdomain decomposition Ω_1, . . . , Ω_p, see [CO8, SA7, MA17, WA6], then the
bounds for c_1(I_0, ·) and c_2(I_0, ·) in equation (2.45) will satisfy:

    C_0 ≤ C (g_0 + 1)(log²(h_0/h) + β^{-2}),   if Ω ⊂ IR^d, for d = 2, 3.

Thus, Schwarz algorithms employing the piecewise constant coarse space will
have almost optimal convergence bounds in both two and three dimensions.
Sharper estimates with respect to the overlap β are obtained in [DR17].

Anisotropic Problems

We next outline estimates for Schwarz algorithms applied to solve anisotropic
elliptic equations. We consider the following model anisotropic problem:

    −ε u_{x_1 x_1} − u_{x_2 x_2} + u = f,   in Ω
    u = 0,   on ∂Ω,                                      (2.48)

where Ω ⊂ IR² and 0 < ε ≪ 1 is a small perturbation parameter. For
0 < ε ≪ 1, the preceding elliptic equation will be strongly coupled along the
x_2 axis, and weakly coupled along the x_1 axis. In the limiting case of
ε = 0, the elliptic equation will not be coupled along the x_1 axis. Due to
the presence of the small parameter ε, the solution may exhibit boundary
layer behavior near ∂Ω; i.e., there may be subregions of Ω on which the
solution has large gradients. If such layers need to be resolved
computationally, then refinement of the grid may be necessary in such
subregions. The weak coupling along the x_1 axis suggests several heuristic
choices in the formulation of the Schwarz iterative algorithm.

• Non-overlapping subdomains {Ω_i}_{i=1}^{p} can be chosen as strips of the
form:

    Ω_i ≡ {(x_1, x_2) : b_i < x_1 < b_{i+1}} ∩ Ω,   for 1 ≤ i ≤ p,   (2.49)

for some choice of b_i. To obtain strips of width h_0, ensure that
|b_{i+1} − b_i| = O(h_0).
• Extended subdomains {Ω_i^*}_{i=1}^{p} can be constructed from the strips
{Ω_i}_{i=1}^{p} using an overlap factor of β h_0 for some 0 < β < 1/2.
• If h_0 is sufficiently small, efficient direct solvers (such as band
solvers) may be available for solution of the strip problems, provided the
discrete unknowns within each strip are ordered horizontally, row by row,
yielding a matrix with small bandsize.
• If the overlap factor is chosen so that β h_0 ≥ c √ε, then a coarse space
V_0 may not be required to ensure robust convergence.

These ideas may be extended to more general anisotropic problems in two or
three dimensions, provided that in the general case the subdomains are chosen
as cylinders or strips whose sections are perpendicular to the axis of weak
coupling of the elliptic equation. We now estimate the convergence rate of
Schwarz iterative algorithms applied to anisotropic problem (2.48).

Lemma 2.79. Suppose the following conditions hold.
1. Consider a finite element discretization of elliptic equation (2.48) based
on a finite element space V_h ∩ H_0^1(Ω).
2. Choose subdomains Ω_i for 1 ≤ i ≤ p of the form (2.49) with width h_0.
Extend each Ω_i to Ω_i^* to have overlap β h_0, where β < 1/2.
3. Employ a Schwarz algorithm based on the subspaces V_i ≡ V_h ∩ H_0^1(Ω_i^*)
for 1 ≤ i ≤ p, without a coarse space V_0, and use exact local solvers.
4. Let g_0 denote the maximum number of adjacent overlapping subdomains.

Then the following will hold.
1. Parameter K_1 will satisfy K_1 ≤ g_0.
2. Parameter K_0 (equivalently C_0, since ω_0 = ω_1 = 1) will satisfy:

    K_0 ≤ C g_0 (1 + ε β^{-2} h_0^{-2}),

for C independent of h, h_0, ε and β.

Proof. Applying Lemma 2.68 yields K_1 ≤ g_0. We outline the proof of the
bound for K_0 only in the continuous case; the proof involving a finite
element discretization can be obtained by appropriate modification of the
proof given below. To estimate K_0, we shall employ a partition of unity
χ_1(x), · · · , χ_p(x) subordinate to the strip subdomains
Ω_1^*, · · · , Ω_p^*, such that χ_i(x) = χ_i(x_1), i.e., each partition of
unity function is solely a function of the variable x_1. Such a partition of
unity will not satisfy χ_i(x) = 0 for x ∈ ∂Ω; however, this will not alter
the construction of w_i described below, since the partition of unity
functions will multiply functions which are in H_0^1(Ω). We further require
the smoothness assumption:

    ‖∂χ_i/∂x_1‖_{L^∞} ≤ C β^{-1} h_0^{-1},   1 ≤ i ≤ p.

Given such a partition of unity and w ∈ H_0^1(Ω), define w_i ≡ χ_i w. Then,
by construction, (w_1 + · · · + w_p) = (χ_1 + · · · + χ_p) w = w. Furthermore:

    ∂w_i/∂x_1 = (∂χ_i/∂x_1) w + χ_i (∂w/∂x_1)   and   ∂w_i/∂x_2 = χ_i (∂w/∂x_2).

Employing arguments analogous to the isotropic case, we obtain:

    A(w_i, w_i) = ε ‖∂w_i/∂x_1‖²_{L²(Ω_i^*)} + ‖∂w_i/∂x_2‖²_{L²(Ω_i^*)} + ‖w_i‖²_{L²(Ω_i^*)}
        ≤ C (ε β^{-2} h_0^{-2} ‖w‖²_{L²(Ω_i^*)} + ε ‖∂w/∂x_1‖²_{L²(Ω_i^*)}
             + ‖∂w/∂x_2‖²_{L²(Ω_i^*)} + ‖w‖²_{L²(Ω_i^*)})
        ≤ C (1 + ε β^{-2} h_0^{-2}) (ε ‖∂w/∂x_1‖²_{L²(Ω_i^*)}
             + ‖∂w/∂x_2‖²_{L²(Ω_i^*)} + ‖w‖²_{L²(Ω_i^*)}).

Summing over 1 ≤ i ≤ p yields the following bound:

    Σ_{i=1}^{p} A(w_i, w_i) ≤ C g_0 (1 + ε β^{-2} h_0^{-2}) A(w, w).

Thus C_0 ≤ C g_0 (1 + ε β^{-2} h_0^{-2}), where C is independent of h_0 and ε
(and h in the discrete case). 

Remark 2.80. If the overlap satisfies β h_0 ≥ c √ε, then the term
(1 + ε β^{-2} h_0^{-2}) will be bounded and convergence of Schwarz algorithms
will be robust without the inclusion of coarse space correction. The absence
of coarse space residual correction can be particularly advantageous from the
viewpoint of parallelization, since coarse spaces require interprocessor
communication.

Time Stepping Problems

We conclude this section by considering the Schwarz algorithm for the
iterative solution of the linear system arising from the implicit time
stepping of a finite element or finite difference discretization of a
parabolic equation:

    u_t + L u = f,   in Ω × [0, T]
    u = 0,   on ∂Ω × [0, T]                              (2.50)
    u(x, 0) = u_0(x),   in Ω,

where L u ≡ −∇ · (a ∇u). We will assume that the parabolic equation has been
suitably discretized. If τ > 0 denotes the time step, then the elliptic
equation resulting from an implicit time stepping of (2.50) will have the
form:

    (I + τ L) u = f̃,   in Ω
    u = 0,   on ∂Ω.                                      (2.51)

This elliptic equation is singularly perturbed as τ → 0^+ and may exhibit
boundary layer behavior on subregions. Grid refinement may be necessary to
resolve such layer regions. The presence of the small parameter 0 < τ ≪ 1
enables simplification of Schwarz algorithms to solve (2.51) or its
discretizations [KU3, KU6, CA, CA3], see also Chap. 9.

• Let Ω_1, . . . , Ω_p denote a nonoverlapping decomposition of Ω of size
h_0. Let each extended subdomain Ω_i^* be constructed by extending Ω_i to
include an overlap of size β h_0 ≥ c √τ.
• The Schwarz algorithm based on the subspaces V_i ≡ V_h ∩ H_0^1(Ω_i^*) will
have optimal order convergence without the use of a coarse space. Estimates
yield that K_1 ≤ g_0 and:

    K_0 ≤ C g_0 (1 + τ β^{-2} h_0^{-2}),

where C will be independent of h, h_0 and τ, see [KU3, KU6, CA, CA3] and
Chap. 9.
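A small numerical illustration of this effect (a 1D model sketch: one-level
additive Schwarz with exact local solves applied to I + τA, where A is the
discrete Laplacian; the sizes, overlap and time steps are illustrative
assumptions) shows the conditioning of the preconditioned operator improving
as τ decreases, with no coarse space present:

    import numpy as np

    n, nsub = 120, 6
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) * (n + 1)**2  # 1D Laplacian
    I = np.eye(n)
    size = n // nsub
    blocks = [np.arange(max(0, s * size - 3), min(n, (s + 1) * size + 3))
              for s in range(nsub)]            # overlap of 3 mesh cells

    for tau in (1e-2, 1e-3, 1e-4):
        B = I + tau * A                        # matrix from implicit time stepping
        P = sum(I[idx].T @ np.linalg.solve(I[idx] @ B @ I[idx].T, I[idx] @ B)
                for idx in blocks)             # one-level additive Schwarz operator
        ev = np.sort(np.linalg.eigvals(P).real)
        print(f"tau={tau:.0e}: cond = {ev[-1] / ev[0]:.1f}")  # shrinks with tau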

9 describes theoretical estimates for the condition number of various Schur complement preconditioners. referred to as the Schur complement system. the traditional substructuring method in structural engineering.4 describes several preconditioners for two subdomain Schur complement matrices. 3.5 and Chap. PR5]. Chap. These methods iteratively solve the linear systems arising from the discretization of a self adjoint and coercive elliptic equation. In Chap. 3. This reduced system.1 we introduce notations. assembles and solves the Schur complement system using a direct method [PR4.3 Schur Complement and Iterative Substructuring Algorithms In this chapter. This parameterization enables reducing the original elliptic equation to a Steklov- Poincar´e problem for determining the solution on such boundaries. while Chap. Our discussion in this chapter is organized as follows. The Schur complement system and its algebraic proper- ties are described in Chap. while Chap.7 describes the Neumann-Neumann and balancing preconditioners. with the substructuring method.3 describes FFT based fast direct solvers for Schur complement systems on rectangular domains with stripwise constant coefficients. By contrast. In the discrete case. 3. 3.8 dis- cusses implementational issues. 3. Chap. 3. the global solution can be obtained by solving a local boundary value problem on each subdomain. parameterizing the global solution in terms of its Dirichlet values on the subdomain boundaries. 3.2. corresponds to a block Gaussian elimination of the unknowns in the interiors of the subdomains. in parallel.6 describe multi-subdomain preconditioners for Schur complements in two dimensions and three dimensions. we describe multi-subdomain Schur complement and iterative substructuring methods. the solution to an elliptic equation can be parameterized in terms of its unknown Dirichlet values on the subdomain boundaries. and this property enables the formulation of various effective preconditioners. is iteratively solved by a PCG method. which pre-dates domain decomposition methodology. Chap. In the continuous case. Chap. based on a decomposition of its domain into non-overlapping subdomains. Once the reduced problem is solved. The Schur complement matrix is by construction a discrete approximation of the Steklov-Poincar´e operator. to obtain a reduced problem. 3. 3. .

Here BD and BD denote the Dirichlet and Neumann boundary segments. with BD ∪ BN = ∂Ω and BD ∩ BN = ∅.1) ⎩ n · (a∇u) = gN (x). (3. and substituting this into (3. Given a quasiuniform triangulation Th (Ω) of Ω. B[i] the exterior non-Dirichlet segment. . for 1 ≤ i ≤ n. ∀vh ∈ Vh ∩ HD 1 (Ω). A finite element discretization of (3.108 3 Schur Complement and Iterative Substructuring Algorithms 3. ∀u.1) seeks uh ∈ Vh ∩ HD 1 (Ω) satisfying: ⎧ ⎪ ⎪ A(uh . The following notation will be employed for subdomain boundaries. This system will be partitioned into subblocks based on an ordering of the nodes given a decomposition of the domain into non-overlapping subdomains. the standard piecewise linear nodal basis functions {φi (x)}ni=1 dual to these nodes will satisfy: φj (xi ) = δij . Here B (i) denotes the interior and Neumann segment of ∂Ωi . xn .3) A matrix representation of the discretization n(3. on BN .1) if: Ω = ∪pl=1 Ω l and Ωi ∩ Ωj = ∅ when i = j. ∀v ∈ HD 1 (Ω) ⎪ ⎩ 1   HD (Ω ≡ v ∈ H (Ω) : v = 0 on BD . v) ≡  (a ∇u · ∇v + c uv) dx. piecewise linear functions defined on Th (Ω). Definition 3. where a(x) ≥ a0 > 0 and c(x) ≥ 0.2) can be obtained by expand- ing uh relative to this nodal basis uh (y) ≡ i=1 uh (xi ) φi (y). j ≤ n ⎪ (u)i = uh (xi ).2) Let n denote the number of nodes of Th (Ω) in (Ω ∪ BN ). . This results in a linear system: Ah u = f . Ωp forms a non-overlapping de- composition of Ω (see Fig. and B the interface separating the subdomains. . for 1 ≤ i ≤ n ⎪ ⎩ (f )i = F (φi ). vh ) = F (vh ). We enumerate them as x1 . We also let Bij ≡ B (i) ∩ B (j) denote the interface between Ωi and Ωj . . 1 ≤ i. we shall let Vh denote the finite element space of continuous.4) where: ⎧ ⎨ (Ah )ij = A(φi . 1 (3.1 Background We consider the following self adjoint and coercive elliptic equation: ⎧ ⎨ −∇ · (a(x) ∇u) + c(x) u = f (x). for 1 ≤ i. 3. Then. We shall say that Ω1 . . (3. . B ≡ ∪pi=1 B (i) and B (i) ≡ ∂Ωi \BD and B[i] ≡ ∂Ωi ∩ BD for 1 ≤ i ≤ p. . v ∈ HD (Ω) 1 Ω  ⎪ ⎪ F (v) ≡ Ω f v dx + BN gN v dsx . in Ω u=0 on BD . .1. j ≤ n. (3.2) with vh = φj for 1 ≤ j ≤ n. . where ⎪ ⎨ A(u. φj ).

. Ωp and interface B.. (3. nI ). j ≤ nB ⎨ (uI )j = (u)j . We shall assume that the sub- domains are chosen to align with the triangulation Th (Ω). We shall assume that the chosen ordering of nodes satisfies: (1) (i−1) (1) (i) xj ∈ Ωi . Let nB denote the number (1) (p) of nodes on B. (1) (p) where nI ≡ (nI + · · · + nI ) denotes the total number of nodes in subdomain interiors. Multidomain non-overlapping decompositions In most applications.5) will be further partitioned using submatrices arising from the subregions. and this will be described later. . ⎪ ⎪ ⎪ ⎪ for 1 ≤ j ≤ nI ⎪ ⎪ (f I )j = (f )j . though strip decompositions have advantages. . . for (nI + . by construction it will hold n = (nI + · · · + nI + nB ). f TB where: ⎧ ⎪ (AII )lj = (Ah )lj . ⎪ ⎩ (f B )j = (f )nI +j .nI +j . as in Fig. j ≤ nI ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (AIB )lj = (Ah )l. 3. for (nI + 1) ≤ j ≤ (nI + nB ). and that the nodes x1 . The nodes within each subdomain Ωi and on the (i) interface B may be ordered arbitrarily. for 1 ≤ j ≤ nB .5) ATIB ABB uB fB T T corresponding to the partition u = uTI . for 1 ≤ l ≤ nI and 1 ≤ j ≤ nB ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (ABB )lj = (Ah )nI +l. for 1 ≤ l. for 1 ≤ i ≤ p xj ∈ B. . - AII AIB uI fI = . . Let nI denote the number of nodes in (i) subdomain Ωi and nB the number of nodes on B (i) . .4) can be block partitioned as:  . . box like subdomain decompositions will be employed. 3. Using this ordering. uTB and f = f TI . . Then. . for 1 ≤ j ≤ nI ⎪ ⎪ ⎪ ⎪ (uB )j for 1 ≤ j ≤ nB ⎪ ⎪ = (u)nI +j .nI +j . xn in Th (Ω) are ordered based on the subdomains Ω1 . for 1 ≤ l. .1 Background 109 Non-overlapping strip decomposition Non-overlapping box decomposition Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω9 Ω10 Ω11 Ω12 Ω13 Ω14 Ω15 Ω16 Fig. The block submatrices AII and AIB in (3.1. . . 3. nI ) + 1 ≤ j ≤ (nI + . . system (3.1.

IB 3.2 Schur Complement System The solution to system (3. 4.1). .110 3 Schur Complement and Iterative Substructuring Algorithms 3. First. Compute: ˜f B = f B − AT wI .5) below: AII uI + AIB uB = f I ATIB uI + ABB uB = f B yields uI = A−1 II (f I − AIB uB ) provided AII is invertible. We summarize the resulting algorithm below. The Schur complement system can be employed to T determine the solution uTI . S is the Schur complement of submatrix AII in Ah ). 2. Algorithm 3. This will be possible when matrix S is invertible. It corre- sponds to a discrete approximation of a Steklov-Poincar´e problem associated with elliptic equation (3.5) as follows. uTB . uTB to (3.6) ⎪ ⎩ ˜f ≡ (f − AT A−1 f ). where ⎨ ⎪ S ≡ (ABB − ATIB A−1 II AIB ) (3. Solve for wI : AII wI = f I .2. Matrix S is referred to as the Schur complement (strictly speaking. Eliminating uI using the first block equation in (3. T Output: uTI . Substituting this parametric representation of uI into the 2nd block equation above yields the following reduced linear system for uB : ⎧ ⎪ ⎪ SuB = ˜f B .6): uB = S −1 f B − ATIB A−1II f I . but posed on the interface B. B B IB III The system SuB = ˜f B is referred to as the Schur complement system. Once uB has been deter- mined.1 (Schur Complement Algorithm) 1. Solve for uI : AII uI = (f I − AIB uB ). yielding: uI = A−1 II (f I − AIB uB ). uI can be obtained by solving AII uI = (f I − AIB uB ). Solve for uB : S uB = ˜f B .5) can be sought formally by block Gaussian elim- ination. determine uB by (iteratively or directly) solving the Schur complement system (3.

. and an ordering of the nodes based of this. and by subsequently defining S wB ≡ ABB wB + ATIB wI .6) is typically solved using a preconditioned conjugate gradient iterative method. yielding that Aij = A(φi . then matrix S must first be assembled. .6). and instead only requires computing the action of S on different vectors. 3. By construction xi ∈ Ωj ⇔ i ∈ I (j) and I =. If a direct solver is employed to solve (3. given wB . This does not require explicit assembly of matrix S. Indeed. φj ) = 0. Ωp . in domain decomposition applications. matrix AII in system (3. However. the Schur complement system (3. define the index set: $ % (1) (j−1) (1) (j) I (j) ≡ i : (nI + · · · + nI + 1) ≤ i ≤ (nI + · · · + nI ) . Such matrix-vector products. . PR5].2 Schur Complement System 111 Schur complement and iterative substructuring algorithms are motivated by the preceding algorithm. for instance S wB . . The preceding version of the Schur complement algorithm can be imple- mented in parallel by using the block structure of matrix AII . note that when nodes xi and xj belong to the interiors of different subdomains. To see this. may be computed by first solving AII wI = −AIB wB in parallel (as discussed below) for wI .5) will be block diagonal. given a decomposition of Ω into the subdomains Ω1 . then the nodal basis functions φi (x) and φj (x) will have support in different subdomains. This is the approach employed in traditional substructuring [PR4. More formally.

AII satisfy: ⎡ (1) ⎤ ⎧. I (1) ∪ · · · ∪ I (p) . . . . It then follows (1) (p) that the diagonal blocks of AII = blockdiag AII . .

Given the non-overlapping subdomains Ω1 . (p)−1 .6). It employs a direct method to solve (3.8) F (v) = i=1 FΩi (v). . . . Ωp .. since the action of A−1 II = blockdiag(AII . v ∈ HD 1 (Ω) p (3. for v ∈ HD (Ω). .. each of which can be computed in parallel. v) = i=1 AΩi (u. ! .7) This block diagonal structure of AII will enhance the parallelizability of (1)−1 Schur complement algorithms. . for v ∈ HD (Ω).) and FΩi (. AII 0 ⎪ ⎪ A (j) = (Ah )˜lk˜ for 1 ≤ l. . for u. for u. v ∈ HD 1 (Ω)  FΩi (v) ≡ Ωi f v dx. (3. v) ≡ Ωi (a(x)∇u · ∇v + c(x) uv) dx. but incorporates the assembly of matrices Ah and S by a finite element subassembly procedure. let AΩi (. k ≤ nI (j) ⎪ ⎨ II lk ⎢ ⎥ AII = ⎢⎣ . We next describe the substructuring algorithm for solving (3.5). v). . the following subassembly relation will hold: p A(u.) denote subdomain forms:  AΩi (u. ⎥ where ⎦ ˜l = (n(1) + · · · + n(j−1) ) + l ⎪ ⎪ I I (p) ⎪ ⎩ (1) (j−1) 0 AII ˜ k = (nI + · · · + nI ) + k. AII ) involves p separate blocks.. 1 By definition. .

112 3 Schur Complement and Iterative Substructuring Algorithms If u. We may then represent:  (j) T  (j) (j)  (j)   (j) T  (j)  uI AII AIB vI fI vI AΩj (uh . each with a specified (j) local ordering of the nodes (for instance in ascending order of indices). Given finite element functions uh . with uI . vh ) = (j) (j)T (j) (j) . FΩj (vh ) = (j) (j) . on each subdomain Ωj . v ∈ Vh ∩ HD 1 (Ω). then these local forms can be represented using matrix- vector notation. vI ∈ IRnj and uB . respectively. let u. Accordingly. v ∈ IRn denote its vector of (j) (j) (j) (j) (j) nodal values. respectively. vh ∈ Vh ∩ HD 1 (Ω). let I (j) and B (j) denote the index sets of nodes in Ωj and ∂Ωj \BD . vB ∈ IRnB denoting subvectors corresponding to indices in I (j) and B (j) (in the local ordering of nodes). uB AIB ABB vB fB vB where the submatrices and subvectors are defined by: ⎧. Let nI (j) and nB denote the number of nodes in Ωj and B (j) .

⎪ ⎪ A (j) ≡ AΩj φ˜l . for 1 ≤ l. φk˜ . k ≤ nI (j) ⎪ ⎪ .

φk˜ . 1 ≤ k ≤ nB ⎪ ⎨. for 1 ≤ l ≤ nI . II ⎪ ⎪ lk ⎪ ⎪ (j) AIB (j) (j) ≡ AΩj φ˜l .

for 1 ≤ l. φk˜ . k ≤ nB ⎪ ⎪ . lk (j) (j) ABB ≡ AΩj φ˜l .

for 1 ≤ l ≤ nI ⎪ ⎪ . lk ⎪ ⎪ (j) (j) ⎪ ⎪ fI = FΩi φ˜l .

Definition 3. B (j) . l with ˜l and k˜ denoting global indices corresponding to the local indices l and k on Ωj and B (j) . l) = j (RW )lj = 0.9) These subassembly relations may equivalently be expressed based on restric- tion and extension matrices. B) let index(W. l ⎪ ⎩ f (j) (j) B = FΩi φ˜l . If nW denotes the number of nodes in W . if index(W. l) denote the global index associated with the l’th node in the lo- cal ordering of indices in W .2. we define restriction map RW as an nW × n matrix with entries: 1. . l) = j. as defined below. The discrete version of subassembly identity (3. for 1 ≤ l ≤ nB . vB fB vB fB (3.8) becomes: ⎧ T     (j) T  (j)   (j)  ⎪ ⎪  AII AIB (j) ⎪ ⎪ u I A II A IB v I p u I vI ⎪ ⎪ = j=1 ⎨ uB ATIB ABB vB uB (j) (j)T AIB ABB (j) (j) vB ⎪  T    (j) T  (j)  ⎪ ⎪ vI fI p vI fI ⎪ ⎪ ⎪ ⎩ = j=1 (j) (j) . For any set of indices W (such as I (j) . if index(W.

3. i.12) uB ATIB ABB uB (i) (i) 2. Suppose the following assumptions hold.10) ⎪ ⎪ ⎪ ⎪ fI p RI fI ⎪ ⎪ = j=1 . (i) (i) (i) (i) 2. with zero values for all other entries. 1. its extension RW T vW will denote a vector of size n whose entries at indices in W correspond to those of vW in the local ordering. - AII AIB uI 0 = .. . ⎩ fB (j) (j) RB fB This subassembly identity relates the global stiffness matrix and load vectors to the subdomain stiffness matrices and subdomain load vectors. (3. . -.2 Schur Complement System 113 Given a nodal vector v ∈ IRn . (3. given vW ∈ IRnW .11) ATIB ABB uB fB for some vector f B . -T . uTB ∈ IRn be discrete Ah -harmonic.. The subassembly relations (3. It will hold that:  p (i)T (i) uTB SuB = uB S (i) uB . The term f B = SuB and the Schur complement energy will satisfy: . The following result establishes a related expression between the global Schur complement (i) (i)T (i)−1 (i) matrix S and subdomain Schur complements S (i) ≡ ABB − AIB AII AIB . Let u = uTI . Let uI = RI u and uB = RB u. Lemma 3. (3. (3.9) may now be alternatively expressed as: ⎧   (j) T  (j)   (j)  ⎪ ⎪ p AII AIB (j) ⎪ AII AIB ⎪ RI RI ⎪ ⎪ = j=1 ⎨ AIB ABB T RB (j) (j) T AIB ABB (j) RB (j)    (j) T  (j)  (3. satisfy: . 3. T 1. its restriction RW v will denote a subvector of nodal values corresponding to the indices in W in the chosen local order- ing of nodes.14) i=1 (i) (i)T (i)−1 (i) where S (i) ≡ (ABB − AIB AII AIB ). Then the following results will hold. While.13) 3. -. - uI AII AIB uI uTB SuB = . The subvectors uI and uB will satisfy: (i) (i) (i) (i) AII uI + AIB uB = 0.e.

(3. i) = l (RG )il = (3. Given region G ⊂ B containing nG indices. . i) denote the index of the i’th local of G in the ordering of indices on B.4. the nodes in Ωi will be cou- pled only to nodes in Ωi and B (i) . using that AII = blockdiag(AII . let index(B. G. uB )T with (3. if index(B. Formally eliminating uI = −AII AIB uB and substi- (i) (i) tuting into the 2nd block equation above yields f B = S (i) uB where: (i) (i)T (i)−1 (i) S (i) ≡ (ABB − AIB AII AIB ).   The subassembly identity (3.11) with uTI . we apply (3. Definition 3. (3. . . To prove that f B = SuB eliminate uI using (3. Apply RI to (3.18) uB AIB ABB uB Substituting expressions (3.16) and employing that f B = S (i) uB yields:  T    (i) (i) (i) (i) uI AII AIB uI (i)T (i) (i) (i)T (i) (i) = uB S (i) uB . uI . as follows.17) We refer to S (i) as a local (subdomain) Schur complement. . uTB )T to obtain: (i) (i) (i) AII uI + RI AIB uB = 0. take inner product of (3. (3. (i)T (i)T To prove (3.14) may be expressed equivalently using re- striction and extension maps between nodal vectors on B and B (j) . G. - (i) (i) (i) AII AIB uI 0 (i) T (i) (i) = (i) .. uTB and substitute f B = SuB to obtain (3.9) yields (3. . It can easily be verified that RG = RG RB T .12) and (3. if index(B. (3. Taking the inner (i)T (i)T (i) (i) product of (uI . To prove (3.11) and substitute the resulting expression uI = −A−1 II AIB uB into the 2nd block equation to obtain T the desired result. T (p) (1) (p)T AII ) and u = (uI . Next.12). i) = l.15) Now. we restrict the block equation: AII uI + AIB uB = 0 (i) (1) to indices in I (i) . uB )T :    .18) into identity (3.14). G. . This yields: (i) (i) (i) RI AIB uB = AIB uB .19) 0. We define an nG × nB matrix RG as: 1.15) yields the desired result. . Substituting this expression into (3.11). .13).16) AIB ABB uB fB (i) (i) (i)−1 (i) (i) for some vector f B . for standard finite element discretizations.14).114 3 Schur Complement and Iterative Substructuring Algorithms Proof.13) to the local nodal vector (uI .

the Schur complement subassembly identity (3. respectively. 3.14) can be stated as:  p  p .2 Schur Complement System 115 (i) (i)T Using the restriction and extension maps RB and RB .

· · · . (i)T (i) (i)T (i) (i)T (i)−1 (i) (i) S= RB S (i) RB = RB ABB − AIB AII AIB RB . · · · .20) i=1 i=1 (i) (i)T (i)−1 (i) where S (i) = (ABB − AIB AII AIB ) is a subdomain Schur complement. f B (i) ⎪ ⎪ ⎪ ⎨ Determine the Cholesky factors: A(i) = L(i) L(i)T II I I ⎪ Assemble: S (i) ≡ A(i) − A(i)T L(i)−T L(i)−1 A(i) ⎪ ⎪ ⎪ BB IB I I IB ⎪ ⎩ Assemble: ˜(i) (i) (i)T (i)−T (i)−1 (i) f B ≡ f B − AIB LI LI fI . Endfor 3. Endfor .10) and (3. AIB . The traditional substructuring algorithm solves the Schur complement system by using Cholesky factorization. Assemble: ⎧ ⎨ S ≡ p R(i) S (i) R(i) T i=1 B B ⎩˜ p (i)T ˜(i) f B = i=1 RB f B 4. (i) 5. load vectors and Schur complement matrices using (3.2 (Substructuring Algorithm) 1. p in parallel do: ⎧ ⎪ ⎪ (i) (i) (i) (i) Assemble: AII . 2. ABB .9).20).2. (3. Determine the Cholesky factors: S = LS LTS and solve: LS wB = ˜fB T LS uB = wB . p in parallel solve for uI : (i) (i) (i) (i) (i) AII uI = (f I − AIB RB uB ). Algorithm 3. For i = 1. For i = 1. (3. 6. f I . The resulting algorithm is summarized below. and explicitly assembles the subdomain fi- nite element stiffness matrices.

ABB . . . T (1)T (p)T Output: uI . However. followed by the computation of the subdomain Cholesky factors. (i) (i) (i) (i) AIB . The computations on different subdomains can be performed in parallel. since . . uTB . . the substructuring algorithm is not purely algebraic. (i) Steps 1 and 2 in the substructuring algorithm involve the assembly of AII . modified loads and and Schur complement matrices S (i) . uI . f I and f B on each subdomain Ωi .

5. and of the forcing term f˜B in (3. SA2] is employed to solve SuB = ˜f B without assembling S.6) must be parallelized using traditional methods. When coefficient c(x) = 0 and B (i) = ∂Ωi . such inverses should not be assembled explicitly [GO4]. A A 1 0 IB BB T where 1 = (1. since nB can be large. Assembly of the global Schur complement matrix S using identity (3. Such a reduction in the . . 1) . .6. For brevity of expression. . 1) is of appropriate size.7. instead the action of the in- verse should be computed by the solution of the associated linear system. the submatrices (i) AII will be invertible. The magnitude of a nonzero entry Sij typically decreases with increasing distance between the nodes xi and xj . . we have employed matrix inverses in (i) the expressions for S (i) and ˜f B in the substructuring algorithm. . a preconditioned itera- tive method [GO4. Each (i) subdomain Schur complement matrix S (i) will be of size nB corresponding to the number of nodes on B (i) . Remark 3.20). The subdomain (i) Schur complement matrices S will typically not be sparse. However. From a computational viewpoint. Once uB is determined.116 3 Schur Complement and Iterative Substructuring Algorithms it employs the subdomain stiffness matrices (as they may not be available if the linear system Au = f has already been assembled). Similarly. . Remark 3. assembly of matrix S and its Cholesky fac- torization can be significant costs. matrix S (i) will also T be singular with a null vector of the form (1. In this case. the (i) components uI of uI can be determined in parallel (on each subdomain). . the Cholesky factorization of S and the solution of the Schur complement system yielding uB . . however. the entry Sij will be zero. then it may be possible to reduce these computational costs provided an effective preconditioner can be found. However. Remark 3. Explicit assembly of S (i) requires the solution (i) (i) of nB linear systems involving sparse coefficient matrix AII . otherwise. and satisfy:  (i) (i)      AII AIB 1 0 (i)T (i) = . If instead. must be parallelized traditionally. GR2. then entry Sij will typically be nonzero. The global Schur complement matrix S will have a block matrix structure depending on the ordering of nodes in B. the subdomain stiff- ness matrices will typically be singular. AX. and the subsequent cost of solving SuB = ˜f B using a direct solver. Such properties are further explored when block matrix preconditioners are constructed for S. the cost of the substructuring algorithm is dominated by the cost of assembling matrix S. If nodes xi and xj lie on some common subdomain boundary B (k) . From a computational viewpoint. their entries may decay in magnitude with increasing distance between the nodes.

2. For i = 1. Endfor . p in parallel do: ⎧ ⎪ ⎪ (i) (i) (i) (i) Assemble: AII . f I . However. p in parallel solve for uI : (i) (i) (i) (i) (i) AII uI = f I − AIB RB uB . 3.2 Schur Complement System 117 computational costs motivates the iterative substructuring method. For i = 1. AIB . 3. The iterative substructuring method has similar steps as Alg. 3. Assemble: p (i)T (i) ˜ fB = i=1 RB ˜ fB 4. ABB . · · · . vector ˜f B is assembled) and step 4 is replaced by a preconditioned CG method to solve SuB = ˜f B with a preconditioner M . (i) 5.4 through Chap.2. Precon- ditioners for S are considered in Chap.2. Solve SuB = ˜f B using a preconditioned CG method.7. We summarize the resulting algorithm.2. 2. matrix S is not assembled in step 3 (instead. 3. · · · . Algorithm 3. Endfor 3. 3. f B (i) ⎪ ⎪ ⎪ ⎨ Determine the Cholesky factors: A(i) = L(i) L(i)T II I I ⎪ Assemble: S (i) ≡ A(i) − A(i)T L(i)−T L(i)−1 A(i) ⎪ ⎪ ⎪ BB IB I I IB ⎪ ⎩ Assemble: ˜(i) (i) (i)T (i)−T (i)−1 (i) f B ≡ f B − AIB LI LI fI .3 (Iterative Substructuring Algorithm) 1. Steps 5 and 6 remain as in Alg. 6.2.

8. . Remark 3. uTB . . T (1)T (p)T Output: uI . . uI . The cost of implementing a preconditioned iterative method to solve SuB = ˜f B using a preconditioner M in step 4 will be proportional to the number of preconditioned iterations and to the cost per . .

When these submatrices are available. Y = I. may be less than the cost of assembling S and solving S u = f using a direct method. B.9. a product with S can be computed as:  p . The iterative substructuring method is not purely algebraic. if the cost of solving M wB = rB is modest. Furthermore. iteration. When the (i) number of preconditioned iterations is less than mini nB . then the total cost of solving S uB = ˜f B iteratively without assembling S. as it (i) employs the subdomain stiffness matrices AXY for X. the cumulative cost for computing matrix-vector products with S will not exceed the cost of assembling the subdomain Schur complement matrices S (i) . Remark 3.

21) i=1 . (i)T (i) (i)T (i)−1 (i) (i) S wB = RB ABB − AIB AII AIB RB wB . (3.

close to machine precision (when computing the matrix-vector product with S). (3.25) (derived later in this section). −1 - I −A˜−1 ˜ II AIB I 0 I 0 A˜II 0 A˜−1 = −1 ˜ .. GO4]: . and these approximations must be scaled appropriately. each iteration will require the solution of a linear system of the form ˜ = r. suppose A˜II and M are preconditioners for matrices AII and S. we shall separately consider two dimensional and three dimensional domains. MA11. since AII is block diagonal. and let A˜IB denote an approximation of AIB . . which can be obtained by formally applying the expression z = A˜−1 r Az given above. then such a product can be computed using: SwB = ABB wB − ATIB A−1 II AIB wB . Indeed. This approach. then motivated by the block form (3.10. These preconditioners will be grouped as two subdomain or multisubdomain pre- conditioners. which results in the Schur complement system. Such an approach will have the advantage that the subdomain problems need not be exact. Indeed. it has been shown that if A˜II ≡ α AII for some α = 1. and when applying a CG method. after describing properties of matrix S and FFT based direct solvers for S. 3. however. A separate section is devoted to the robust class of Neumann-Neumann and balancing domain decomposition preconditioners. followed by defining SwB ≡ ABB wB + ATIB wI . a preconditioner A˜ for stiffness matrix A may be constructed: . care must be exercised in the choice of matrices A˜II and A˜IB approximating AII and AIB .1 Properties of the Schur Complement System From a matrix viewpoint. we shall focus on preconditioners M for S for use in the iterative substructuring or Schur complement algorithm.23) 0 I 0M −AIB I T 0 I Matrix A˜ will be symmetric and positive definite. can be understood to arise from the following block matrix factorization of A. However.2. with p diagonal blocks). The iterative substructuring and Schur complement algorithms have the disadvantage that they require the solution of subdomain problems (i) (i) (i) of the form AII wB = rB .22) This requires computing AIB wB first. the block elimination of uI . -. respectively. as expressed next [CO6. An alternative approach which avoids this is to solve the original linear system Au = f by a preconditioned CG method with a block matrix preconditioner A˜ for A.118 3 Schur Complement and Iterative Substructuring Algorithms However. as most preconditioners depend on the geom- etry of the interface B. Remark 3. respectively. when these matrices are not available (for instance. In the latter case. when matrix A is already assembled). -. In the remainder of this chapter. re- quires two subdomain solves per iteration involving coefficient matrix A˜II . then the convergence rate of of the conjugate gradient method deteriorates significantly [BO4]. followed by solving AII wI = −AIB wB (in parallel. (3.

Then the following results will hold: 1.. - −1 I −A−1 II AIB AII 0 I 0 A = 0 I 0 S −1 − ATIB A−1 II I .. - T EuB A II A IB EuB uB SuB = . . . - AII AIB I 0 AII AIB A≡ = ATIB ABB ATIB A−1 II I 0 S .24) I 0 AII 0 I A−1 II AIB .25) I −A−1 II AIB I 0 I 0 A−1 II 0 = . . with  .27) uB ATIB ABB uB for arbitrary uB . We employ the notation λm (C) and λM (C) to denote the minimum and maximum eigenvalues. but its action must be computed. = ATIB A−1 II I 0 S 0 I −1 where S ≡ ABB − ATIB AII AIB denotes the Schur complement matrix. S will be symmetric and positive definite. - AII AIB EuB 0 = .2 Schur Complement System 119 . . (3. and let κ2 (C) ≡ λM (C)/λm (C) denote the spectral condition number of C. (3. then the Schur complement matrix S need not be assembled explicitly. Let A be a symmetric positive definite matrix having the block structure:   AII AIB A= . matrix A−1 will formally have the following block factorizations: . of a real symmetric matrix C. The following result provides bounds for the extreme eigenvalues of S when A is a symmetric and positive definite matrix. However. respectively. Define EwB ≡ −A−1 II A IB wB for a vector wB . we let σ1 (D) denote its smallest singular value. -. −1 -. and S −1 once. .11.26) ATIB ABB uB SuB for arbitrary uB . . -. . In this case. Let S = (ABB − ATIB A−1 II AIB ) denote the Schur complement matrix. if iterative methods are employed. Given an arbitrary matrix D. Suppose the following assumptions hold.. -. 2.. (3. 3. (3. The energy associated with matrix S will satisfy: . -T  .. 1. 3. 0 I 0 S −1 − ATIB I 0 I To formally determine the solution of Au = f using this block factorization of A−1 requires computing the action of A−1 II twice. . Lemma 3. . . ATIB ABB 2.

-T . uTB to obtain: . Consequently. 4. - AII AIB EuB 0 = . When A is symmetric positive definite.. - T EuB AII AIB EuB uB SuB = uB ATIB ABB uB . -T  . The minimum eigenvalue of S will satisfy: λm (A) σ1 (E)2 + 1 ≤ λm (S). Since A is symmetric positive definite. Proof. -T . .120 3 Schur Complement and Iterative Substructuring Algorithms 3. we immediately obtain that: uTB SuB ≥ λm (A)uTB uB . λM (AII ) 5. since σ1 (E) ≥ 0. matrix S = ABB − ATIB A−1 II AIB will be defined and symmetric by construc- tion. The Schur complement matrix S will be better conditioned than matrix A in the spectral norm: κ2 (S) ≤ κ2 (A). . ATIB ABB uB SuB To show that S is positive definite. - EuB AII AIB EuB EuB 0 = uB ATIB ABB uB uB SuB = uTB SuB . so that A−1 II is well defined. with its lowest eigenvalue at least as large as the lowest eigenvalue of A: λm (A) ≤ λm (S). its diagonal block AII will also be symmetric and positive definite. we obtain that: .. . take inner product of the above equation T with (EuB )T . The maximum eigenvalue of S will satisfy:   σ1 (AIB )2 λM (S) ≤ λM (ABB ) − . and so S will be positive definite. - EuB EuB ≥ λm (A) uB uB ≥ λm (A) (EuB )T EuB + uTB uB ≥ λm (A) σ1 (E)2 + 1 uTB uB . Substituting the definition of EuB and computing directly yields:  . In particular. -T  .

employing the definition of S.2 Schur Complement System 121 Next. we obtain that uTB SuB = uTB ABB − ATIB A−1II AIB uB ≤ uTB ABB uB − uTB ATIB A−1 II AIB uB T ≤ λM (ABB ) − σ1 (AIB )2 λm (A−1 II ) uB uB . 3.

we obtain: λM (S) ≤ λM (A). . λm (S) λm (A) which is the desired result. Lemma 3. 1. Then. see [BE17]. ∀i. and since 2 − σλ1M(A(AIBII)) ≤ 0. j and either (K −1 )ij ≥ 0 entrywise or if all minors of K are positive. In particular.12. Equivalently. 2 = λM (ABB ) − σλ1M(A(AIBII)) uTB uB . Definition 3.13. then the Schur complement S will also be an M -matrix. j. A nonsingular matrix K is said to be an M -matrix if: ⎧ ⎪ ⎨ (K)ii > 0. This will hold even if matrix A is non-symmetric. ∀i (K) ij ≤ 0. K is an M -matrix if it can be expressed in the form K = r I − N where (N )ij ≥ 0 for all i. i = j ⎩ K −1 ≥ 0. ⎪ ij see [VA9. The next result shows that if matrix A is an M -matrix. SA2]. S = (ABB − ABI A−1 II AIB ) will also be an M -matrix. Let matrix A be non-symmetric and block partitioned as follows:   AII AIB A = .   Refinements of the preceding bounds may be found in [MA11]. Let A be an M -matrix. Suppose the following assumptions hold. ABI ABB 2. since the eigenvalues of the principal submatrix ABB of A must lie between the maximum and minimum eigenvalues of A. Combining the upper and lower bounds for the eigenvalues of S yields: λM (S) λM (A) κ2 (S) = ≤ = κ2 (A).

.   We shall now consider analytic properties of the Schur complement ma- trix S. These properties will be employed to construct ap- proximations of S which serve as preconditioners. I we obtain that S −1 ≥ 0 entrywise. for x ∈ B.1). Ωp denote a nonoverlapping decomposition of Ω and let uB denote a sufficiently regular function defined on interface B with zero values on BD . where wB = E uB denotes the piecewise L-harmonic extension of uB on B: LwB = 0.122 3 Schur Complement and Iterative Substructuring Algorithms Proof. trace theorems.1) and its discretization. Let Lu ≡ −∇ · (a∇u) + c u denote the elliptic operator underlying (3.27). it will hold that S = (ABB − ABI A−1 II A IB ) has the form r I − GBB for GBB = (NBB + ABI A−1 II AIB ) ≥ 0 entrywise. since AII = r I − NII for NII ≥ 0 entrywise and since the minors of AII will be positive. . . Heuristically by analogy with (3.26). uB )L2 (B) = A (EuB . submatrix AII will also be an M -matrix. Remark 3. Using the continuous analog of (3. it will be of the form A = r I − N where N ≥ 0 entrywise. - −1 " # −1 0 S = 0I A . Let Ω1 . MA17]. Since A BB = r I − N BB where NBB ≥ 0 entrywise. BR15. We begin by identifying a Steklov-Poincar´e operator S whose discrete analog yields matrix S. . We shall employ the notation: . We next describe bounds for the eigenvalues of S in terms of the mesh size h and coefficients in the elliptic equation. A−1 II ≥ 0 entrywise. we heuristically define the action of a Steklov-Poincar´e operator S on a function uB defined on interface B as follows: SuB (x) ≡ LwB (x). because ABI ≤ 0. EuB ). DR10. S will be an M -matrix [BE17]. the energy associated with the Steklov- Poincar´e operator S will satisfy: (SuB . Thus. Furthermore. on ∂Ωi . As a result. Thus.) is defined by (3. inherited from the underlying elliptic partial differential equation (3. See [CR. NA].14. BR12. (3. Such estimates employ properties of elliptic equation (3. . .1). wB = uB . it will hold that (ABI A−1II AIB ) ≥ 0 entrywise. discrete extension theorems and also inverse inequalities for finite element spaces. in Ωi for 1 ≤ i ≤ p. see [DR2. First note that since A is an M -matrix. Since: .2).28) where A(. DR14. AIB ≤ 0 and A−1 II ≥ 0 entrywise. fractional Sobolev norms.

but possibly dependent on the subdomain diameter h0 .29) for 1 ≤ i ≤ p where C does not depend on h. Define σm = min {cm .30) with its energy equivalent to the Schur complement energy.33) .28): uTB SuB = A(uh . Let the following inverse inequality hold for all vh ∈ Vh vh 1/2.Ωi ≡ |∇u|2 dx ⎪ ⎪ Ωi   ⎪ ⎪ ⎨ u21.∂Ωi ≤ A(uh .31) 2. v) = 0. Lemma 3. as in (3. 1.2 Schur Complement System 123 ⎧ 2  ⎪ ⎪ |u|1. 3. i=1 i=1 (3. σm and σM .∂Ωi . such that: ! !  p p c σm uh 0. (3. uTB where uI satisfies uI ≡ EuB = −A−1 II AIB uB . (3. 2 i=1 i=1 (3. There exists c > 0 and C > 0 independent of h. Suppose the following assumptions hold with BD = ∂Ω.∂Ωi |x−y|d dx dy. uh ) ≤ C σM uh 21/2. Let the coefficients a(x) and c(x) satisfy: 0 < am ≤ a(x) ≤ aM 0 < cm ≤ c(x) ≤ cM .32) 3. 3. The finite element function uh will be piecewise discrete L-harmonic: A(uh . such that: ! !  p p c σm uh 21/2.∂Ωi ≤ Ch−1/2 vh 0. aM }. but possibly dependent on the subdomain diameter h0 . ∀v ∈ Vh ∩ H01 (Ωi ). The following result will not be optimal with respect to coefficient variation or the diameter h0 of the subdomains. uh ) ≤ C σM 2 uh 0.∂Ωi h−1. 1.∂Ωi ≡ ∂Ωi ∂Ωi |u(x)−u(y)| |x−y|d dx dy + ∂Ωi |u|2 dx. uh ). am } and σM = max {cM . Ωi ⊂ IRd ⎪ ⎪ ⎪ ⎩ u2   2  1/2. (3.∂Ωi ≤ A(uh . 1 ≤ i ≤ p. σm and σM . Then the following results will hold. There exists c > 0 and C > 0 independent of h. 2.Ωi ≡ |∇u|2 dx + Ωi |u|2 dx Ωi   2 ⎪ ≡ ∂Ωi ∂Ωi |u(x)−u(y)| 2 ⎪ ⎪ |u|1/2. Let uh denote a finite element function corresponding to a nodal vector T u = uTI .∂Ωi .15.

up to a scaling factor. vB .∂Ωi ≤ C h−1 uh 20.. uh ) associated with uh .33). - vI AII AIB EuB vI 0 = .124 3 Schur Complement and Iterative Substructuring Algorithms T Proof.∂Ωi ≤ uh 21/2. uh ) = uTB SuB . Combining the trivial bound c uh 20.Ωi . we obtain A(uh . 3.∂Ωi . we employ a discrete extension theorem (see Chap. ∀vh ∈ Vh ∩ H01 (Ωi ). -T   . uh ) ≤ σM uh 21.33) yields: . vh ) = 0.∂Ωi will be equivalent.∂Ωi . Substituting the above upper and lower bounds into (3. we may equivalently express the preceding as: A(uh .Ωi .32).16. then known properties of the mass matrix [ST14.29) yields: c uh 20. for 1 ≤ i ≤ p. where c > 0 is independent of h and the coefficients.26) yields: .Ω . for C > 0 independent of h and the coefficients. Applying the inner product of vTB . To obtain an upper bound. uTB . CI2] imply that uh 20. uh ) ≤ σM uh 21.∂Ωi ≤ uh 21/2. This verifies that uh is discrete L-harmonic on each Ωi . since vh will be zero on B.34) i=1 i=1 Application of the trace theorem on each Ωi yields the lower bound: c uh 21/2. Substituting this in (3. 0)T . vB ATIB ABB uB vB SuB If vB = 0. . 1 ≤ i ≤ p. for 1 ≤ i ≤ p. (EvB )T with (3. If uh is a finite element function corresponding to the nodal T vector u = uTI . We then decompose the Sobolev norm based on the subdomains to obtain:  p  p σm uh 21. we employ the equivalence between the energy norm and the Sobolev norm: σm uh 21. -T . but may depend on h0 . If vh denotes the finite element function corresponding to the nodal vector v = (vTI .   Remark 3. (3.Ωi ≤ A(uh . To derive bounds for the energy A(uh . to the Euclidean norm of the nodal vector u restricted to ∂Ωi .34) yields (3.∂Ωi ≤ uh 21. then the right hand side will be zero.Ω ≤ A (uh .Ωi ≤ C uh 21/2.∂Ωi with inverse in- equality (3. ∀vI . Combining the preceding bound with (3.32) yields (3. By choosing vB = uB and vI = EuB .9) and a prior estimates for discrete harmonic functions to obtain: uh 21. but possibly dependent on h0 .

Here.3 FFT Based Direct Solvers For a discretization of a separable elliptic equation. it may be possible to construct fast Fourier transform (FFT) based direct solvers for the stiffness matrix A and the Schur complement matrix S. when uh is constant locally). RE. for 1 ≤ i ≤ p. this transformed system will be block diagonal. for any finite element function uh ∈ Vh satisfying (3. uh ) can become zero even when uh (x) = 0 (for instance. and it can be solved in parallel using band solvers. Remark 3. where h0 denotes the subdomain diameter [BR24]. These bounds compare favorably with the condition number bound of C (σM /σm ) h−2 for κ2 (A). then the following equivalence will hold. the stiffness matrix A must have a block matrix structure in which each block is simultaneously diagonalized by a discrete Fourier transform matrix Q. Remark 3. CH14. (3.∂Ωi ≤ A(uh . into a block matrix with diagonal submatrices. see Chap.3 FFT Based Direct Solvers 125 vTB SvB c σm ≤ ≤ C σM h−1 .17. the stiffness matrix A can be transformed. vTB vB Thus. . When this property holds. 00 00 Discrete approximations of the fractional Sobolev norm vh 2 1/2 will be H00 (B (i) ) considered later in this chapter for finite element functions. the condition number κ2 (S) will grow as C (σM /σm ) h−1 with decreas- ing mesh size h. 3. 3. uh ) ≤ C ρi |uh |21/2. After appropriately reorder- ing the unknowns.30): ! ! p p c ρi |uh |21/2. for fixed subdomains.18. see [BJ9. VA4]. A refinement of this estimate yields: κ2 (S) ≤ C (σM /σm ) h−1 0 h −1 .35) i=1 i=1 with 0 < c < C independent of h and a(x). seminorms replace the norms since some of the local Dirichlet energies AΩi (uh . 3. x ∈ Ωi . CH13. then the following norm equivalence can be employed [LI4]: c v2H 1/2 (B (i) ) ≤ v2H 1/2 (∂Ωi ) ≤ C v2H 1/2 (B (i) ) . using an orthogonal similarity transformation. When c(x) = 0 in (3.∂Ωi .1) and a(x) is piecewise constant: a(x) = ρi . For such solvers to be appli- cable. with band matrices along its diagonal.9. If v = 0 on BD and ∂Ωi ∩ BD = ∅.

126 3 Schur Complement and Iterative Substructuring Algorithms Strip subdomains Triangulation of the domain Ω1 Ω2 Ω3 Ω4 Lx2 E (1) E (2) E (3) hx2 L0 L1 L2 L3 Lx1 hx1 Fig.3.2. we outline the construction of such fast direct solvers. for matrix A and its Schur complement S. this will yield an explicit eigendecomposition of the Schur complement S.2. 3. with a uniform grid and constant coefficients. In the special case of a two subdomain rectangular decomposition. and algorithm to solve SuB = ˜f B is summarized in Alg. Lx2 ) for x = (x1 . Strip decomposition with four subdomains In this section.3. We shall consider the following separable elliptic equation posed on a two dimensional rectangular domain Ω = (0.1. The FFT based algorithm to solve Au = f is summarized in Alg. 3. x2 ): . Lx1 ) × (0. 3.

.

. − ∂x ∂ 1 a1 (x) ∂u ∂x 1 − ∂ ∂x 2 a 2 (x) ∂u ∂x 2 = f (x). where L0 ≡ 0 < L1 < · · · < Lp ≡ Lx1 . for 1 ≤ i ≤ p. for x ∈ Ωi (i) for 1 ≤ i ≤ p. Ωp of Ω consisting of the strip subdomains: Ωi ≡ (Li−1 .36) u = 0. The coefficients a1 (x) and a2 (x) in the elliptic equation will be assumed to be constant within each subdomain Ωi : (i) a1 (x) = a1 .j = uh (ihx1 .2. Li ) × (0. a2 (x) = a2 . so that there are integers Lr such that Lr = Lr hx1 . and the nodal values of a finite element function uh at these grid points will be denoted ui. for 0 ≤ r ≤ p. Lx2 ). 3. . for x ∈ ∂Ω. The grid points (ihx1 . the stiffness matrix A resulting from the finite element discretization of (3. for x ∈ Ω (3. for x ∈ Ωi For this choice of coefficients and triangulation. Triangulate Ω using a uniform grid with (l − 1) × (k − 1) interior grid points having mesh spacings hx1 ≡ (Lx1 /l) and hx2 ≡ (Lx2 /k) as in Fig. jhx2 ). . jhx2 ) for indices 1 ≤ i ≤ (l −1) and 1 ≤ j ≤ (k −1) will lie in the interior. jhx2 ). The subdomain boundary segments E (r) ≡ ∂Ωr ∩∂Ωr+1 for 1 ≤ j ≤ (p−1) will be assumed to align with the triangulation. We formally denote it as: .36) will have the following stencil at a gridpoint (ihx1 . . We consider a nonoverlapping decomposition Ω1 .

⎥+ ⎢ . define subvectors ui ≡ (ui.j ) if i = Lr ..j = a1 hx (ui. ul−1 ) .j+1 ) ⎪ ⎨ 2 (r) hx2 (Au)i.j − ui−1. ⎥ ⎢ . ⎪ ⎪ 1 ⎪ ⎪ (r) hx1 ⎪ ⎪ + a2 hx (2ui. ⎥ (i) ⎢ a2 h x 1 ⎢ ⎥ (i) ⎢ 2a1 hx2 ⎢ ⎥ . ⎦ −1 2 1 while 1 . ⎥ ⎣ −1 2 −1 ⎦ ⎣ .. 2 To represent (3. hx 2 ⎢ ⎥ hx 1 ⎢ ⎥ ⎢ ⎥ ⎢ .j − ui.37) as a linear system.1 . ui.. ⎥ ⎢ .j ) x 2 ⎪ ⎪ ⎪ (a2 +a2 )hx1 ⎩ (r) (r+1) + 2hx (2ui. the linear system Au = f representing (3. ⎥ ⎢. .. ⎥ . . ⎥ ⎢ −1 2 −1 ⎥ ⎢ . (3. ..j−1 − ui.j−1 − ui. .. . ⎥. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ −β (l−3) T (l−2) −β (l−2) ⎦ ⎣ ul−2 ⎦ ⎣ f l−2 ⎦ −β (l−2) T (l−1) ul−1 f l−1 where T (r) and β (r) are submatrices of size (k − 1) defined for Li−1 < r < Li : ⎡ ⎤ ⎡ ⎤ 2 −1 1 ⎢ ⎥ ⎢ ..k−1 )T T for 1 ≤ i ≤ l − 1 and employ them to define a nodal vector u ≡ (u1 .j − ui..j ) if i = Lr (3.38) ⎢ ⎥⎢. ⎥ ⎢ . 3.j − ui−1. · · · .37) ⎪ ⎪ 1 ⎪ ⎪ (r+1) h ⎪ + a1 ⎪ hx1 (ui..37) is: ⎡ (1) ⎤⎡ ⎤ ⎡ ⎤ T −β (1) u1 f1 ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ −β (1) T (2) −β (2) ⎥ ⎢ u2 ⎥ ⎢ f 2 ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ . For this ordering of nodes. ⎥ ⎢ ⎥ = ⎢ ⎥..3 FFT Based Direct Solvers 127 ⎧ (r) h ⎪ ⎪ a1 hxx2 (2ui.j − ui+1. . ⎥ T ≡ (r) ⎢ .j − ui+1.. . · · · . .j+1 ) .

The submatrices β (r) are multiples of the identity: ⎧ (i) hx ⎨ a1 hx2 I. if Li−1 < r < Li β ≡ (r) . tridiagonal and Toeplitz (it is constant along each diagonal) of size (k − 1). (Li −1) T (r) ≡ T + T (Li +1) . for r = Li . 2 Each matrix T (r) is symmetric.

.38) is that its submatrices T (r) and β (r) are diagonalized by a discrete sine transform matrix Q. 1 (3. if r = Li . An important property of matrix A in (3. 2 1 1 hx 1 where I denotes an identity matrix of size (k − 1). as defined next.39) ⎩ 1 a(i+1) + a(i) hx2 I.

By direct substitution and the use of trigonometric identities it is easily verified that qj is an eigenvector of matrix T (r) : (r) T (r) qj = λj qj . for 1 ≤ i. Given an integer k > 2.19. j ≤ (k − 1). · · · . Routines for fast multiplication of a vector by Q are available in most FFT packages with complexity proportional to O(k log(k)). Using trigonometric identities.128 3 Schur Complement and Iterative Substructuring Algorithms Definition 3. sin . see [VA4]. it can be verified that QT Q = I so that Q is an orthogonal matrix. we apply matrix T (r) to the j’th column vector qj of Q. (r) corresponding to the eigenvalue λj given by: ⎧ (i) ⎪ 2a2 hx1 2a(i) h ⎪ ⎨ hx2 1 − cos jπ k + 1hx x2 . 1 (r) λj ≡ . matrix Q is symmetric. k k k k By construction. if Li−1 < r < Li . we define the entries of a discrete sine transform matrix Q of size (k − 1) as follows: *   2 ijπ Qij ≡ sin . we let qj denote the j’th column of matrix Q *       T 2 jπ 2jπ (k − 1)jπ qj = sin .40) k k For 1 ≤ j ≤ (k − 1). sin . To verify that each block of A is diagonalized by Q. (3.

.

(r) (r) where Λ(r) = diag(λ1 . ⎪ ⎪ (i) (i+1) hx1 jπ (i) (i+1) ⎩ a2 +a2 1 − cos k + a1 +a1 hx2 . λk−1 ). hx2 hx1 (3. where each Dij is a diagonal matrix of size m. Let Q be an orthogonal matrix of size m which simultaneously diagonalizes all the block submatrices of C: QT Cij Q = Dij . Suppose the following assumptions hold. Lemma 3. Since the matrices β (r) are scalar multiples of the identity. The following algebraic result shows how any block partitioned system Cw = g can be reduced to a block diagonal linear system provided all blocks of matrix C can be simultaneously diagonalized by an orthogonal matrix Q. if r = Li .20. 2. 1. . j ≤ n. T (r) has the eigendecomposition: T (r) = QΛ(r) QT .41) Thus. for 1 ≤ i. · · · . j ≤ n. Let C be an invertible matrix of size m n having an n×n block structure in which the individual blocks Cij are submatrices of size m for 1 ≤ i. they are also trivially diagonalized by Q.

⎥. Thus. ⎥ ⎢ . . (3. 1 ≤ k ≤ n.42) Cn1 · · · Cnn wn gn T where g = gT1 . ⎥⎢ . ⎦ = ⎣ .. .3 FFT Based Direct Solvers 129 T 3. ⎥ = ⎢ . For 1 ≤ i ≤ m subvector µi of size n is defined by: (µi )k = QT gk i . ⎦ 0 Gmm αm µm where Gii . ⎦ ⎣ . 1 ≤ l. ⎥ = ⎢ ..42) can be obtained by solving the following block diagonal linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ G11 0 α1 µ1 ⎢ ⎥⎢ . By construction Q will also be an orthogonal matrix. . ⎦⎣ ..44) ⎣ ⎦⎣ . each block submatrix Dij = QT Cij Q of D will be a diagonal matrix of size m. 1 ≤ k ≤ n. gTn with gi ∈ IRm . . . ⎥ ⎢ . For 1 ≤ i ≤ m subvector αi of size n is defined by: (αi )k = QT wk i . wTn with wi ∈ IRm denote the solution to the block partitioned linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ C11 · · · C1n w1 g1 ⎢ . . Define a block diagonal matrix Q ≡ blockdiag (Q. for 1 ≤ i ≤ m. . αi and µi are defined by: 1. Then.. By construction. (3. ⎥ ⎢ . ⎦ ⎣ .43) ⎣ . (3. ⎥. using the given orthogonal matrix Q of size m. For 1 ≤ i ≤ m matrix Gii is of size n with entries defined by: (Gii )lk ≡ (Dlk )ii . ⎥ ⎢ . . ⎦. the transformed linear system becomes Dw ˜ =g ˜ . ⎥ ⎢ . . . Let w = wT1 . 3. . . 2. . ⎦ Q Cn1 Q · · · Q Cnn Q T T T Q wn T Q gn Define D ≡ QT CQ and let w ˜ ≡ QT w and g ˜ ≡ QT g denote the trans- formed vectors. components of w ˜ will be coupled within the transformed linear system Dw ˜ = g ˜ only when its indices differ by an integer multiple of m.. k ≤ n. . Apply Q to transform the linear system Cw = g into QT CQ QT w = QT g : ⎡ T ⎤⎡ T ⎤ ⎡ T ⎤ Q C11 Q · · · QT C1n Q Q w1 Q g1 ⎢ ⎥⎢ . ⎥ .. Then. . ⎥ ⎢ . . As a consequence. Q) having n di- agonal blocks. 3. ⎦ ⎣ . the solution to system (3. a suitable reordering of the indices within the transformed system should yield a block diagonal linear system. Proof. ⎥ ⎢ . for 1 ≤ i ≤ m. . for 1 ≤ i ≤ m. ⎢ ⎥ ⎢ ⎥ ⎣ . ⎥ ⎢ .

j ) = Q (c1. Once all the unknowns cij have been determined by parallel solution of the tridiagonal linear systems. . By construction. for 1 ≤ j ≤ (l − 1). . we partition the index set {1. We reorder the components of w ˜ and define α ≡ P T QT w. · · · .38) using Lemma 3. each containing n entries ordered in ascending order. We define cj = QT uj and ˜f j = QT f j for j = 1. . . nm}. . ck−1. each Gii in (3. Once the subproblems Gii αi = µi in (3.21. Let P T denote a permutation matrix whose action on a vector reorders its entries according to the above ordering. A fast direct solver can be constructed for solving (3.j . Furthermore. reordering the rows and columns of matrix a reordering of g D should yield G = blockdiag(G11 . .   Remark 3. Gmm ) = P T DP to be a block diagonal matrix. . ck−1. In this case. n = (l − 1) with m = (k−1).130 3 Schur Complement and Iterative Substructuring Algorithms Accordingly. · · · .43).j ) . then each submatrix Gii will be a tridiagonal matrix. we define µ ≡ P T QT g as ˜ .j ) . . Similarly. The original unknowns wk will satisfy wk = Qyk for 1 ≤ k ≤ n. The resulting partition will be: {1. FFT Based Solution of Au = f . 1 ≤ k ≤ n. There will be m subsets in this partition. . · · · . system QT AQ QT u = QT f will also be block tridiagonal.j . It can be easily verified that the block submatrices Gii in the preceding will inherit the “block sparsity pattern” of C. each nonzero block in QT AQ will satisfy: QT T (r) Q = Λ(r) . Let Q denote the discrete sine transform matrix defined by (3. . where: T cj = (c1. 2m. . . and since multiplication by Q has O (l k log(k)) complexity. . . · · · . 2. l − 1.43) have been solved in parallel. for j = 1. Since A is block tridiagonal. nm} into subsets such that two indices belong to the same subset only if they differ by an integer multiple of m. l − 1. . Since a tridiagonal system can be solved in optimal order complexity. define yk for 1 ≤ k ≤ n as follows: (yk )i = (αi )k . · · · . . if C is block tridiagonal.43) will be a tridiagonal matrix. The reordered transformed system P T DP P T w ˜ = PT g ˜ will then correspond to the system (3. 1 + m.20 and choosing C = A. the nodal values {uij } at the grid points can be reconstructed by applying Q columnwise T T (u1. For example. uk−1. for 1 ≤ i ≤ m. n m} = {1. QT β (r) Q = β (r) . . (n − 1)m + 1} ∪ · · · ∪ {m. . . . the complexity of the FFT based solution algorithm will be O (l k log(k)). w = u and g = f .j .40). .

.l−1 (˜f l−1 )i 6. Given a finite element function uh with nodal values uij = uh (ihx1 . · · · . ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ −βi (l−3) (l−2) λi −βi (l−2) ⎥ ⎣ ci. for 1 ≤ r ≤ p I ≡ I (1) ∪ · · · ∪ I (p) E (r) ≡ {(Lr hx1 . jhx2 ) : Lr−1 < i < Lr .. ⎥ ⎢ . . ⎦. .39) 1. l − 1 in parallel do: 8. Compute using the fast sine transform ⎡ ⎤ c1. Compute the fast sine transform: f j ≡ Qf j . ⎥ ⎢ ⎥⎢ ⎥ = ⎢. Lemma 3. Endfor T Output: uT1 . .1 (FFT Based Solution of Au = f ) (r) Let λj and β (r) be defined by (3. we will employ the following notation for index sets and nodal vectors associated with them. Endfor 4.1 (f 1 )i ⎢ ⎥ ⎢ −β (1) λ(2) −β (2) ⎥ ⎢ ci. ˜ 3. FFT based solution of SuB = ˜f B .j ⎢ ⎥ uj ≡ Q ⎣ ..2 ⎥ ⎢ (˜f 2 )i ⎥ ⎢ i i i ⎥⎢ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎥⎢. jhx2 ) for 1 ≤ i ≤ (l − 1) and 1 ≤ j ≤ (k − 1).. ⎥ ⎢ ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ ⎥ ⎢ .41) and (3. k − 1 in parallel do: 5. provided the block submatrices of S are simultaneously diagonalized by an orthogonal matrix.3. ck−1. jhx2 ) : 1 ≤ j ≤ (k − 1)}. ⎥ ⎢ ⎥⎢ . uTl−1 . . 3. For j = 1. · · · . Solve the tridiagonal system using Cholesky factorization: ⎡ (1) ⎤ λi −βi (1) ⎡ ⎤ ⎡ ˜ ⎤ ci. ⎥⎢.3 FFT Based Direct Solvers 131 Algorithm 3. ⎢ . . . Accordingly.20 can also be applied to construct a direct solver for the Schur complement system. Endfor 7. . . ⎥ ⎢ . ⎥ ⎢ ⎥ ⎢ .. . . .. 1 ≤ j ≤ (k − 1)} . in the following we study the block structure of the Schur complement matrix S. For j = 1.. l − 1 in parallel do: 2.. . · · · .. ⎥⎢. . for 1 ≤ r ≤ (p − 1) B ≡ E (1) ∪ · · · ∪ E (p−1) . I (r) ≡ {(ihx1 .j 9. ⎥.l−2 ⎦ ⎣ (˜ f l−2 )i ⎦ ⎣ ⎦ −βi (l−2) (l−1) λi ci.. ⎥ ⎢ . ⎥⎢. For i = 1.

ui. we have used E (r) to denote interface E (r) = ∂Ωr ∩ ∂Ωr+1 as well as the set of indices of nodes on it. The following additional nodal subvectors will be associated with each of the preceding index sets: ⎧ .k−1 ) for 1 ≤ i ≤ (l − 1). · · · . We will employ nodal subvectors T ui ≡ (ui.1 .132 3 Schur Complement and Iterative Substructuring Algorithms For convenience.

· · · . T . for 1 ≤ r ≤ p ⎪ I ⎪ u u Lr−1 +1 u Lr −1 ⎪ ⎪ . T ⎪ ⎪ (r) ≡ T .

· · · . for 1 ≤ r ≤ (p − 1) ⎪ ⎪ ⎪ ⎪ . u(p) T T I I ⎪ ⎪ (r) uE ≡ uLr . ⎪ ⎪ T ⎨ uI ≡ u(1) .

(3. (r) ⎣ .. Matrix AII takes the form: ⎡ (r) ⎤ M −γ (r) ⎡ (1) ⎤ ⎢ ⎥ AII 0 ⎢ −γ (r) M (r) −γ (r) ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ AII = ⎢ . (3.47) ⎢ . ⎥ M ≡(r) ⎢ . ⎥. where dr ≡ (Lr − Lr−1 − 1).. The submatrix γ (r) ≡ a1 hxx2 I is 1 of size (k − 1). .. .. ⎥... AIB and ABB based on the preceding index sets. . A = ⎢ . ⎥. . ⎪ ⎪ T ⎩ u ≡ u(1) . . ⎥ ⎣ −1 2 −1 ⎦ ⎣ . . ⎥ (3.. ⎦ −1 2 1 Matrix AIB will be block bidiagonal with p × (p − 1) blocks Xij = AI (i) E (j) : ⎡ ⎤ X11 0 ⎢X X ⎥ ⎢ 21 22 ⎥ ⎢ ⎥ ⎢ X32 X33 ⎥ ⎢ ⎥ AIB = ⎢ . . while M (r) of size (k − 1) satisfies: ⎡ ⎤ ⎡ ⎤ 2 −1 1 ⎢ ⎥ ⎢ . ⎥ ⎢ −1 2 −1 ⎥ ⎢ . . ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ X(p−1)(p−2) X(p−1)(p−1) ⎦ 0 Xp(p−1) .. ⎦ ..45) II ⎢ ⎥ ⎢ (r) ⎥ 0 (p) AII ⎣ −γ (r) M (r) −γ ⎦ −γ (r) M (r) (r) Here AII is a block tridiagonal and block Toeplitz matrix with dr × dr blocks (r) h of size (k − 1)..46) hx2 ⎢ ⎥ hx1 ⎢ ⎥ ⎢ ⎥ ⎢ . ⎥+ 1 ⎢ . u(p−1) T T .. ⎥ x2 ⎢ . . · · · . ⎥ (r) ⎢ a2 h x 1 ⎢ ⎥ 2a(r) h ⎢ ⎥ . . B E E The stiffness matrix A will be block partitioned into the submatrices AII ..

⎥ ⎣ 0 ⎦ ⎣ .. Matrix ABB will be a (p−1)×(p−1) block diagonal matrix whose individual blocks are each of size (k − 1) ⎡ (1) ⎤ AEE 0 ⎢ ⎥ . ⎥ and Es(s−1) = AI (s) E (s−1) = ⎢ . ⎥ ⎢ 0 ⎥ ⎢ ⎥ ⎢ ⎥ Xrr = AI (r) E (r) = ⎢ . ⎦ −γ (r) 0 (3.3 FFT Based Direct Solvers 133 where for 2 ≤ r ≤ p and 1 ≤ s ≤ (p − 1) its block submatrices are defined by: ⎡ ⎤ ⎡ ⎤ 0 −γ (s) ⎢ . 3.48) with blocks of size (k −1)..

49) ⎣ . ⎥ where A(r) ≡ 1 M (r) + M (r+1) . (3. ABB = ⎢ . ⎦ EE 2 (p−1) 0 AEE Each submatrix M (r) is diagonalized by the sine ..

it is trivially diagonalized (r) h by Q with eigenvalues γ (r) j = a1 hxx2 for 1 ≤ j ≤ (k − 1). then since matrices ABB . respectively. .22. where: (r)   (r) (r) a hx1 jπ 2a hx2 λj =2 2 1 − cos( ) + 1 . . (3. . . λk−1 . ⎥ ⎢ ⎥ ⎣ SET (p−3) E (p−2) S S E (p−2) E (p−2) E (p−2) E (p−1) ⎦ T 0 SE (p−2) E (p−1) SE (p−1) E (p−1) (3.50) hx2 k hx1 Since matrix γ (r) is a scalar multiple of the identity.45). . transform matrix Q defined (r) (r) T (r) (r) (r) earlier. .47) and (3. for 1 ≤ j ≤ (k − 1). (3. .49). . E (p−1) the following will hold. . Given the ordering of nodes on B based on the index sets E (1) .47) and (3. Lemma 3. The resulting structure is summarized in the following... The Schur complement matrix S will be block tridiagonal of the form: ⎡ ⎤ SE (1) E (1) SE (1) E (2) 0 ⎢ S T (1) (2) SE (2) E (2) SE (2) E (3) ⎥ ⎢ E E ⎥ ⎢ .45) for ABB . (3. AIB and AII . 1 We next consider the block structure of matrix S given the ordering of nodes on B. 1. in S = (ABB − ATIB A−1 II AIB ).49). it will follow that matrix S must be block tridiagonal. respectively. .. rectangular block bidiagonal and block diagonal. with M = QΛ Q and Λ = diag λ1 . If we substitute the block partitioned matrices (3.51) with block submatrices SE (i) E (j) of size (k − 1). . . ⎥ S=⎢ . AIB and AII are block diagonal. . Explicit expressions for the block submatrices SE (r) E (r) and SE (r+1) E (r) can be obtained by directly computing the block entries of S = (ABB − ATIB A−1 II AIB ) using (3.

the scalars α1 . ⎦. ⎥ ⎢ . j ≤ n. . . . ⎥ ⎢ . 2. ⎦ ⎣ . where each Dij is a diagonal matrix. ⎦ ⎣ . Lemma 3. . . .134 3 Schur Complement and Iterative Substructuring Algorithms 2. . . ..23. ⎥ = ⎢ . Then. . For 1 ≤ r ≤ (p − 2) the block submatrices SE (r+1) E (r) will satisfy: SE (r+1) E (r) = −ATI(r+1) E (r+1) A−1 A (r+1) E (r) .. Cn1 · · · Cnn wn gn where wi ∈ IRm and gi ∈ IRm for 1 ≤ i ≤ n. wTn denote the solution to the block partitioned system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ C11 · · · C1n w1 g1 ⎢ . ⎦ (Dn1 )tt · · · (Dnn )tt αn δn . T 3. Let wT1 . . . for scalars δi ∈ IR where qt ≡ (q1t . For 1 ≤ r ≤ (p − 1) the block submatrices SEr Er will satisfy: SE (r) E (r) = AE (r) E (r) − ATI(r) E (r) A−1 A (r) (r) I (r) I (r) I E −ATI(r+1) E (r) A−1 A (r+1) E (r) . I (r+1) I (r+1) I Proof.. . Let Q be an orthogonal matrix of size m which simultaneously diagonalizes all the block submatrices of C: QT Cij Q = Dij . I (r+1) I (r+1) I 3. 1. As outlined earlier. ⎥ = ⎢ . each wi = αi qt will be a scalar multiple of qt for some αi ∈ IR. j ≤ n. Suppose the following assumptions hold. Furthermore. AIB and ABB can be partitioned into blocks of size (k − 1). and to obtain analytical expressions for the eigenvalues of its blocks. each of which are diagonalizable by the discrete sine transform matrix Q.. ⎣ ⎦⎣ . ⎥ ⎢ . for 1 ≤ i. .   Since AII . . . ⎦ ⎣ . ⎥. ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ . qmt ) denotes the t’th column of Q. T 4. Let gi = δi qt . αn will solve the following linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ (D11 )tt · · · (D1n )tt α1 δ1 ⎢ ⎥⎢ . Let C denote a positive definite symmetric matrix of size m n partitioned into n × n blocks Cij of size m for 1 ≤ i. The following two results will be employed to show this. ⎥ ⎣ . ⎥ ⎢ . .. the block submatrices of S = (ABB − ATIB A−1 II AIB ) will also be diagonalizable by matrix Q.

. elimination of the common factors qt yields the linear system: ⎧ ⎪ ⎪ (D11 )tt α1 + · · · + (D1n )tt αn = δ1 ⎨ . 0) . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎣ . Consider the following Toeplitz tridiagonal linear system: ⎡ ⎤ ⎡ ⎤ ⎡˜ ⎤ α1 µ b a˜ 0 ⎢ ⎥ ⎢ 1⎥ ⎢ ˜ ⎥ ⎢ . ⎥⎢ . ⎥ ⎢ . . (3. . ⎥ ⎢ c˜ b a ˜ ⎥⎢ .53) 2a˜ 2a˜ Then.. . ⎥ ⎢ . .. . ⎦ c˜ ˜b αd µd ˜. ⎥ = ⎢ .3 FFT Based Direct Solvers 135 Proof. ⎥ ⎢ . then: ! ρd+1 ρi1 − ρd+1 ρi2 αi = 2 1 . ⎪ ⎩ Cn1 qt α1 + · · · + Cnn qt αn = qt δn . ⎥ ⎢ . −˜b + ˜b2 − 4 a ˜ c˜ −˜b − ˜b2 − 4 a˜ c˜ ρ1 ≡ and ρ2 ≡ . since (qTt qt ) = 1. . This result can be obtained by an application of Lemma 3. . ⎥ ⎢ .24. . ⎪ ⎩ (Dn1 )tt α1 + · · · + (Dnn )tt αn = δn . ⎪ . . ⎥ ⎢ . .. . ⎦ ⎣ . (3. 0. ⎦ ⎣ . . ˜b. ... By construction. . . αn ) = 0. . . for 1 ≤ i ≤ d.. ⎦ ⎣ . . ⎢ ⎥⎢ . ρ2 ∈ IR as follows: . ⎦ αn qt Cn1 · · · Cnn αn qt αn (Dn1 )tt · · · (Dnn )tt αn When C is symmetric and positive definite. Define ρ1 .   The next result describes the solution of a Toeplitz tridiagonal system. .. . .. ⎥ ⎢ . it will hold that: ⎡ ⎤T ⎡ ⎤⎡ ⎤ ⎡ ⎤T ⎡ ⎤⎡ ⎤ α1 qt C11 · · · C1n α1 qt α1 (D11 )tt · · · (D1n )tt α1 ⎢ . ⎥ ⎢ .. µd ) = (−˜ c. Alter- natively. ⎣ .. . ⎦⎣ . ⎦⎣ . 3.54) ρd+1 2 − ρd+1 1 . ⎥ ⎢ ⎥ = ⎢ ⎥. c˜ ∈ IR satisfies (˜b2 − 4˜ where a a c˜) > 0.. ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ . ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ..52) ⎪ . . . Lemma 3. . ⎥. (3. the following will hold: T T 1.. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ c˜ ˜b a˜ ⎦ ⎢ . verifying that (3. If (µ1 . both terms in the above expression T will be positive for (α1 . ⎥ ⎢ .52) is nonsingular. Since qt is an eigenvector of each matrix Cij corresponding to eigenvalue (Dij )tt . ⎦ ⎣ .20. . ⎥ ⎢ . . substitute the ansatz wi = αi qt to obtain the linear system: ⎧ ⎪ ⎪ C11 qt α1 + · · · + C1n qt αn = qt δ1 ⎨ ...

µd ) = (0.55) ρd+1 1 − ρd+1 2 Proof. . If (µ1 . . Substitute the ansatz that αi = ρi for 0 ≤ i ≤ (d + 1) into the finite difference equations. . for 1 ≤ i ≤ d. 0. . . . then:   ρi1 − ρi2 αi = . (3. . −˜a) . This yields the following equations: .136 3 Schur Complement and Iterative Substructuring Algorithms T T 2. .

1. Furthermore.   The next result shows that each submatrix SE (r) E (s) of the Schur comple- ment matrix S is diagonalized by the discrete sine transform Q of size (k − 1). by employing Lemma 3.56) ⎪ ⎪ k ⎪ ⎪ a1 hx 2 . a The roots of the characteristic polynomial are given by (3. for 1 ≤ i ≤ d. The general discrete solution to the finite difference equations will be of the form: αi = γ1 ρi1 + γ2 ρi2 . ⎪ ⎪ ρ1 (r. for arbitrary γ1 and γ2 . t)2 − 1. γ (r) .55). Solving for γ1 and γ2 yields (3. ρ1 (r. ρ2 (r. Then. Solving for γ1 and γ2 yields (3.25. For 1 ≤ r ≤ (p − 1) the vector qt will be an eigenvector of matrix SE (r) E (r) corresponding to eigenvalue (Drr )tt : SE (r) E (r) qt = (Drr )tt qt . the following results will hold. t) be as defined below: ⎧ (r) (r) ⎪ ⎪ γ ≡ a1 (hx2 /hx1 ) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ λt (r) (r) ≡ 2a2 (hx1 /hx2 ) 1 − cos( tπ k ) + 2γ (r) ⎪ ⎨ a h2 (r) ω(r. (r) Lemma 3. define dr = (Lr − Lr−1 − 1). t) ≡ ω(r. ˜ ρ2 + ˜b ρ + c˜ ρi−1 = 0. t). t) + ω(r. ω(r. t) ≡ ω(r. To solve the first linear system.53) and they will be real and distinct provided (˜b2 − 4 a ˜ c˜) > 0. In addition. we impose the boundary condition α0 = 0 and αd+1 = 1. t) ≡ 2(r) x2 1 1 − cos( tπ ) +1 (3. a It can be solved simultaneously.23 and 3.24. we impose the bound- ary condition α0 = 1 and αd+1 = 0. t) − ω(r. . t) and ρ2 (r. for each i. Let λt .54). t)2 − 1 ⎪ ⎪ ⎪ ⎩ . we can obtain analytical expressions for the eigenvalues of SE (r) E (s) . provided ρ solves the characteristic equation: ˜ ρ2 + ˜b ρ + c˜ = 0. To solve the second linear system.

where (D_{rr})_{tt} is given by:

    (D_{rr})_{tt} = -\gamma^{(r)} \frac{\rho_1(r,t)^{d_r} - \rho_2(r,t)^{d_r}}{\rho_1(r,t)^{d_r+1} - \rho_2(r,t)^{d_r+1}}
                    + \frac{1}{2} \lambda_t^{(r)} + \frac{1}{2} \lambda_t^{(r+1)}
                    - \gamma^{(r+1)} \frac{\rho_1(r+1,t)^{d_{r+1}} - \rho_2(r+1,t)^{d_{r+1}}}{\rho_1(r+1,t)^{d_{r+1}+1} - \rho_2(r+1,t)^{d_{r+1}+1}}.        (3.57)

Matrix S_{E^{(r)}E^{(r)}} will be diagonalized by the discrete sine transform Q:

    Q^T S_{E^{(r)}E^{(r)}} Q = D_{rr},        (3.58)

where D_{rr} is a diagonal matrix of size (k-1).

2. For 1 \le r \le (p-2), the vector q_t will be an eigenvector of the matrix S_{E^{(r)}E^{(r+1)}} corresponding to the eigenvalue (D_{r,r+1})_{tt}: S_{E^{(r)}E^{(r+1)}} q_t = (D_{r,r+1})_{tt} q_t, where (D_{r,r+1})_{tt} is given by:

    (D_{r,r+1})_{tt} = -\gamma^{(r+1)} \frac{\rho_1(r+1,t) - \rho_2(r+1,t)}{\rho_1(r+1,t)^{d_{r+1}+1} - \rho_2(r+1,t)^{d_{r+1}+1}}.        (3.59)

Matrix S_{E^{(r)}E^{(r+1)}} will be diagonalized by the discrete sine transform Q:

    Q^T S_{E^{(r)}E^{(r+1)}} Q = D_{r,r+1},

where D_{r,r+1} is a diagonal matrix of size (k-1).

Proof. To verify that q_t is an eigenvector of S_{E^{(r)}E^{(r)}}, we shall employ the following expression for S_{E^{(r)}E^{(r)}} q_t:

    S_{E^{(r)}E^{(r)}} q_t = -A_{I^{(r)}E^{(r)}}^T A_{I^{(r)}I^{(r)}}^{-1} A_{I^{(r)}E^{(r)}} q_t + A_{E^{(r)}E^{(r)}} q_t
                             - A_{I^{(r+1)}E^{(r)}}^T A_{I^{(r+1)}I^{(r+1)}}^{-1} A_{I^{(r+1)}E^{(r)}} q_t.

Each of the submatrices in the above can be block partitioned into blocks that are diagonalized by Q. By Lemma 3.23 it will follow that q_t is an eigenvector of each of the three matrix terms above. We will determine the eigenvalue associated with each term separately. Let \theta_1 denote the eigenvalue of -A_{I^{(r)}E^{(r)}}^T A_{I^{(r)}I^{(r)}}^{-1} A_{I^{(r)}E^{(r)}} associated with eigenvector q_t:

    -A_{I^{(r)}E^{(r)}}^T A_{I^{(r)}I^{(r)}}^{-1} A_{I^{(r)}E^{(r)}} q_t = \theta_1 q_t.

An application of Lemma 3.23 will yield the following expression for \theta_1:

    \theta_1 = -\gamma^{(r)}
    \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}^T
    \begin{bmatrix}
      \lambda_t^{(r)} & -\gamma^{(r)} & & \\
      -\gamma^{(r)} & \lambda_t^{(r)} & \ddots & \\
      & \ddots & \ddots & -\gamma^{(r)} \\
      & & -\gamma^{(r)} & \lambda_t^{(r)}
    \end{bmatrix}^{-1}
    \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \gamma^{(r)} \end{bmatrix}.

The right hand side above can be evaluated as -\gamma^{(r)} \alpha_{d_r} using Lemma 3.24, for the choice \tilde{a} = \tilde{c} = -\gamma^{(r)}, \tilde{b} = \lambda_t^{(r)} and d = d_r = (L_r - L_{r-1} - 1). This yields:

    \theta_1 = -\gamma^{(r)} \frac{\rho_1(r,t)^{d_r} - \rho_2(r,t)^{d_r}}{\rho_1(r,t)^{d_r+1} - \rho_2(r,t)^{d_r+1}}.

The eigenvalue \theta_2 of A_{E^{(r)}E^{(r)}} corresponding to eigenvector q_t was derived earlier in this subsection as:

    \theta_2 = \frac{1}{2} \lambda_t^{(r)} + \frac{1}{2} \lambda_t^{(r+1)}.

The eigenvalue \theta_3 of -A_{I^{(r+1)}E^{(r)}}^T A_{I^{(r+1)}I^{(r+1)}}^{-1} A_{I^{(r+1)}E^{(r)}} corresponding to eigenvector q_t can be determined as for \theta_1, using Lemmas 3.23 and 3.24. It results in the expression:

    \theta_3 = -\gamma^{(r+1)} \frac{\rho_1(r+1,t)^{d_{r+1}} - \rho_2(r+1,t)^{d_{r+1}}}{\rho_1(r+1,t)^{d_{r+1}+1} - \rho_2(r+1,t)^{d_{r+1}+1}}.

Combining the three terms yields an expression for the eigenvalue (D_{rr})_{tt} of S_{E^{(r)}E^{(r)}} corresponding to eigenvector q_t:

    (D_{rr})_{tt} = \theta_1 + \theta_2 + \theta_3,

which verifies (3.57). By construction, Q diagonalizes S_{E^{(r)}E^{(r)}} = Q D_{rr} Q^T. To obtain an expression for the eigenvalue (D_{r+1,r})_{tt} of S_{E^{(r+1)}E^{(r)}}, we evaluate -A_{I^{(r+1)}E^{(r+1)}}^T A_{I^{(r+1)}I^{(r+1)}}^{-1} A_{I^{(r+1)}E^{(r)}} q_t at the eigenvector q_t, using Lemmas 3.23 and 3.24. This yields:

    (D_{r+1,r})_{tt} = -\gamma^{(r+1)} \frac{\rho_1(r+1,t) - \rho_2(r+1,t)}{\rho_1(r+1,t)^{d_{r+1}+1} - \rho_2(r+1,t)^{d_{r+1}+1}}.

By construction, matrix Q will diagonalize S_{E^{(r+1)}E^{(r)}} = Q D_{r+1,r} Q^T, verifying (3.57) or (3.59). \Box

The preceding result shows that the block submatrices of matrix S are simultaneously diagonalized by the discrete sine transform Q. Thus, we may employ Lemma 3.20 to construct a fast direct solver for S, using matrices G_{ii} of size (p-1):

    (G_{ii})_{r,s} = (D_{r,s})_{ii}, for 1 \le r, s \le (p-1), 1 \le i \le (k-1),

where (D_{r,s})_{ii} is defined by (3.57) or (3.59). Each matrix G_{ii} will be tridiagonal, and the resulting tridiagonal systems can be solved simultaneously. We summarize the algorithm next.

Algorithm 3.3.2 (FFT Based Solution of S u_B = f_B)
Let u_B = (u_{E^{(1)}}^T, \ldots, u_{E^{(p-1)}}^T)^T and f_B = (f_{E^{(1)}}^T, \ldots, f_{E^{(p-1)}}^T)^T.

1. For i = 1, \ldots, p-1 in parallel do: Compute \tilde{f}_{E^{(i)}} \equiv Q^T f_{E^{(i)}}
2. Endfor
3. For i = 1, \ldots, p-1 do
4.   For j = 1, \ldots, k-1 do: Define (g_j)_i \equiv (\tilde{f}_{E^{(i)}})_j
5.   Endfor
6. Endfor
7. For j = 1, \ldots, k-1 in parallel solve (using a tridiagonal solver): G_{jj} c_j = g_j
8. Endfor

9. For i = 1, \ldots, p-1 do:
10.   For j = 1, \ldots, k-1 do: Define (\tilde{c}_{E^{(i)}})_j = (c_j)_i
11.   Endfor
12. Endfor
13. For i = 1, \ldots, p-1 in parallel do: Compute u_{E^{(i)}} = Q \tilde{c}_{E^{(i)}}
14. Endfor

Output: u_B = (u_{E^{(1)}}^T, \ldots, u_{E^{(p-1)}}^T)^T.

Remark 3.26. The loop between lines 1 and 2 requires the application of a total of (p-1) fast sine transforms. The loop between lines 7 and 8 requires the solution of a total of (k-1) tridiagonal linear systems, each involving (p-1) unknowns. The loop between lines 13 and 14 requires the application of a total of (p-1) fast sine transforms. As a result, the preceding algorithm will have a complexity of O(p k \log(k)).

Remark 3.27. In the case of a two strip decomposition, the Schur complement matrix S = S_{E^{(1)}E^{(1)}} will be diagonalized by the discrete sine transform Q: S = Q D_{11} Q^T. Such eigendecompositions can be employed to precondition a two subdomain Schur complement matrix arising in two dimensional elliptic problems, and will be considered in the next section [BJ9, CH13, CH14, RE].
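A compact sketch of Algorithm 3.3.2, assuming NumPy, is given below. The tridiagonal matrices G_jj defined by (3.57) and (3.59) are taken as given inputs, and the sine transform Q is formed densely for clarity; a practical implementation would instead call a fast sine transform routine, recovering the O(p k log(k)) complexity noted in Remark 3.26.

```python
# Sketch of Algorithm 3.3.2. F is a (p-1) x (k-1) array whose i-th row is
# the edge load vector f_{E(i)}; G is a list of (k-1) tridiagonal
# (p-1) x (p-1) matrices G_jj built from (3.57) and (3.59).
import numpy as np

def schur_fft_solve(F, G):
    p_minus_1, k_minus_1 = F.shape
    k = k_minus_1 + 1
    j = np.arange(1, k)
    Q = np.sqrt(2.0 / k) * np.sin(np.outer(j, j) * np.pi / k)  # DST matrix

    F_hat = F @ Q                 # steps 1-2: f~_{E(i)} = Q^T f_{E(i)} (Q = Q^T)
    U_hat = np.empty_like(F_hat)
    for jj in range(k_minus_1):   # steps 3-8: one tridiagonal solve per frequency
        U_hat[:, jj] = np.linalg.solve(G[jj], F_hat[:, jj])
    return U_hat @ Q              # steps 9-14: u_{E(i)} = Q c~_{E(i)}
```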

Remark 3.28. In this section, we have focused solely on FFT based Schur complement solvers for discretizations of elliptic equations on two dimensional domains. However, the block matrix techniques that were described can also be applied to discretizations of separable elliptic equations on three dimensional rectangular domains with strip subdomains. In this case, the nodal vectors u_j correspond to nodal unknowns on planar cross sections of Ω. Matrix Q will then be a two dimensional FFT or FST matrix, and the algebraic expressions derived in this section for the eigenvalues of the Schur complement blocks will remain valid, provided \lambda_j^{(r)} and \gamma^{(r)} correspond to eigenvalues of the associated block matrices. We omit additional details.

3.4 Two Subdomain Preconditioners

Our study of preconditioners for the Schur complement S begins with the two subdomain case, where the geometry of the interface B is relatively simple. In this case, S will be dense; however, its entries will decay in magnitude with increasing distance between the nodes. We shall describe preconditioners based either on local Schur complement matrices, or on approximations of S which use properties of the Steklov-Poincaré map associated with S.

We consider a finite element discretization of elliptic equation (3.1) on a domain Ω, with Dirichlet boundary conditions on B_D = ∂Ω. We assume that Ω is partitioned into two nonoverlapping subdomains Ω_1 and Ω_2 with interface B ≡ ∂Ω_1 ∩ ∂Ω_2, see Fig. 3.3, and order the nodes in Ω based on Ω_1, Ω_2 and B. Given this ordering, a nodal vector u can be partitioned as

u = (u_I^{(1)T}, u_I^{(2)T}, u_B^T)^T, and the discretization of (3.1) will then have the block form:

    \begin{bmatrix}
      A_{II}^{(1)} & 0 & A_{IB}^{(1)} \\
      0 & A_{II}^{(2)} & A_{IB}^{(2)} \\
      A_{IB}^{(1)T} & A_{IB}^{(2)T} & A_{BB}
    \end{bmatrix}
    \begin{bmatrix} u_I^{(1)} \\ u_I^{(2)} \\ u_B \end{bmatrix}
    =
    \begin{bmatrix} f_I^{(1)} \\ f_I^{(2)} \\ f_B \end{bmatrix}.

Fig. 3.3. Two subdomain decompositions: a regular decomposition, in which B = ∂Ω_1 ∩ ∂Ω_2 separates Ω_1 and Ω_2, and an immersed decomposition, in which Ω_1 is immersed in Ω with B = ∂Ω_1.

The Schur complement matrix S associated with the above system can be derived by solving u_I^{(i)} = A_{II}^{(i)-1} (f_I^{(i)} - A_{IB}^{(i)} u_B) for i = 1, 2, and substituting this into the third block row above. This will yield the reduced system:

    S u_B = f_B - A_{IB}^{(1)T} A_{II}^{(1)-1} f_I^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} f_I^{(2)},

where S \equiv (A_{BB} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}) is the two subdomain Schur complement. Typically, S will be dense; however, its action can be computed without its assembly. We shall seek preconditioners M for S such that:

    cond(M, S) \equiv \frac{\lambda_{max}(M^{-1} S)}{\lambda_{min}(M^{-1} S)}

is significantly smaller than the condition number of S, without deterioration as h → 0+, or as the coefficient a(x) and the subdomain size h_0 varies. In this section, we shall describe three categories of Schur complement preconditioners for two subdomain decompositions:

• Preconditioners based on subdomain Schur complements.
• Preconditioners based on FFT's and fractional Sobolev norms.
• Preconditioners based on algebraic approximations of S.

Of these, the preconditioners based on subdomain Schur complements are more easily generalized to the many subdomain case and higher dimensions.
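Since the action of S can be computed without assembly, a Krylov method for the reduced system requires only a matrix-vector product with S. The following sketch, assuming SciPy and hypothetical names A_II_1, A_IB_1, A_II_2, A_IB_2, A_BB for the blocks of the system above, illustrates one way to organize such a matrix-free product.

```python
# Matrix-free product u_B -> S u_B for the two subdomain reduced system:
#   S u_B = A_BB u_B - sum_i A_IB^(i)T A_II^(i)-1 A_IB^(i) u_B.
import scipy.sparse.linalg as spla

def make_schur_matvec(A_II_1, A_IB_1, A_II_2, A_IB_2, A_BB):
    lu1 = spla.splu(A_II_1.tocsc())     # factor each interior block once
    lu2 = spla.splu(A_II_2.tocsc())
    def matvec(u_B):
        w1 = lu1.solve(A_IB_1 @ u_B)    # subdomain 1 interior (Dirichlet) solve
        w2 = lu2.solve(A_IB_2 @ u_B)    # subdomain 2 interior (Dirichlet) solve
        return A_BB @ u_B - A_IB_1.T @ w1 - A_IB_2.T @ w2
    return matvec
```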

3.4.1 Preconditioners Based on Subdomain Schur Complements

The use of the local Schur complement S^{(i)} to precondition S can be motivated by the matrix splitting of S given by the subassembly identity (3.20):

    S = S^{(1)} + S^{(2)}, \quad \mbox{where} \quad S^{(i)} = A_{BB}^{(i)} - A_{IB}^{(i)T} A_{II}^{(i)-1} A_{IB}^{(i)},

since B = B^{(1)} = B^{(2)} and R_B^{(i)} = I for i = 1, 2. This splitting may also be derived by substituting the identity A_{BB} = A_{BB}^{(1)} + A_{BB}^{(2)} into the algebraic expression S = (A_{BB} - A_{IB}^T A_{II}^{-1} A_{IB}) for the Schur complement matrix:

    S = A_{BB}^{(1)} + A_{BB}^{(2)} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}
      = (A_{BB}^{(1)} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)}) + (A_{BB}^{(2)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)})
      = S^{(1)} + S^{(2)}.

Typically, each S^{(i)} will be symmetric and positive definite, except when c(x) = 0 and Ω_i is immersed in Ω, in which case S^{(i)} will be singular. For simplicity, we shall assume S^{(i)} is nonsingular (see Chap. 3.7). Matrix S^{(i)} need not be assembled (and it will be dense, even if it were assembled). It will be important to solve the system S^{(i)} v_B^{(i)} = r_B^{(i)} efficiently.

Fortunately, such a system can be solved without assembling S^{(i)}, by using the following algebraic property satisfied by S^{(i)}:

    \begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix}
    \begin{bmatrix} v_I^{(i)} \\ v_B^{(i)} \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ S^{(i)} v_B^{(i)} \end{bmatrix},
    \quad \mbox{i.e.,} \quad
    S^{(i)-1} r_B^{(i)} =
    \begin{bmatrix} 0 \\ I \end{bmatrix}^T
    \begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix}^{-1}
    \begin{bmatrix} 0 \\ r_B^{(i)} \end{bmatrix}.        (3.60)

This identity can be verified by block elimination of v_I^{(i)}. It suggests that the solution to S^{(i)} v_B^{(i)} = r_B^{(i)} can be obtained by solving (3.60), using r_B^{(i)} to replace S^{(i)} v_B^{(i)} in the right hand side, and by selecting v_B^{(i)}. The subdomain stiffness matrix here corresponds to the discretization of an elliptic equation on Ω_i with Neumann boundary data on B^{(i)}. When the number of unknowns on each subdomain is approximately half the total number of unknowns, the cost of preconditioning with S^{(i)} is typically less than half the cost of solving A v = r. It is shown in Chap. 3.9 that cond(S^{(i)}, S) ≤ c, for c > 0 independent of h. In the special case where the elliptic equation and the grid are symmetric about B, it will hold that S^{(1)} = S^{(2)} and cond(S^{(i)}, S) = 1. However, the number of iterations required depends on the effectiveness of this preconditioner.

S^{(i)} is traditionally referred to as the Dirichlet-Neumann preconditioner for S, see [BJ9, BR11, FU, MA29]. Its name arises since a Neumann problem must be solved on Ω_i and a subsequent Dirichlet problem on its complementary domain, as in the Dirichlet-Neumann algorithm (Alg. 1.3.1 from Chap. 1). To obtain a discrete version of the Dirichlet-Neumann algorithm, let v_I^{(k)} and v_B^{(k)} denote the k'th iterate on Ω_1 and B^{(1)}, and u_I^{(k)} and u_B^{(k)} the k'th iterate on Ω_2 and B^{(2)}, respectively. Let (w_I^{(1)}, w_B^{(1)})^T and (w_I^{(2)}, w_B^{(2)})^T denote nodal vectors associated with finite element functions on \bar{Ω}_1 and \bar{Ω}_2, respectively. Below, we list a discretization of the Steklov-Poincaré formulation (1.19) from Chap. 1:

    A_{II}^{(1)} w_I^{(1)} + A_{IB}^{(1)} w_B^{(1)} = f_I^{(1)}
    w_B^{(1)} = w_B^{(2)}
    A_{II}^{(2)} w_I^{(2)} + A_{IB}^{(2)} w_B^{(2)} = f_I^{(2)}
    A_{IB}^{(2)T} w_I^{(2)} + A_{BB}^{(2)} w_B^{(2)} = -A_{IB}^{(1)T} w_I^{(1)} - A_{BB}^{(1)} w_B^{(1)} + f_B.

The discrete Dirichlet-Neumann algorithm iteratively replaces the transmission boundary conditions on B, using a relaxation parameter 0 < θ < 1. Formally:

Algorithm 3.4.1 (Dirichlet-Neumann Algorithm)
Let (v_I^{(0)}, v_B^{(0)})^T and (u_I^{(0)}, u_B^{(0)})^T denote starting iterates.

1. For k = 0, 1, \ldots until convergence do:
2.   Solve the Dirichlet problem:

        A_{II}^{(1)} v_I^{(k+1)} + A_{IB}^{(1)} v_B^{(k+1)} = f_I^{(1)}
        v_B^{(k+1)} = θ u_B^{(k)} + (1-θ) v_B^{(k)}

3.   Solve the mixed problem:

        A_{II}^{(2)} u_I^{(k+1)} + A_{IB}^{(2)} u_B^{(k+1)} = f_I^{(2)}
        A_{IB}^{(2)T} u_I^{(k+1)} + A_{BB}^{(2)} u_B^{(k+1)} = f_B - A_{IB}^{(1)T} v_I^{(k+1)} - A_{BB}^{(1)} v_B^{(k+1)}

4. Endfor

Output: (v_I^{(k)T}, v_B^{(k)T})^T and (u_I^{(k)T}, u_B^{(k)T})^T.

If the interior variables v_I^{(k+1)} and u_I^{(k+1)} are eliminated in the preceding algorithm, we may obtain an expression relating v_B^{(k+2)} to v_B^{(k+1)}. A matrix form for this can be derived by solving for v_I^{(k+1)} in step 2:

    v_I^{(k+1)} = A_{II}^{(1)-1} (f_I^{(1)} - A_{IB}^{(1)} v_B^{(k+1)})
    v_B^{(k+1)} = θ u_B^{(k)} + (1-θ) v_B^{(k)},

and substituting this into the equations in step 3. Solving the resulting block system using block elimination (representing u_I^{(k+1)} in terms of u_B^{(k+1)}) yields:

    S^{(2)} u_B^{(k+1)} = f_B - A_{IB}^{(1)T} A_{II}^{(1)-1} (f_I^{(1)} - A_{IB}^{(1)} v_B^{(k+1)}) - A_{BB}^{(1)} v_B^{(k+1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} f_I^{(2)}
                        = f_B - A_{IB}^{(1)T} A_{II}^{(1)-1} f_I^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} f_I^{(2)} - S^{(1)} v_B^{(k+1)}.

Defining \tilde{f}_B \equiv f_B - A_{IB}^{(1)T} A_{II}^{(1)-1} f_I^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} f_I^{(2)}, this reduces to:

    u_B^{(k+1)} = S^{(2)-1} (\tilde{f}_B - S^{(1)} v_B^{(k+1)}).

Since v_B^{(k+2)} is defined as v_B^{(k+2)} = θ u_B^{(k+1)} + (1-θ) v_B^{(k+1)}, this shows that the preceding Dirichlet-Neumann algorithm corresponds to an unaccelerated Richardson iteration to solve the Schur complement system S u_B = \tilde{f}_B, with M = S^{(2)} as a preconditioner and θ as a relaxation parameter. We may also employ M = S^{(1)} as a preconditioner for S. Below, we summarize the action of the Dirichlet-Neumann preconditioner M = S^{(i)} on a vector r_B.

Algorithm 3.4.2 (Dirichlet-Neumann Preconditioner)
Input: r_B
Solve:

    \begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix}
    \begin{bmatrix} v_I^{(i)} \\ v_B^{(i)} \end{bmatrix}
    = \begin{bmatrix} 0 \\ r_B \end{bmatrix}.

Output: M^{-1} r_B \equiv v_B^{(i)}.

Remark 3.29. When applying the Dirichlet-Neumann preconditioner, a specific subdomain Schur complement matrix S^{(i)} must be chosen. When the geometry, coefficients, and grid on the two subdomains differ significantly, matrices S^{(1)} and S^{(2)} can also differ significantly. In this case, it may be more equitable to combine information from both the subdomains in the preconditioner. This motivates the Neumann-Neumann preconditioner. Furthermore, when c(x) = 0 and B^{(i)} = ∂Ω_i, the local stiffness matrix A^{(i)} and its Schur complement S^{(i)} will be singular, with the null space of S^{(i)} spanned by 1 = (1, \ldots, 1)^T. In this case, the Dirichlet-Neumann preconditioner must be modified, since S^{(i)} v_B = r_B will be solvable only if the compatibility condition 1^T r_B = 0 is satisfied; the solution will then be unique only up to a multiple of 1. Both issues are addressed by the balancing procedure in Chap. 3.7, see [MA14, MA17].

The action of the inverse of the two subdomain Neumann-Neumann preconditioner M is defined as:

    M^{-1} \equiv α S^{(1)-1} + (1-α) S^{(2)-1},

where 0 < α < 1 is a scalar parameter for assigning different weights to each subdomain (though typically α = 1/2). Computing the action of M^{-1} requires the solution of a discretized elliptic equation on each subdomain, with Neumann boundary conditions on B; hence the name Neumann-Neumann preconditioner [BO7]. The action of the inverse of this preconditioner is summarized below.

Algorithm 3.4.3 (Neumann-Neumann Preconditioner)
Input: r_B and 0 < α < 1

1. For i = 1, 2 in parallel solve for (w_I^{(i)}, w_B^{(i)})^T:

    \begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix}
    \begin{bmatrix} w_I^{(i)} \\ w_B^{(i)} \end{bmatrix}
    = \begin{bmatrix} 0 \\ r_B \end{bmatrix}.

2. Endfor

Output: M^{-1} r_B \equiv α w_B^{(1)} + (1-α) w_B^{(2)}.
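Both preconditioners reduce to solves with the subdomain Neumann matrices in (3.60). The sketch below, assuming SciPy sparse blocks (the names A_II_i, A_IB_i, A_BB_i are hypothetical, with A_BB_i the subdomain contribution to A_BB), factors each extended matrix once and then applies S^{(i)-1} by a single solve; the Neumann-Neumann action is the weighted combination of the two interface traces.

```python
# Action of S^(i)-1 via the extended subdomain (Neumann) system (3.60),
# and the Neumann-Neumann combination of Algorithm 3.4.3.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_local_inverse(A_II_i, A_IB_i, A_BB_i):
    n_I = A_II_i.shape[0]
    A_i = sp.bmat([[A_II_i, A_IB_i],
                   [A_IB_i.T, A_BB_i]]).tocsc()
    lu = spla.splu(A_i)                    # factor the subdomain Neumann matrix
    def apply(r_B):
        rhs = np.concatenate([np.zeros(n_I), r_B])
        return lu.solve(rhs)[n_I:]         # keep the trace: v_B = S^(i)-1 r_B
    return apply

# Dirichlet-Neumann: M^-1 r_B = apply2(r_B); Neumann-Neumann combines both.
def neumann_neumann(apply1, apply2, r_B, alpha=0.5):
    return alpha * apply1(r_B) + (1 - alpha) * apply2(r_B)
```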

3. and convergence rates independent of h.30. By construction. If M is a matrix that generates the latter fractional Sobolev norm energy. whose interface B ˜ has the same ˜ number of unknowns as on B.1) posed on Ω may be heuristically approximated by an elliptic equation posed on Ω ˆ with (possibly modified) coefficients a ˆ(x) and cˆ(x) approximating a(x) and c(x). such methodology is primarily applicable in two dimensions. Ω2 and B. 3. Proof. Theoretical analysis in Chap. respectively: ∇ · (ˆ u) + cˆ(x) = fˆ(ˆ a(x)∇ˆ ˆ∈Ω x). when Ω ⊂ IR2 .4. for x ˆ ˆ ∈ ∂ Ω. FU. The advantage of FFT based preconditioners is that when they are applicable. In the fractional Sobolev norm approach. then it will provide a heuristic FFT based preconditioner for S.e.9. Ω ˆ and interface B that approximate Ω1 . It also applies to subdomains with arbitrary geometry in two or three dimensions.2 Preconditioners Based on FFT’s and Fractional Norms Preconditioners for S. the Schur complement S on an interface B is approximated by the Schur complement S˜ of a model problem on another domain. with subdomains Ω ˆ2 ˆ1 . while the Neumann-Neumann preconditioner requires two subdomain solves. If the Schur complement S in the model problem has FFT solvers. can be motivated in alternate ways. then the condition number cond(M. S) will be bounded independent of the mesh size h. Model Problem Based Preconditioners. for applicability. . or have a multilevel structure. In a model problem based approach.4 Two Subdomain Preconditioners 145 An advantage of the Neumann-Neumann preconditioner is that each local problem is typically easy to set up. an equivalence between the energy of the Schur complement S and a fractional Sobolev norm energy of its boundary data on B is employed. with interface B.9 in- dicates that the Dirichlet-Neumann and Neumann-Neumann preconditioners typically yield convergence rates which are independent of the mesh size h. respectively. If M denotes the Dirichlet-Neumann or Neumann-Neumann preconditioner for a two subdomain decomposition. for x ˆ (3. the grid on B must either be uniform. and its algebraic form extends easily to multisubdomain decompositions.. See [BJ9. the elliptic equation (3. it can be employed to precondition the Schur complement S. BR11. MA29] and Chap. Then. based on FFT’s and fractional Sobolev norms.61) u ˆ = 0. However. i. they yield almost optimal order complexity. Given a domain Ω with subdomains Ω1 and Ω2 . the preceding Dirichlet-Neumann preconditioner requires only one subdomain solve. Theorem 3.   3. In three dimensions. let Ωˆ be a region approximating Ω. The model problem approach is heuristic in nature. 3.

Block partitioning the unknowns in \hat{Ω} based on the subregions yields the system:

    \begin{bmatrix} \hat{A}_{II} & \hat{A}_{IB} \\ \hat{A}_{IB}^T & \hat{A}_{BB} \end{bmatrix}
    \begin{bmatrix} \hat{u}_I \\ \hat{u}_B \end{bmatrix}
    = \begin{bmatrix} \hat{f}_I \\ \hat{f}_B \end{bmatrix}.        (3.62)

The Schur complement matrix \hat{S} = (\hat{A}_{BB} - \hat{A}_{IB}^T \hat{A}_{II}^{-1} \hat{A}_{IB}) in the model problem may then be employed as a preconditioner for S. Heuristically, matrix \hat{S} should be approximately spectrally equivalent to S. In applications, we may seek \hat{Ω}, \hat{Ω}_1 and \hat{Ω}_2 to be rectangular regions. The grid on \hat{Ω} will be chosen to be uniform, and the coefficients \hat{a}(x) and \hat{c}(x) will be chosen to be constant in each subdomain, so that (3.61) is separable. If a uniform triangulation is employed on \hat{Ω}, then an FFT solver can be constructed for \hat{S}.

Remark 3.31. If \hat{Ω} is a small subregion of Ω satisfying B ⊂ \hat{Ω} ⊂ Ω, then we may define \hat{Ω}_1 = Ω_1 ∩ \hat{Ω}, \hat{Ω}_2 = Ω_2 ∩ \hat{Ω} and \hat{B} = B. In this case, we may choose \hat{a}(·) = a(·) and \hat{c}(·) = c(·), and system (3.62) will have a coefficient matrix which is a small submatrix of A. We may substitute \hat{f}_I = 0 and \hat{f}_B = r_B into (3.62), and define the action of the inverse of a preconditioner as \hat{S}^{-1} r_B \equiv \hat{u}_B.

Remark 3.32. Below, we elaborate on the preconditioner outlined in Remark 3.31 for a two dimensional domain Ω. We shall assume that the interface B can be mapped onto a line segment \hat{B}. Let T_h(\hat{Ω}) denote a triangulation of \hat{Ω} having the same number of interior nodes in \hat{B} as in B; this will hold when the triangulation of \hat{Ω} restricted to \hat{B} has the same connectivity as the original triangulation restricted to B. If \hat{Ω} is obtained by mapping Ω, and if B maps into \hat{B}, then matrix \hat{S} may be explicitly diagonalized by a discrete sine transform matrix Q, with \hat{S} = Q D Q^T for a diagonal matrix D. If k denotes the number of unknowns on B (and hence on \hat{B}), we define the discrete sine transform matrix Q of size k as:

    Q_{ij} \equiv \sqrt{2/(k+1)} \sin(i j π/(k+1)), for 1 ≤ i, j ≤ k.        (3.63)

We employ a model Schur complement preconditioner \hat{S} = Q D Q^T for S, for different choices of diagonal matrices D. Next, we list different choices of diagonal entries D_{ii} for 1 ≤ i ≤ k, yielding different preconditioners, see [DR, GO3, CH2, BJ9, BR11]:

    D_{ii} = (a^{(1)} + a^{(2)}) (σ_i)^{1/2}        [DR]
    D_{ii} = (a^{(1)} + a^{(2)}) (σ_i + σ_i^2/4)^{1/2}        [GO3]
    D_{ii} = \left( a^{(1)} \frac{1 + γ_i^{m_1+1}}{1 - γ_i^{m_1+1}} + a^{(2)} \frac{1 + γ_i^{m_2+1}}{1 - γ_i^{m_2+1}} \right) (σ_i + σ_i^2/4)^{1/2}        [BR11, CH2]        (3.64)
    D_{ii} = \frac{1}{2} (a^{(1)} + a^{(2)}) (σ_i - σ_i^2/6)^{1/2}        [BJ9]

Here, for 1 ≤ i ≤ k, the parameters σ_i and γ_i are defined by:

    σ_i \equiv 4 \sin^2\left( \frac{iπ}{2(k+1)} \right),
    \quad
    γ_i \equiv \frac{1 + \frac{1}{2} σ_i - \sqrt{σ_i + \frac{1}{4} σ_i^2}}{1 + \frac{1}{2} σ_i + \sqrt{σ_i + \frac{1}{4} σ_i^2}}.        (3.65)

The scalars a^{(1)} and a^{(2)} denote values of a(x) at some interior point in Ω_1 and Ω_2, respectively, when a(x) is a scalar function. When a(x) is a matrix function, a^{(1)} and a^{(2)} will be eigenvalues of a(x) at chosen interior points in Ω_1 and Ω_2. The parameters m_1 and m_2 in the preconditioner of [BJ9, CH2] are integers chosen so that (m_i + 1) h and (k+1) h represent the approximate length and width of subdomain Ω_i. The resulting preconditioner \hat{S} = Q D Q^T for S is summarized next.

Algorithm 3.4.4 (FST Based Fractional Norm Preconditioner)
Input: r_B, D

1. Evaluate using the fast sine transform: y_B = Q r_B
2. Compute in linear complexity: x_B = D^{-1} y_B
3. Evaluate using the fast sine transform: w_B = Q x_B

Output: \hat{S}^{-1} r_B \equiv w_B.

Since the discrete sine transform matrix Q is symmetric and orthogonal, it will hold that Q^{-1} = Q, so that the transform applied twice should yield the identity. However, in FFT packages [VA4] the discrete sine transform may be scaled differently, so the user may need to rescale the output. Since the cost of applying a discrete sine transform is typically O(k \log(k)), the combined cost for solving the linear system \hat{S} w_B = r_B will be O(k \log(k)).
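A sketch of Algorithm 3.4.4, assuming NumPy, is given below; the transform is written as a dense matrix product for clarity, while a practical code would call a fast sine transform routine and check its scaling convention, as noted above. The helper shown builds the diagonal of the [GO3] choice in (3.64)-(3.65); the other choices of D may be substituted analogously.

```python
# Sketch of Algorithm 3.4.4: w_B = Q D^-1 Q r_B with Q the DST (3.63).
import numpy as np

def fst_preconditioner(r_B, D):
    k = r_B.size
    j = np.arange(1, k + 1)
    Q = np.sqrt(2.0 / (k + 1)) * np.sin(np.outer(j, j) * np.pi / (k + 1))
    y = Q @ r_B               # step 1: sine transform
    x = y / D                 # step 2: diagonal solve
    return Q @ x              # step 3: inverse transform (Q^-1 = Q)

# Diagonal of the [GO3] choice in (3.64), for coefficient values a1, a2.
def D_choice_GO3(k, a1, a2):
    sigma = 4 * np.sin(np.arange(1, k + 1) * np.pi / (2 * (k + 1)))**2
    return (a1 + a2) * np.sqrt(sigma + sigma**2 / 4)
```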

Remark 3.33. The choice of diagonal matrix D in [BJ9, CH2] can be formally obtained as follows, for a two dimensional domain \hat{Ω} with k interior nodes on \hat{B}. Let (m_i + 1) h denote the approximate length of subdomain \hat{Ω}_i, and let the coefficients of the model problem be isotropic, with constant value \hat{a}^{(i)} in \hat{Ω}_i. Then, for h_{x_1} = h_{x_2} = h, the eigenvalues D_{ii} of the Schur complement matrix \hat{S} in the model problem simplify to:

    D_{ii} = \left( a^{(1)} \frac{1 + γ_i^{m_1+1}}{1 - γ_i^{m_1+1}} + a^{(2)} \frac{1 + γ_i^{m_2+1}}{1 - γ_i^{m_2+1}} \right) \left( σ_i + \frac{1}{4} σ_i^2 \right)^{1/2},

for 1 ≤ i ≤ k, where σ_i and γ_i are as defined in (3.65). This follows by algebraic simplification of (3.57). When the parameters m_1 and m_2 are large, since 0 < γ_i < 1, the expression for the eigenvalues D_{ii} can be approximated as:

    D_{ii} → (a^{(1)} + a^{(2)}) (σ_i + σ_i^2/4)^{1/2}, for 1 ≤ i ≤ k.

This heuristically motivates the preconditioner of [GO3]. The preconditioner of [DR] can be formally obtained from the preconditioner of [GO3] by replacing the terms (σ_i + σ_i^2/4)^{1/2} by the terms (σ_i)^{1/2}. By construction, both preconditioners will be spectrally equivalent, since (σ_i + σ_i^2/4)^{1/2} = (σ_i)^{1/2} (1 + σ_i/4)^{1/2} and 1 < (1 + σ_i/4)^{1/2} < \sqrt{2} for 0 < σ_i < 4.

Remark 3.34. Similar preconditioners can be constructed in three dimensions, provided the grid on the interface B can be mapped into a two dimensional rectangular grid. Matrix Q will then be a two dimensional fast sine transform. When the grid on B is not rectangular, preconditioners approximating fractional Sobolev norms can be constructed using multilevel methodology, provided the grid has a multilevel structure, see [BR17] and Chap. 7.

Fractional Sobolev Norm Based Preconditioners. This approach is motivated by a norm equivalence between the energy associated with a two subdomain Schur complement S and a fractional Sobolev norm energy. For two subdomain decompositions, the norm equivalences (3.28) and (3.32) reduce to:

    c \|u_h\|_{H^{1/2}_{00}(B)}^2 ≤ u_B^T S u_B ≤ C \|u_h\|_{H^{1/2}_{00}(B)}^2,        (3.66)

for 0 < c < C independent of h. Such norm equivalences hold for harmonic and discrete harmonic functions, and are proved using elliptic regularity theory, since the fractional Sobolev norm \|u_h\|_{1/2,∂Ω_i} can be shown to be equivalent to \|u_h\|_{H^{1/2}_{00}(B)} when u_h is zero on ∂Ω_i \ B. This norm equivalence suggests that a preconditioner M can be constructed for S by representing the discrete fractional Sobolev energy as:

    \|u_h\|_{H^{1/2}_{00}(B)}^2 = u_B^T M u_B.

A matrix M satisfying this property can be constructed by employing the theory of Hilbert interpolation spaces [BA3, LI4, BE16], as outlined below.

Given two Hilbert spaces satisfying H_0 ⊃ H_1, where the latter space has a stronger norm, \|u\|_{H_0} ≤ C \|u\|_{H_1} for all u ∈ H_1, a family of interpolation spaces H^α can be constructed for 0 ≤ α ≤ 1, with H^0 = H_0 and H^1 = H_1, with associated inner products defined as outlined below [BA3, LI4, BE16].

• Let H_0 denote a Hilbert space with inner product (.,.)_0, and let H_1 ⊂ H_0 denote a subspace with a stronger inner product (.,.)_1:

    (u, u)_0 ≤ C (u, u)_1, \quad ∀u ∈ H_1.

• Let T denote a self adjoint coercive operator satisfying:

    (T u, v)_0 = (u, v)_1, \quad ∀u, v ∈ H_1,

which corresponds to a Riesz representation map.

• Let T have the following spectral decomposition:

    T = \sum_{i=1}^∞ λ_i P_i,

where 0 < λ_1 < λ_2 < \cdots are eigenvalues of T, and P_i are the (.,.)_0 orthogonal projections onto the eigenspace of T corresponding to eigenvalue λ_i.

Then, for 0 ≤ α ≤ 1, we may formally define a fractional operator T^α as:

    T^α \equiv \sum_{i=1}^∞ λ_i^α P_i.

For each 0 ≤ α ≤ 1, the interpolation space H^α is then formally defined as the domain of the fractional operator T^α:

    H^α \equiv \{ u ∈ H_0 : (T^α u, u)_0 < ∞ \},

where the inner product on H^α is consistently defined by:

    (u, v)_α \equiv (T^α u, v)_0 = \sum_{i=1}^∞ λ_i^α (P_i u, v)_0.

This procedure defines interpolation spaces H^α satisfying H_1 ⊂ H^α ⊂ H_0, so that H^0 = H_0 and H^1 = H_1.

In elliptic regularity theory, the fractional index Sobolev space H^{1/2}_{00}(B) is often constructed as an interpolation space H^{1/2}, obtained by interpolating H_0 = L^2(B) and H_1 = H^1_0(B). The space H^{1/2}_{00}(B) will correspond to the domain of the operator T^{1/2}, with associated fractional norm defined by:

    \|u\|_{H^{1/2}_{00}(B)}^2 \equiv (T^{1/2} u, u)_0 = \sum_{i=1}^∞ λ_i^{1/2} (P_i u, u)_0, \quad ∀u ∈ H^{1/2}_{00}(B).

The operator T here corresponds to a Laplace-Beltrami operator -Δ_B defined on B, with homogeneous boundary conditions on ∂B. The fractional operators T^α, however, will not remain differential operators for 0 < α < 1; they are examples of pseudodifferential operators. Formally, the fractional powers of T may be computed by employing the eigenfunction expansion of T and replacing the eigenvalues of T by their fractional powers. In the finite dimensional case, we may employ fractional powers of matrices to represent fractional operators. To obtain a matrix representation of \|u_h\|_{H^{1/2}_{00}(B)}^2 on the space V_h(B) of finite element functions

restricted to B, we seek a symmetric positive definite matrix T_h satisfying:

    (T_h^0 u_h, u_h) = \|u_h\|_{L^2(B)}^2, for u_h ∈ V_h(B) ∩ L^2(B), for α = 0
    (T_h^1 u_h, u_h) = \|u_h\|_{H^1(B)}^2, for u_h ∈ V_h(B) ∩ H^1_0(B), for α = 1,

where (·,·) denotes the L^2(B) inner product. Let G_h denote the mass (Gram) matrix associated with the finite element space V_h ∩ H^1_0(B) with standard nodal basis, and let A_h denote the finite element discretization of the Laplace-Beltrami operator, with zero boundary conditions imposed on ∂B. Then, by construction, it will hold that:

    (T_h^α u_h, u_h) = u_B^T G_h u_B, for α = 0
    (T_h^α u_h, u_h) = u_B^T A_h u_B, for α = 1,

where u_B denotes the nodal vector corresponding to the finite element function u_h(x) restricted to B. Formally, a matrix representation of fractional operators associated with T_h may be constructed as:

    T_h^α = G_h^{1/2} \left( G_h^{-1/2} A_h G_h^{-1/2} \right)^α G_h^{1/2}, for 0 ≤ α ≤ 1.

This yields:

    T_h^{1/2} = G_h^{1/2} \left( G_h^{-1/2} A_h G_h^{-1/2} \right)^{1/2} G_h^{1/2}.

When matrices A_h and G_h can be simultaneously diagonalized by the discrete sine transform Q, then T_h^{1/2} can be computed efficiently, and its associated linear system can be solved efficiently. To construct an explicit representation of matrix T_h^{1/2}, we assume that the interface B corresponds to the line segment (0,1). Then, the Laplace-Beltrami operator -Δ_B defined on B, with zero boundary conditions on ∂B, is:

    -Δ_B u(x) \equiv -\frac{d^2 u}{dx^2}, for u(0) = 0 and u(1) = 0.

If the grid size is h = 1/(k+1), and the nodal vector (u_B)_i = u_h(i h) corresponds to the finite element function u_h, then the finite element discretization A_h of the Laplace-Beltrami operator and the Gram matrix G_h will be of size k:

    A_h = \frac{1}{h}
    \begin{bmatrix}
      2 & -1 & & \\
      -1 & 2 & -1 & \\
      & \ddots & \ddots & \ddots \\
      & & -1 & 2
    \end{bmatrix}
    \quad \mbox{and} \quad
    G_h = \frac{h}{6}
    \begin{bmatrix}
      4 & 1 & & \\
      1 & 4 & 1 & \\
      & \ddots & \ddots & \ddots \\
      & & 1 & 4
    \end{bmatrix}.

Matrices A_h and G_h can be simultaneously diagonalized by the one dimensional discrete sine transform matrix Q with entries:

    Q_{ij} = \sqrt{2/(k+1)} \sin(i j π/(k+1)), for 1 ≤ i, j ≤ k.

The eigenvalues of matrices A_h and G_h corresponding to eigenvector q_j are:

    λ_j(A_h) = 4 (k+1) \sin^2\left( \frac{jπ}{2(k+1)} \right), for 1 ≤ j ≤ k
    λ_j(G_h) = \frac{1}{3(k+1)} \left( 3 - 2 \sin^2\left( \frac{jπ}{2(k+1)} \right) \right), for 1 ≤ j ≤ k.

The fractional power T_h^{1/2} can then be represented explicitly as T_h^{1/2} = Q D Q^T, where:

    D_{jj} = λ_j(G_h)^{1/2} λ_j(A_h)^{1/2}
           = \left( \frac{3 - 2 \sin^2(\frac{jπ}{2(k+1)})}{3(k+1)} \right)^{1/2} \left( 4(k+1) \sin^2\left(\frac{jπ}{2(k+1)}\right) \right)^{1/2}
           = \left( σ_j - \frac{1}{6} σ_j^2 \right)^{1/2},

for σ_j = 4 \sin^2( \frac{jπ}{2(k+1)} ), where 1 ≤ j ≤ k. This choice of D yields the preconditioner M = Q D Q^T of [BR11, BJ9] in (3.64).

Remark 3.35. Analogous FFT based preconditioners can be constructed for two subdomain Schur complements in three dimensions, provided that the grid on the interface B can be mapped onto a uniform rectangular grid [CH2, CH13]. In this case, the Schur complements S may be formulated and applied heuristically, by analogy with the two dimensional case, and implemented as in Alg. 3.4.4. The convergence rate, however, may depend on the aspect ratios of the subdomains, and also on the coefficients.

The following result is proved in Chap. 3.9.

Lemma 3.36. For any 2nd order, coercive, self adjoint elliptic operator, the subdomain Schur complement preconditioner and the fractional Sobolev norm based preconditioner \hat{S} will be spectrally equivalent to S as h → 0+.
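The construction above is easily reproduced numerically. The following sketch, assuming NumPy, assembles A_h and G_h, forms T_h^{1/2} = Q D Q^T from the eigenvalue formulas, and confirms that Q simultaneously diagonalizes both matrices.

```python
# Build T_h^{1/2} = Q D Q^T on B = (0,1) from the explicit eigenvalues.
import numpy as np

def fractional_norm_matrix(k):
    j = np.arange(1, k + 1)
    Q = np.sqrt(2.0 / (k + 1)) * np.sin(np.outer(j, j) * np.pi / (k + 1))
    s = np.sin(j * np.pi / (2 * (k + 1)))**2
    lam_A = 4 * (k + 1) * s                  # eigenvalues of A_h
    lam_G = (3 - 2 * s) / (3 * (k + 1))      # eigenvalues of G_h
    D = np.sqrt(lam_A * lam_G)               # D_jj = (sigma_j - sigma_j^2/6)^(1/2)
    return Q @ np.diag(D) @ Q

# Check that the DST diagonalizes the Galerkin matrices A_h and G_h.
k = 15; h = 1.0 / (k + 1)
Ah = (np.diag(2 * np.ones(k)) - np.diag(np.ones(k - 1), 1)
      - np.diag(np.ones(k - 1), -1)) / h
Gh = (np.diag(4 * np.ones(k)) + np.diag(np.ones(k - 1), 1)
      + np.diag(np.ones(k - 1), -1)) * h / 6
j = np.arange(1, k + 1)
Q = np.sqrt(2.0 / (k + 1)) * np.sin(np.outer(j, j) * np.pi / (k + 1))
X = Q @ Ah @ Q
print(np.allclose(X, np.diag(np.diag(X))))   # True: Q^T A_h Q is diagonal
```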

3.4.3 Preconditioners Based on Algebraic Approximation of S

The Schur complement matrix S arising in a two subdomain decomposition is typically a dense matrix. This can be verified heuristically by computing its entries and plotting their magnitude, or by using expression (3.26) and noting that the discrete Green's function A_{II}^{-1} is a dense matrix (due to the global domain of dependence of elliptic equations). As a result, traditional algebraic preconditioners based on ILU factorization [GO4, SA2, AX] will offer no advantages over direct solution; furthermore, such factorizations cannot be employed when matrix S is not assembled, as is the case in iterative substructuring methods. Instead, in this subsection we shall describe two alternative algebraic preconditioners for the Schur complement matrix: one based on a sparse approximation of the Schur complement using the probing technique, and the other based on an incomplete factorization of the subdomain matrices A_{II}^{(i)}. Both preconditioners may be applied without assembly of S.

The first algebraic preconditioner we shall consider is based on the construction of a sparse matrix approximation of S using a probing technique [CH13, KE7, CH9]. A sparse approximation of S can be heuristically motivated by a decay property in the entries S_{ij} of a two subdomain Schur complement matrix. This decay property can be observed when S is assembled explicitly, and arises from the decay in the entries (A_{II}^{(l)-1})_{rs} of the discrete Green's function associated with the elliptic equation on the subdomains, with increasing distance between the nodes x_r and x_s. For Ω ⊂ IR^2, if the nodes x_i on B are ordered consecutively along B, then the entries of the Schur complement matrix S typically decay along diagonal bands, with increasing distance between the nodes x_i and x_j. This suggests that a sparse approximation M of S may be effective as a preconditioner, provided the nonzero entries of M approximate the dominant entries of S. This motivates choosing a band matrix M, say of band width d, to approximate S.

Nonzero entries of the band matrix M can be determined by choosing probe vectors p_l, say for 1 ≤ l ≤ (2d+1), and requiring that the matrix vector products of S with each probe vector p_l match the matrix vector products of M with the same probe vector:

    M p_l = S p_l, for 1 ≤ l ≤ (2d+1).

If matrix S is of size k, these requirements yield k (2d+1) equations for the unknown entries of M. A careful choice of the probe vectors, based on the decay in the entries of S, can increase the accuracy of the probe approximation M, and also simplify the linear system for the nonzero entries of M. In the following, we illustrate a specific choice of probe vectors to construct a tridiagonal approximation M of S. In this case 2d+1 = 3, and three probe vectors p_1, p_2 and p_3 will be sufficient. Choose:

    p_1 = (1, 0, 0, 1, 0, 0, \ldots)^T
    p_2 = (0, 1, 0, 0, 1, 0, \ldots)^T
    p_3 = (0, 0, 1, 0, 0, 1, \ldots)^T.

The resulting probing technique [CH13, KE7, CH9] does not require the explicit assembly of matrix S, but does require the computation of the matrix-vector products of S with the chosen probe vectors.

These products can be computed without the assembly of S, using the identity (3.20) with S^{(i)} = A_{BB}^{(i)} - A_{IB}^{(i)T} A_{II}^{(i)-1} A_{IB}^{(i)}, or based on the identity S = A_{BB} - A_{IB}^T A_{II}^{-1} A_{IB}. Equating M p_l = S p_l for l = 1, 2, 3 yields:

    \begin{bmatrix}
      m_{11} & m_{12} & & \\
      m_{21} & m_{22} & m_{23} & \\
      & m_{32} & m_{33} & m_{34} \\
      & & \ddots & \ddots
    \end{bmatrix}
    \begin{bmatrix}
      1 & 0 & 0 \\
      0 & 1 & 0 \\
      0 & 0 & 1 \\
      \vdots & \vdots & \vdots
    \end{bmatrix}
    =
    \begin{bmatrix}
      m_{11} & m_{12} & 0 \\
      m_{21} & m_{22} & m_{23} \\
      m_{34} & m_{32} & m_{33} \\
      \vdots & \vdots & \vdots
    \end{bmatrix}
    =
    \begin{bmatrix} S p_1 & S p_2 & S p_3 \end{bmatrix}.

All the nonzero entries of the tridiagonal matrix M can be computed explicitly using the above equations. An algorithm for constructing a tridiagonal matrix M of size k is summarized below. For an integer i, we employ the notation:

    mod(i,3) \equiv 1, if i = 3k+1 for some integer k
    mod(i,3) \equiv 2, if i = 3k+2 for some integer k
    mod(i,3) \equiv 3, if i = 3k for some integer k,

where mod(i,3) denotes the remainder in the division of i by 3 (with remainder 0 identified with 3).

Algorithm 3.4.5 (Probe Tridiagonal Approximation of S)
Input: S p_1, S p_2, S p_3

1. For i = 1, \ldots, k do:
2.   Let j = mod(i,3)
3.   m_{ii} = (S p_j)_i
4.   If i < k define: m_{i,i+1} \equiv (S p_{j+1})_i and m_{i+1,i} \equiv (S p_j)_{i+1} (with p_4 identified with p_1)
     Endif
5. Endfor

Output: Tridiagonal matrix M.

Remark 3.37. As input, this algorithm requires three matrix vector products of the form S p_j for j = 1, 2, 3. The computational cost of constructing a tridiagonal approximation M of S will thus essentially be proportional to the cost of computing three matrix-vector products with S.

Remark 3.38. If the Schur complement matrix S is tridiagonal, it is easily verified that the entries M_{ij} of the reconstructed matrix will match S_{ij}. More generally, the reconstructed entries M_{ij} will only be approximations of the corresponding entries S_{ij}, due to the nonzero entries of S outside the tridiagonal band. However, if the entries of S decay rapidly outside the tridiagonal band, then this approximation may be reasonably accurate.
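A sketch of Algorithm 3.4.5, assuming NumPy and 0-based indexing (so that the residue i % 3 plays the role of mod(i,3) above), is given below; apply_S may be any routine returning the product S v without assembling S, such as the matrix-free product sketched earlier.

```python
# Reconstruct a tridiagonal approximation M of S from the three
# probe products S p1, S p2, S p3.
import numpy as np

def probe_tridiagonal(apply_S, k):
    P = np.zeros((k, 3))
    for l in range(3):
        P[l::3, l] = 1.0                       # probe vectors p1, p2, p3
    SP = np.column_stack([apply_S(P[:, l]) for l in range(3)])
    M = np.zeros((k, k))
    for i in range(k):
        j = i % 3                              # probe whose 1 hits column i
        M[i, i] = SP[i, j]
        if i < k - 1:
            M[i, i + 1] = SP[i, (j + 1) % 3]   # superdiagonal entry
            M[i + 1, i] = SP[i + 1, j]         # subdiagonal entry
    return M
```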

Remark 3.39. The reconstructed matrix M in the above algorithm may not be symmetric. However, given a nonsymmetric tridiagonal matrix M, a symmetric tridiagonal approximation \tilde{M} may be obtained as:

    \tilde{M}_{ij} = \max\{ M_{ij}, M_{ji} \}, if i \neq j
    \tilde{M}_{ii} = M_{ii}.

Alternatively, in the two subdomain case, a different probing technique [KE7], involving only two probe vectors, can be employed to construct a symmetric approximation M of S. A reconstruction algorithm may be derived for M using the symmetry of M; this, however, requires some care, and we omit further details. For a model Laplacian on a rectangular grid with periodic boundary conditions on two boundary edges, the condition number of the tridiagonal probe approximation will satisfy cond(M, S) ≤ C h^{-1/2}, in comparison to cond(S) ≤ C h^{-1}.

The following result concerns a tridiagonal probe approximation based on three probe vectors.

Lemma 3.40. If S is an M-matrix, then the tridiagonal probe approximation M of S will also be an M-matrix. Furthermore, its symmetrization \tilde{M} will also be an M-matrix.

Proof. See [CH9]. \Box

The tridiagonal probing procedure described above can be easily generalized to band matrices with larger bandwidths. To generalize to other sparsity patterns, suppose G denotes the adjacency matrix representing the sparsity pattern desired for M. The first step would be to determine a coloring or partitioning of the nodes, say into d colors, so that nodes of the same color are not adjacent in G: if node i is adjacent to nodes j and k in G, then nodes j and k cannot be of the same color. Given such a coloring of the nodes, define d probe vectors p_1, \ldots, p_d so that p_j is one at all indices corresponding to the j'th color and zero at all other nodes; a reconstruction analogous to the tridiagonal case may then be applied. Such approximations will be of interest primarily for multisubdomain decompositions [CA33]. Once a sparse approximation M of S has been constructed, it may be necessary to further approximate M by its ILU factorization, to enable efficient solvability of the preconditioner.

We conclude our discussion on algebraic approximation of S with another approximation, based on an incomplete factorization of the matrices A_{II}^{(i)}. In the two subdomain case, the method employs an incomplete factorization of the subdomain stiffness matrices A_{II}^{(i)} ≈ \tilde{L}_{II}^{(i)} \tilde{L}_{II}^{(i)T} for i = 1, 2, to compute a low cost dense approximation M of S:

    S = A_{BB} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}
      ≈ A_{BB} - A_{IB}^{(1)T} \tilde{L}_{II}^{(1)-T} \tilde{L}_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} \tilde{L}_{II}^{(2)-T} \tilde{L}_{II}^{(2)-1} A_{IB}^{(2)}        (3.67)
      \equiv M.

If the matrix S is of size k, then the cost of constructing M will typically be proportional to O(k^2). The approximation M in (3.67) will typically be dense. However, sufficiently small entries of M may be truncated to zero using a threshold parameter η > 0:

    \tilde{M}_{ij} \equiv 0, if |M_{ij}| ≤ η (|M_{ii}| + |M_{jj}|)
    \tilde{M}_{ij} \equiv M_{ij}, otherwise.        (3.68)

Lemma 3.41. If A is an M-matrix, then the approximation M in (3.67) will also be an M-matrix. Furthermore, if threshold truncation as in (3.68) is applied to the resulting dense matrix M, the truncated approximation \tilde{M} will also be an M-matrix.

Proof. See [CA33]. \Box

The use of incomplete factorization to construct a dense approximation M of S, followed by threshold truncation of M, will thus yield a sparse approximation \tilde{M} of S, which can then be used as a preconditioner [CA33] for S; see also [BR24].

3.5 Preconditioners in Two Dimensions

The Schur complement matrix S associated with the discretization of elliptic equation (3.1) is typically more difficult to precondition for multisubdomain decompositions and in higher dimensions. This difficulty can be attributed to the increasingly complex geometry of the interface B for a multisubdomain decomposition, and to the properties of the Steklov-Poincaré map on B. In the multisubdomain case, the Schur complement matrix will have zero block entries corresponding to nodes on disjoint subdomains, ∂Ω_i ∩ ∂Ω_j = ∅; however, the entries in the nonzero blocks will decay in magnitude with increasing distance between the nodes. As the size h_0 of each subdomain decreases, the condition number of the multisubdomain Schur complement matrix increases from O(h^{-1}) to O(h^{-1} h_0^{-1}). Importantly, Schwarz subspace preconditioners employing suitable overlap between blocks of S can be effective (see in particular Chap. 3.7 on Neumann-Neumann preconditioners).

In this section, we shall describe the block Jacobi, BPS and vertex space preconditioners for a multisubdomain Schur complement matrix S associated with (3.1) on a domain Ω ⊂ IR^2. Each of the preconditioners we describe will have the structure of an additive Schwarz subspace preconditioner from Chap. 2, for the space V = IR^{n_B} of nodal vectors on B, endowed with the inner product generated by S. With the exception of the coarse space V_0 ⊂ V, the other subspaces V_i ⊂ V required to define the Schwarz subspace algorithm for S will be defined based on a partition of the interface B into subregions G_i ⊂ B, referred to as globs. These globs may be extended to define overlapping or non-overlapping segments on B.

The term glob will refer to subregions of the interface which partition B. For Ω ⊂ IR^2, edges and cross-points will be globs, see [MA14]. Let Ω_1, \ldots, Ω_p form a nonoverlapping box type decomposition of Ω ⊂ IR^2, as in Fig. 3.4, with subdomains of diameter h_0. We shall employ the notation:

    B^{(i)} \equiv ∂Ω_i \setminus B_D, for 1 ≤ i ≤ p
    B \equiv ∪_{i=1}^p B^{(i)}
    B_{ij} \equiv int(B^{(i)} ∩ B^{(j)}), for 1 ≤ i, j ≤ p.

Here int(B^{(i)} ∩ B^{(j)}) refers to the interior of B^{(i)} ∩ B^{(j)}. Each connected and nonempty boundary segment B_{ij} will be referred to as an edge. The distinct edges will be enumerated as E_1, \ldots, E_q, so that each E_l corresponds uniquely to a nonempty connected segment B_{ij}. Endpoints in B of the open segments B_{ij} will be referred to as vertices or cross-points, and the collection of all vertices will be denoted V:

    V = B \setminus (E_1 ∪ \cdots ∪ E_q).

The interface B arising in the decomposition of a two dimensional domain can thus be partitioned based on edges and cross-points as follows:

    B = E_1 ∪ \cdots ∪ E_q ∪ V.

Fig. 3.4. A partition of Ω into 8 subdomains (indicating a cross-point v_l, a vertex region G_m, and an edge E_l, as h_0 → 0+).

Consider a finite element discretization of elliptic equation (3.1) with Dirichlet boundary B_D = ∂Ω. If the indices of nodes on B are grouped and ordered based on the globs E_1, \ldots, E_q, V, with some chosen ordering within each edge E_l and within the cross-point set V, then the Schur complement matrix can be block partitioned as:

    S =
    \begin{bmatrix}
      S_{E_1 E_1} & \cdots & S_{E_1 E_q} & S_{E_1 V} \\
      \vdots & & \vdots & \vdots \\
      S_{E_1 E_q}^T & \cdots & S_{E_q E_q} & S_{E_q V} \\
      S_{E_1 V}^T & \cdots & S_{E_q V}^T & S_{VV}
    \end{bmatrix}.        (3.69)

If a coarse space V_0 is included, global transfer of information will be facilitated between the subdomains, and this will reduce the dependence of the condition number on h_0.

Here S_{E_l E_r}, S_{E_l V} and S_{VV} denote submatrices of S corresponding to indices in the respective globs. When edges E_l and E_k belong to a common subdomain boundary B^{(i)}, the block submatrices S_{E_l E_k} will be nonzero and dense. Otherwise, the submatrices S_{E_l E_k} will be zero. The block submatrices S_{E_l V} and S_{VV} will typically have nonzero entries, since there will be nodes in E_l adjacent to nodes in V. Since the Schur complement matrix S is not typically assembled in iterative substructuring methodology, the different submatrices of S in (3.69) will also not be assembled explicitly.

3.5.1 Block Jacobi Preconditioner

In two dimensions, a block Jacobi preconditioner for S can be defined based on the partition of B into the globs E_1, \ldots, E_q and V. In matrix form, such a preconditioner will correspond to the block diagonal of matrix (3.69):

    M =
    \begin{bmatrix}
      S_{E_1 E_1} & & & \\
      & \ddots & & \\
      & & S_{E_q E_q} & \\
      & & & S_{VV}
    \end{bmatrix}.

The action of the inverse of the block Jacobi preconditioner satisfies:

    M^{-1} \equiv \sum_{i=1}^q R_{E_i}^T S_{E_i E_i}^{-1} R_{E_i} + R_V^T S_{VV}^{-1} R_V,        (3.72)

using the interface restriction and extension matrices R_G and R_G^T defined in (3.19), between nodes on B and nodes on G = E_l or G = V.

The action of submatrix S_{E_l E_l} on a subvector can be computed explicitly, without assembly of S_{E_l E_l}, when edge E_l = B^{(i)} ∩ B^{(j)}. This is because in this case submatrix S_{E_l E_l} will correspond to a two subdomain Schur complement matrix, arising from the decomposition of the subregion Ω_i ∪ Ω_j ∪ E_l into the subdomains Ω_i, Ω_j and the interface E_l, by applying property (3.26) of Schur complement matrices. This observation yields the formal expression:

    S_{E_l E_l} = A_{E_l E_l} - A_{I E_l}^{(i)T} A_{II}^{(i)-1} A_{I E_l}^{(i)} - A_{I E_l}^{(j)T} A_{II}^{(j)-1} A_{I E_l}^{(j)}.        (3.70)

This may be applied to yield:

    S_{E_l E_l}^{-1} r_{E_l} =
    \begin{bmatrix} 0 \\ 0 \\ I \end{bmatrix}^T
    \begin{bmatrix}
      A_{II}^{(i)} & 0 & A_{I E_l}^{(i)} \\
      0 & A_{II}^{(j)} & A_{I E_l}^{(j)} \\
      A_{I E_l}^{(i)T} & A_{I E_l}^{(j)T} & A_{E_l E_l}
    \end{bmatrix}^{-1}
    \begin{bmatrix} 0 \\ 0 \\ r_{E_l} \end{bmatrix},        (3.71)

so that the action of S_{E_l E_l}^{-1} on a subvector can be computed at the cost of solving the preceding linear system.
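In an implementation, each glob can be represented by an index array into the vector of interface unknowns, so that R_G and R_G^T require no explicit matrices. A minimal sketch of the additive application (3.72), assuming NumPy, is given below; each local solver may implement the exact edge solve (3.71), the diagonal vertex approximation, or any of the approximations discussed next.

```python
# Additive Schwarz application (3.72): z = sum_G R_G^T S_GG^-1 R_G r_B.
# globs is a list of index arrays into the interface unknowns; each entry
# of local_solvers returns (an approximation of) S_GG^-1 applied to r_G.
import numpy as np

def block_jacobi_apply(r_B, globs, local_solvers):
    z = np.zeros_like(r_B)
    for idx, solve in zip(globs, local_solvers):
        z[idx] += solve(r_B[idx])      # z += R_G^T S_GG^-1 R_G r_B
    return z
```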

Since the Schur complement matrix S will not be assembled, the diagonal blocks S_{E_l E_l} = R_{E_l} S R_{E_l}^T and S_{VV} = R_V S R_V^T of S must typically be approximated. We outline how such approximations can be obtained.

Approximation of S_{E_l E_l}. Since block submatrix S_{E_l E_l} corresponds to a two subdomain Schur complement matrix by (3.70) when edge E_l = B^{(i)} ∩ B^{(j)}, the action of S_{E_l E_l}^{-1} on a vector can be computed exactly using (3.71). This does not require assembly of S_{E_l E_l}. If there are n_{E_l} nodes on E_l, then S_{E_l E_l} will be of size n_{E_l}. Alternate approximations M_{E_l E_l} of S_{E_l E_l} can be obtained by employing any two subdomain preconditioner for S_{E_l E_l}. Choices of such preconditioners include the Dirichlet-Neumann, Neumann-Neumann, fractional Sobolev norm, FFT based, or algebraic approximation based preconditioners. Such preconditioners must be scaled based on the coefficient a(x) within the subdomains Ω_i and Ω_j.

Approximation of S_{VV}. If there are n_V vertices in V, then S_{VV} will be of size n_V. The block submatrix S_{VV} can typically be approximated by a diagonal matrix, based on the following heuristics. When Ω is a rectangular domain, the subdomains are rectangular boxes, and a five point stencil is employed, it can easily be verified that matrix S_{VV} will be identical to the submatrix A_{VV} of the stiffness matrix A. The latter is easily seen to be diagonal. This is a consequence of the property that, for five point stencils, the interior solution in a rectangular subdomain will not depend on the nodal values at corner vertices. This observation heuristically suggests replacing S_{VV} by M_{VV} = A_{VV}.

Theorem 3.42. If M is the block Jacobi preconditioner, the grid is quasi-uniform, and the subdomains are rectangular boxes of diameter h_0, then there exists C > 0 independent of h_0 and h such that:

    cond(M, S) ≤ C h_0^{-2} (1 + \log^2(h_0/h)).

Proof. See [BR12, DR14, DR10]. \Box

As the preceding result indicates, the block Jacobi preconditioner M ignores the coupling between distinct edges, and between the edges and the vertex set V, and it does not globally exchange information between the different subdomains; this results in a non-optimal convergence rate as h_0 → 0.

3.5.2 BPS Preconditioner

As with the block Jacobi preconditioner, the BPS preconditioner [BR12] also has the structure of a matrix additive Schwarz preconditioner for S. Formally, this preconditioner can be obtained by replacing the local residual correction term R_V^T S_{VV}^{-1} R_V on the vertices V in the block Jacobi preconditioner (3.72) by a global coarse space residual correction term of the form R_0^T S_0^{-1} R_0, where R_0 is defined in (3.74) below.

In applications, the coarse space restriction matrix R_0 is usually defined when the subdomains Ω_1, \ldots, Ω_p form a coarse triangulation T_{h_0}(Ω) of Ω, with elements of size h_0 and nodes corresponding to the vertices in V. Below, we enumerate the vertices in V as v_1, \ldots, v_{n_0}, and denote by φ_1^{h_0}(x), \ldots, φ_{n_0}^{h_0}(x) the coarse grid nodal basis functions associated with these vertices. Suppose the nodes on B are enumerated as x_1, \ldots, x_{n_B}, where n_B denotes the number of nodes on B. Then, the coarse space restriction matrix R_0 is defined as the following n_0 × n_B matrix:

    R_0 \equiv
    \begin{bmatrix}
      φ_1^{h_0}(x_1) & \cdots & φ_1^{h_0}(x_{n_B}) \\
      \vdots & & \vdots \\
      φ_{n_0}^{h_0}(x_1) & \cdots & φ_{n_0}^{h_0}(x_{n_B})
    \end{bmatrix}.        (3.74)

Unlike the restriction matrices R_{E_l} and R_V onto E_l and V, which have zero-one entries, the matrix R_0, whose row space defines the coarse space, will not be a matrix with zero-one entries. Its transpose R_0^T, of size n_B × n_0, is an interpolation onto nodal values on B. The action of the inverse of the BPS preconditioner will have the form:

    M^{-1} \equiv \sum_{l=1}^q R_{E_l}^T S_{E_l E_l}^{-1} R_{E_l} + R_0^T S_0^{-1} R_0.        (3.73)

When the region of support of Range(R_0^T) covers Ω, the residual correction term R_0^T S_0^{-1} R_0 in the BPS preconditioner will transfer information globally between the different subdomains. Heuristically, this can help reduce the dependence of the condition number of the preconditioned Schur complement matrix on h_0. As with the block Jacobi preconditioner, since the Schur complement matrix S is not assembled, suitable approximations of the matrices S_{E_l E_l} = R_{E_l} S R_{E_l}^T and S_0 = R_0 S R_0^T must be employed in the BPS preconditioner (3.73). Below, we indicate various such approximations [BR12].

Approximation of S_{E_l E_l}. The submatrix S_{E_l E_l} in (3.73) can be replaced by any suitable two subdomain Schur complement preconditioner M_{E_l E_l}, just as for the block Jacobi preconditioner (3.72). In the original BPS algorithm, S_{E_l E_l} was approximated by a preconditioner of the form (a^{(i)} + a^{(j)}) Q_l D_l Q_l^T, where Q_l was a discrete sine transform of size n_{E_l}, D_l was a suitably chosen diagonal matrix from (3.64), and a^{(k)} corresponds to an evaluation of the coefficient a(x) at some point in Ω_k.

Approximation of S_0. Given R_0, the matrix S_0 \equiv R_0 S R_0^T will not be a submatrix of S. The matrix S_0 associated with the coarse space is typically approximated by a coarse grid stiffness matrix A_0, obtained by discretizing the underlying elliptic equation (3.1) on the coarse grid.
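A sketch of the assembly of R_0 in (3.74), assuming NumPy, is given below; coarse_basis is a hypothetical routine returning φ_l^{h_0}(x) for the coarse vertex v_l, e.g. evaluating a bilinear hat function on the coarse box mesh. The coarse correction is then applied as R_0^T A_0^{-1} R_0 r_B, with A_0 replacing S_0 as discussed next.

```python
# Assemble the coarse restriction matrix (3.74): row l evaluates the
# coarse nodal basis function phi_l^{h0} at the interface nodes x_j.
import numpy as np

def assemble_R0(coarse_basis, n0, interface_nodes):
    n_B = len(interface_nodes)
    R0 = np.zeros((n0, n_B))
    for l in range(n0):
        for j, x in enumerate(interface_nodes):
            R0[l, j] = coarse_basis(l, x)      # phi_l^{h0}(x_j)
    return R0

def coarse_correction(R0, A0_solve, r_B):
    return R0.T @ A0_solve(R0 @ r_B)           # R0^T A0^-1 R0 r_B
```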

Below, we heuristically indicate why such an approximation can be employed. Consider a Poisson problem on a rectangular domain Ω, partitioned into subdomains Ω_i which form a coarse triangulation T_{h_0}(Ω) of Ω of size h_0. Let φ_l^{h_0}(x) denote the coarse grid finite element nodal basis function centered at vertex v_l. Then, the entries (A_0)_{ij} of the coarse space stiffness matrix A_0 will satisfy:

    (A_0)_{ij} = A(φ_i^{h_0}, φ_j^{h_0}).

Let u_B and w_B denote the nodal vectors representing the coarse space nodal basis functions φ_i^{h_0}(x) and φ_j^{h_0}(x) on B. Then, the vector representations of φ_i^{h_0}(x) and φ_j^{h_0}(x) on Ω will be given by ((E u_B)^T, u_B^T)^T and ((E w_B)^T, w_B^T)^T, where E \equiv -A_{II}^{-1} A_{IB} denotes the discrete harmonic extension map from B into the interior ∪_{i=1}^p Ω_i of the subdomains. This holds because each coarse grid function φ_l^{h_0}(x) is linear within each subdomain, and thus also harmonic (and discrete harmonic) within each subdomain. Then, by (3.27), it will hold that:

    (S_0)_{ij} = w_B^T S u_B
               = \begin{bmatrix} E w_B \\ w_B \end{bmatrix}^T A \begin{bmatrix} E u_B \\ u_B \end{bmatrix}
               = A(φ_i^{h_0}, φ_j^{h_0}) = (A_0)_{ij}.

This yields that A_0 = S_0 for this geometry and choice of coefficients a(x). More generally, matrix A_0 may be employed as an approximation of S_0.

The following result concerns the condition number of the BPS preconditioner.

Theorem 3.43. Let T_h(Ω) denote a quasiuniform triangulation of Ω, and let the subdomains Ω_1, \ldots, Ω_p form a coarse triangulation of Ω of size h_0. Then, there exists C > 0 independent of h_0 and h such that:

    cond(M, S) ≤ C (1 + \log^2(h_0/h)).

If a(·) is constant within each Ω_i, then C will also be independent of a(·).

Proof. See [BR12, DR14, DR10]. \Box

3.5.3 Vertex Space Preconditioner for S

From a theoretical viewpoint, the logarithmic growth factor (1 + \log^2(h_0/h)) in the condition number of the BPS preconditioner arises because the BPS preconditioner does not approximate the coupling in S between different edges E_l of B. The vertex space preconditioner [SM3] extends the BPS preconditioner by including local residual correction terms based on overlapping segments of B, referred to as vertex regions. It includes local correction terms of the form R_{G_l}^T S_{G_l G_l}^{-1} R_{G_l}, involving nodal unknowns on regions G_l ⊂ B. For each vertex v_l ∈ V, a vertex region G_l is a star shaped connected subset of B that contains segments of length O(h_0) of all edges E_r emanating from vertex v_l. By construction, each such local residual correction term approximates the coupling in S between the edges adjacent to that vertex.

In practice, it will be convenient to construct each vertex region G_l as the intersection of the interface B with a subdomain Ω_{v_l} containing v_l, of diameter O(h_0) and centered at v_l, see Fig. 3.4. By construction, each G_l will be a cross shaped or star shaped subregion of B. Formally, the vertex space preconditioner is obtained by adding the terms R_{G_l}^T S_{G_l G_l}^{-1} R_{G_l} to the BPS preconditioner (3.73), yielding:

    M^{-1} = \sum_{l=1}^q R_{E_l}^T S_{E_l E_l}^{-1} R_{E_l} + R_0^T S_0^{-1} R_0 + \sum_{i=1}^{n_0} R_{G_i}^T S_{G_i G_i}^{-1} R_{G_i}.        (3.75)

The resulting preconditioner has the structure of a matrix additive Schwarz preconditioner for S, based on the overlapping decomposition

    B = (E_1 ∪ \ldots ∪ E_q) ∪ V ∪ (G_1 ∪ \ldots ∪ G_{n_0})

of the interface B, with an additional coarse space correction term. Matrix R_{G_l} will be of size n_{G_l} × n_B when there are n_{G_l} nodes on vertex region G_l, as defined by (3.19); the restriction matrix R_{G_l} will map a nodal vector on B to its subvector corresponding to nodes in G_l, and will have entries which are zero or one. By construction, each matrix S_{G_l G_l} = R_{G_l} S R_{G_l}^T will be a submatrix of S of size n_{G_l}, corresponding to indices of nodes in G_l.

The vertex space preconditioner can be implemented like the BPS preconditioner, since the edge and coarse space terms will be identical to those in (3.73). Since matrix S is generally not assembled, the matrices S_{E_l E_l}, S_{G_i G_i} and S_0 must be appropriately approximated to implement the preconditioner. The matrices S_{E_l E_l} and S_0 can be approximated as described for the BPS preconditioner.

Approximation of S_{G_i G_i}. Let Ω_{v_i} ⊂ Ω denote the subregion used to define the vertex region G_i \equiv B ∩ Ω_{v_i}. Partition the nodes of T_h(Ω) in Ω_{v_i} into those in D_i \equiv Ω_{v_i} \setminus G_i and those in G_i. This will induce a block partitioning of the submatrix A^{(Ω_{v_i})} of the stiffness matrix A corresponding to all nodes in Ω_{v_i}:

    A^{(Ω_{v_i})} =
    \begin{bmatrix}
      A_{D_i D_i} & A_{D_i G_i} \\
      A_{D_i G_i}^T & A_{G_i G_i}
    \end{bmatrix}.

Using the above block partitioned matrix, one may approximate the action of S_{G_i G_i}^{-1} on a vector r_{G_i} as follows:

    S_{G_i G_i}^{-1} r_{G_i} ≈
    \begin{bmatrix} 0 \\ I \end{bmatrix}^T
    \begin{bmatrix}
      A_{D_i D_i} & A_{D_i G_i} \\
      A_{D_i G_i}^T & A_{G_i G_i}
    \end{bmatrix}^{-1}
    \begin{bmatrix} 0 \\ r_{G_i} \end{bmatrix}.

Alternatively, sparse approximations of S_{G_i G_i} can be computed efficiently using the probing technique, by weighted sums of FFT based matrices, or by use of inexact factorizations, see [NE3, MA37, CH12, CA33, SM3].
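A sketch of this vertex region correction, assuming SciPy and hypothetical block names A_DD, A_DG, A_GG for the rows and columns of A corresponding to Ω_{v_i}, is given below.

```python
# Approximate S_GG^-1 r_G by one solve with the local matrix on Omega_v:
#   S_GG^-1 r_G  ~  [0 I] [[A_DD, A_DG], [A_DG^T, A_GG]]^-1 [0; r_G].
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_vertex_correction(A_DD, A_DG, A_GG):
    n_D = A_DD.shape[0]
    lu = spla.splu(sp.bmat([[A_DD, A_DG],
                            [A_DG.T, A_GG]]).tocsc())
    def apply(r_G):
        rhs = np.concatenate([np.zeros(n_D), r_G])
        return lu.solve(rhs)[n_D:]     # trace of the local solution on G
    return apply
```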

The following convergence bound will hold for the vertex space preconditioner.

Theorem 3.44. Consider the Dirichlet boundary value problem (3.1) with B_D = ∂Ω. If the diameter of the vertex subregions is β h_0, then the condition number of the vertex space preconditioned system will satisfy:

    cond(M, S) ≤ C_0 (1 + β^{-1}),

where C_0 > 0 is independent of h_0, h and β, but may depend on the variation of a(·). There also exists a constant C_1, independent of h_0, h, β and the jumps in a(·) (provided a(x) is constant on each subdomain Ω_i), such that:

    cond(M, S) ≤ C_1 (1 + \log^2(h_0/h)).

Proof. See [SM, DR10]. \Box

Thus, in the presence of large jumps in the coefficient a(·), the bounds for the condition number of the vertex space algorithm can deteriorate to (1 + \log^2(h_0/h)), which is the same growth as for the BPS preconditioner. However, effective preconditioners can be constructed (see in particular Chap. 3.7 on Neumann-Neumann preconditioners) by employing Schwarz subspace methods with more overlap between the blocks of S.

3.6 Preconditioners in Three Dimensions

The Schur complement matrix S for a three dimensional multi-subdomain decomposition is more difficult to precondition than in two dimensions. This difficulty arises due to the more complex geometry of the interface B in three dimensions. Our discussion of three dimensional preconditioners will focus on several block Jacobi preconditioners for S, a vertex space preconditioner, and a parallel wirebasket preconditioner.

We consider a decomposition of Ω ⊂ IR^3 into p non-overlapping box type or tetrahedral subdomains Ω_1, \ldots, Ω_p, having diameter h_0. Typically, these subdomains will be assumed to form a quasi-uniform coarse triangulation T_{h_0}(Ω) of Ω. We let B^{(i)} \equiv ∂Ω_i \setminus B_D denote the non-Dirichlet segment of ∂Ω_i, and define the interface as B = B^{(1)} ∪ \cdots ∪ B^{(p)}. The different additive Schwarz matrix preconditioners we shall consider for S will be based on a decomposition of the interface B into the following subregions of B, referred to as globs [MA14]. They will typically be well defined for tetrahedral or box type subdomains:

    F_{ij} \equiv int(B^{(i)} ∩ B^{(j)})
    W^{(i)} \equiv B^{(i)} ∩ (∪_{j \neq i} ∂F_{ij})
    W \equiv W^{(1)} ∪ \cdots ∪ W^{(p)}.

Here F_{ij} = int(B^{(i)} ∩ B^{(j)}) denotes the interior of the region B^{(i)} ∩ B^{(j)}, and is referred to as a face of Ω_i when it is nonempty. By definition, each face will be two dimensional. The subregion W^{(i)} of B^{(i)} is referred to as a local

Edges and vertices can be expressed formally as: ⎧ ⎨ Eijk ≡ int F ij ∩ F ik ⎪ V ≡ W \ (∪i. Fq where q denotes the total number of faces. as in two dimensions. Fl will correspond each face uniquely to some nonempty intersection int B (i) ∩ B (j) . as defined in (3. In practice. . Given a glob G ⊂ B containing nG nodes of Ωh .k Eijk ) ⎪ ⎩ = {vl : vl ∈ V}.j. . The union of all local wirebaskets is referred to as the global wirebasket.45. Er where r denotes the total number of such edges. .6 Preconditioners in Three Dimensions 163        Vertex     Gk   vk        ⊗     ⊗   @ @  @ @ Edge @ @ El @ @ Face  @ @   @      @@  Fij . The collection of all vertices will be denoted V. Definition 3. it will be convenient to decompose the wirebaskets into smaller globs. We define vertices as endpoints of edges.19). @       @ @       @ @   Ωi Ωi El Ωi Fig. 3. Boundary segments and vertex regions for three dimensional subdomains wirebasket of Ωi and is the union of the boundaries ∂Fij of all faces of Ωi .5 for an individual subdomain. each wirebasket will be connected and the union of several one dimensional segments. The entries of these glob based restriction and extension matrices will be zeros or ones. we may partition the interface B into the following globs: B = F1 ∪ · · · ∪ Fq ∪ W. . . . . The above mentioned subregions are indicated in Fig. By definition. we shall enumerate all the faces in B as F1 . We define an edge as a maximal line segment of a local wirebasket. and we enumerate all the nonempty edges as E1 . 3. we let RG ≡ RG RB T denote a restriction matrix of size nB × nG which restricts a nodal vector on B to a subvector corresponding to nodes on G. and by construction. . Typically. . By definition each edge will be open. 3. vn0 . . Similarly. Its transpose RTG ≡ RB RG T will extend a vector of nodal values on G to a vector of nodal values on B (extension by zero).5. where n0 will denote the total number of vertices in W . In applications. we enumerate all the vertices as v1 . . . . homeomorphic to an open interval.

⎥.. ⎥ ⎣ SFq Fq ⎦ 0 SW W In terms of restriction and extension matrices. which we shall denote as RW (i) W . If nGl denotes the number of nodes on glob Gl . . ..46. (3. Its transpose RTW (i) W ≡ RW RW T (i) will (i) extend a vector of nodal values on W to a vector of nodal values on W (extension by zero). 3.1 Block Jacobi Preconditioner for S We first describe a block Jacobi preconditioner based on the decomposition of B into the faces F1 . Fq and the wirebasket W : B = F1 ∪ · · · ∪ Fq ∪ W. . . Definition 3. . . ⎥ ⎢ .6. .76) l=1 where SFl Fl = RFl SRTFl and SW W = RW SRTW are submatrices of S corre- sponding to indices in Fl and W . This nonoverlapping decomposition of B induces a block partition of S as: ⎡ ⎤ SF1 F1 · · · SF1 Fq SF1 W ⎢ . we will on occasion employ an additional restriction map. . The block Jacobi preconditioner will be the block diagonal part of S: ⎡ ⎤ SF1 F1 0 ⎢ .164 3 Schur Complement and Iterative Substructuring Algorithms In the three dimensional case. ⎥ M =⎢ ⎢ ⎥. As with the other Schwarz preconditioners for S. ⎥ ⎣ SF1 Fq · · · SFq Fq SFq W ⎦ SFT1 W · · · SFTq W SW W corresponding to indices of nodes within the chosen subregions of B. then SGi Gj will denote a submatrix of S of size nGi × nGj corresponding to the nodes on glob Gi and Gj . in practice the submatrices SFl Fl and SW W of S must be replaced by suitable approximations since S will typically not be assembled. Let RW (i) W ≡ RW (i) RW T denote the matrix which restricts a vector of nodal values on the global wirebasket W into a subvector of nodal values on the local wirebasket W (i) . ⎥ ⎢ . Various alternative approximations may be chosen for such approximations... . . ⎥ S=⎢ ⎢ T . the action of the inverse M −1 the block Jacobi preconditioner will be:  q M −1 = RTFl SF−1 l Fl −1 RFl + RTW SW W RW ..

for the seven point stencil. and a seven point stencil is used for the finite element discretization of (3. -T  (i) (i) −1 . 3. Replacing the submatrices SFl Fl and SW W by the preceding approximations will yield an approximate block Jacobi preconditioner for S. at most seven entries of the form (AW W )ij will be nonzero when xi ∈ V. Alternative preconditioners for SFl Fl may be obtained using algebraic approximation of SFl Fl based on generalization of the tridiagonal probing procedure [KE7. An approximation of SW W = RW SRTW can be based on the following heuristic observation. and the subdomains are boxes.47.6 Preconditioners in Three Dimensions 165 Approximation of S F l F l . CH9] or ILU [CA33]. If the triangulation of Fl induced by Ωh can be mapped bijectively into a rectangular grid. Indeed. then we may employ an FFT based preconditioner of the form SFl Fl ≈ (a(i) + a(j) )QDQT where Q is a two dimensional fast sine trans- form and D is a diagonal matrix approximating the eigenvalues of a reference Schur complement matrix SˆFl Fl associated with a three dimensional cubical domain partitioned into two strips [RE].1). I AIFl AFl Fl rFl A Neumann-Neumann preconditioner will also approximate SFl Fl . (j) l Fl rFl (3. AW W may still be used as an approximation of SW W . Alternatively. the piecewise discrete harmonic extension of nonzero nodal values on W and zero nodal values on B\W will be zero in the interior of the subdomains. Ωj and Fl . while at most three entries of the form (AW W )ij will . Here a(l) denotes the coefficient a(x) evaluated at a sample point of Ωl . then SFl Fl will corre- spond to a two subdomain Schur complement associated with the partition of Ωi ∪ Fl ∪ Ωj into Ωi and Ωj .26). This can be verified by using the property that for seven point stencils the nodal values on the wire- basket will not influence the interior Dirichlet solution in a box subdomain. When the geometry of the subdomains is more general. As a consequence. If Fl = int B (i) ∩ B (j) . The desired property that SW W = AW W will now follow from (3. Efficient sparse solvers may be employed to solve systems of the form AW W uW = rW . Remark 3. since AW W will typically be sparse. a Dirichlet-Neumann preconditioner may be employed. then matrix SW W = AW W . Approximation of S W W .77) T T I (i) (j) AIFl AIFl AFl Fl rFl where the above blocks are submatrices of A corresponding to indices in Ωi . Consequently. - (i)−1 0 AII AIFl 0 SFl Fl rFl ≈ (i)T (i) . for instance based on subdomain Ωi : . When Ω is rectangular. the action SF−1 l Fl rFl may be computed exactly as follows: ⎡ ⎤T ⎡ (i) (i) ⎤−1 ⎡ ⎤ 0 AII 0 AIFl 0 ⎢ (j) ⎥ SF−1 = ⎣0⎦ ⎣0 AII AIFl ⎦ ⎣ 0 ⎦.

76). Lemma 3. for some C > 0 independent of h and h0 . the coarse space restriction matrix R0 is defined analogous to (3.   As the preceding theorem indicates. . can be improved by including some global transfer of information. l=1 into (3. and the action of −1 RTW SW W RW can be approximated by the following matrix additive Schwarz preconditioner:  r −1 RTW SW W RW ≈ RTEl A−1 T −1 El El REl + RV AVV RV .166 3 Schur Complement and Iterative Substructuring Algorithms be nonzero when xi ∈ El .76) for S deteriorates as the subdomain sizes h0 becomes small. Proof. . l=1 l=1 (3. (3.76) and replace the local correction term R−1 −1 V SVV RV on the vertices −1 −1 V by a coarse space correction term R0 S0 R0 to obtain the preconditioner:  q  r M −1 = RTFl SF−1 l Fl RFl + −1 RTEl SE l El REl + RT0 S0−1 R0 . the wirebasket W can be further decomposed into the edges E1 . The following result concerns the convergence rate associated with (3. . . To obtain the first variant. the band width of AW W will depend on the ordering of nodes within W . However. The condition number of the Schur complement matrix pre- conditioned by the block Jacobi preconditioner (3. Er and the vertex set V. A second variant is obtained by adding the local correction term R−1 −1 V SVV RV yielding:  q  r M −1 = RTFl SF−1 l Fl RFl + −1 RTEl SE l El REl + R−1 −1 T −1 V SVV RV + R0 S0 R0 .79) The resulting preconditioners satisfy the following bounds.74). We next describe two variants of the block Jacobi preconditioner (3. DR10]. the convergence rate of block Jacobi preconditioner (3. however. In practice. This deterioration arises primarily because this block Jacobi precon- ditioner exchanges information only locally for the chosen diagonal blocks in the block partition of S. we substitute the approximation:  r −1 −1 −1 RTW SW W RW ≈ RTEl SE l El REl + RTV SVV RV . .76) satisfies: cond (M.48. S) ≤ Ch−2 2 0 (1 + log(h0 /h)) . with coarse grid nodal basis functions φhi 0 (x) corresponding to each vertex vi ∈ V. l=1 A variant of the block Jacobi preconditioner employs such an approximation. See [BR15. This convergence rate.78) l=1 l=1 Here. and S0 = R0 SRT0 .76) incorporating coarse space correction [DR10].

Approximation of S E l E l . As we have already described approximations of SFi Fi . Proof. the scaling factor for SEl El must be proportional to h−2 instead of h. The vertex space preconditioner [SM3] for S. For smooth coefficients. but may depend on the coefficient a(x). and may be approximated as follows: (SVV )ii ≈ h σi .79) satisfies: cond(M. S) ≤ C2 (1 + log(h0 /h))2 . for finite element discretizations.79) will yield better convergence than preconditioner (3. For finite difference discretizations. The preconditioner M in (3.78) satisfies the bound: h0 cond(M.78). h and jumps in the coefficient a(x). See [DR10]. SEl El . The coarse space matrix S0 = R0 ART0 can be ap- proximated by A0 as in two dimensions. we shall only focus on the other terms. To obtain a heuristic approximation of SEl El . Approximation of S VV . h while the bound for the preconditioner M in (3.50. where C1 is independent of h0 . where σi denotes a suitably weighted average of the coefficients a(·) in subdomains adjacent to vertex vi .6 Preconditioners in Three Dimensions 167 Theorem 3. due to elimination of (h0 /h).6. To obtain an approximation of SVV . 3. For finite difference schemes.49. SVV and S0 must be replaced by suitable approximations since S is not assembled in practice. we approximate SW W ≈ AW W as described earlier to obtain SEl El ≈ AEl El . It is easily verified that the edge matrix AEl El will be well conditioned and may effectively be replaced by a suitably scaled multiple of the identity matrix: SEl El ≈ h σEl IEl .   As with the other matrix Schwarz preconditioners for S. the submatrices SFi Fi . the scaling factor must be h−2 instead of h. where σEl represents the average of the coefficients a(·) in the subdomains adjacent to edge El . The submatrix AVV will also be diagonal.2 Vertex Space Preconditioner for S The different variants of the block Jacobi preconditioner are nonoptimal. 3. incorporates some of this coupling by including . preconditioner (3. This arises due to the elimination of the off diagonal blocks in S. S) ≤ C1 (1 + log(h0 /h))2 . Approximation of S 0 . Remark 3. again we em- ploy the approximation SW W ≈ AW W to obtain SVV ≈ AVV . while C2 is independent of h0 and h.

Additionally. respectively. 3. SEi Ei = REi SRTEi and SGk Gk = RGk SRTGk are submatrices of S corresponding to indices of nodes on Fl .168 3 Schur Complement and Iterative Substructuring Algorithms subspace correction terms on overlapping globs containing segments of faces adjacent to each vertex vl and to each edge El . see [SM3. Ei and Gk . Ei and Gk . Ei and Gr : B = (F1 ∪ · · · ∪ Fq ) ∪ (E1 ∪ · · · ∪ Er ) ∪ (G1 ∪ · · · ∪ Gn0 ).5. for 1 ≤ k ≤ n0 . yielding improved bounds. we define the restriction maps RFl . We outline below such approximations. The action M −1 of the vertex space preconditioner is then:  q  r  n0 M −1 = RTFl SF−1 l Fl RFl + RTEi SE−1 i Ei REi + −1 RTGk SG k Gk RGk +RT0 S0−1 R0 . MA38]. The overlapping decomposition of B employed in the vertex space preconditioner can be expressed in terms of Fl . The three dimensional vertex space preconditioner is based on an overlap- ping extension of the following partition of B: B = (F1 ∪ · · · ∪ Fq ) ∪ (E1 ∪ · · · . The action of SE−1l El on a vector rEl can be approx- imated as follows. Approximation of S F l F l . Approximation of S El El . a coarse space correction term based on a coarse space is em- ployed. Such restriction maps are defined by (3. Corresponding to each glob Fl . see Fig. the matrices SFl Fl .19) with zero-one entries so that SFl Fl = RFl SRTFl . Er ) ∪ (v1 ∪ · · · ∪ vn0 ). 3. Similarly.5 for segments of El within a subdomain Ωi . A section of glob Gk restricted to subdomain Ωi is illustrated in Fig. partition . a cylindrical subdomain ΩEl ⊃ El of width O(h0 ) is employed to define: El ≡ ΩEl ∩ B.80) As with the other matrix Schwarz preconditioners for S. We shall omit further discussion of it here. for 1 ≤ l ≤ r. SEi Ei SGk Gk and S0 must be approximated without explicit construction of S. Formally. Additionally. Formally. each vertex vk is extended to a glob Gk of width O(h0 ) containing segments of all faces adjacent to vertex vk . The action of SF−1 l Fl on a vector can either be com- puted exactly or approximately. Given the domain ΩEl such that El = B ∩ ΩEl . as described for block Jacobi preconditioners. R0 will denote a coarse space matrix defined by (3. REi and RGk which restrict a vector of nodal values on B to the nodes on Fl . l=1 i=1 k=1 (3.74). Each edge El is extended to a glob El which includes segments of all faces adjacent to this edge. a domain Ωvk ⊃ vk of size O(h0 ) centered about vertex vk is employed to define glob Gk : Gk ≡ B ∩ Ωv k . Ei and Gk .

DR10]. −1 Approximation of S Gk Gk .6. S) ≤ C2 (β) . DR10]. the action of SG k Gk may be approximated as:   −1   −1 0 AHk Hk AHk Gk 0 SG k Gk rGk ≈ . These preconditioners are typically . the action of SE−1 l El may be approximated as:   −1   −1 0 ADl Dl ADl El 0 SEl El rEl ≈ .51. −1 Then. SM2. Proof. Let A(Ωvk ) denote the submatrix of corresponding to nodes in Hk and Gk . h and a(·). based on the wirebasket region of the interface. Let A(ΩEl ) denote the submatrix of A corresponding to indices of nodes in Dl and El . but depending on the coefficients a(·) such that: cond(M. The coarse space matrix S0 = R0 SRT0 can be ap- proximated by coarse grid stiffness matrix A0 as in the two dimensional case. MA12. DR3. Approximation of S 0 . h where C2 > 0 is independent of h0 . 3. The rate of convergence of the vertex space preconditioner will be of optimal order provided the globs {El } and {Gk } have sufficient overlap of size β h0 when the coefficients a(·) is smooth. Then. S) ≤ C1 1 + log2 (β −1 ) . Partition the nodes in Ωvk based on Hk ≡ Ωvk \Gk and Gk . There exists C1 > 0 independent of h0 and h.6 Preconditioners in Three Dimensions 169 the nodes in ΩEl into Dl ≡ ΩEl \El and El . I ATHk Gk AGk Gk rGk Alternative matrix approximations of SGk Gk may be constructed based on extensions of the probing technique or inexact Cholesky decomposition. but has large jumps across subdomains. Theorem 3. BR15. The action of SG k Gk on a vector rGk can be approximated as follows. If the coefficient a(·) is constant on each subdomain. I ATDl El AEl El rEl Alternative approximations of SEk Ek can be constructed based on extensions of the probing technique or based on inexact Cholesky factorizations. then the above bound deteriorates to: h0 cond(M. See [SM. Let Ωvk denote a domain of width O(h0 ) such that Gk = B ∩ Ωvk .   3.3 A Parallel Wirebasket Preconditioner for S Wirebasket methods for the Schur complement S are preconditioners which employ special coarse spaces [BR14.

if xi ∈ W IW vW i =  (3. It can thus be verified that its transpose IW satisfies: .76).81) i=1 If nW and nB denote the number of nodes on the wirebasket region W and interface B. Due to a weaker discrete Sobolev inequality holding for traditional coarse spaces in three dimensions. We first define the extension map IW T .82) j:xj ∈∂Fl (vW )j . Typically however. Once the coarse space given by Range IW has been defined. the wirebasket extension map IW T is defined as the following nB × nW matrix: T (vW )i . where RW is a point- wise nodal restriction matrix with zero-one entries. then IW will be a matrix of size nW × nB and SW B will be a symmetric T positive definite matrix of size nW . a suitable matrix approximation MW B ≈ SM B ≡ IW S IW T must also be specified. respectively. 1 n∂F l where xi is a node on W with index i in the local ordering of nodes on B.170 3 Schur Complement and Iterative Substructuring Algorithms formulated to yield robust convergence in the presence of large jump discontinuities in the coefficient a(x). the parallel wirebasket pre- −1 conditioner employs a coarse space correction term of the form IW T SW B IW based on a weighted restriction matrix IW whose rows span the wirebasket coarse space. improved bounds can be obtained. Then. wirebasket coarse spaces help transfer information globally between different subdomains. Once IW is defined. The parallel wirebasket preconditioner [SM2] we describe has the form of a matrix additive Schwarz preconditioner for S. with rates of convergence that compare favorably with those for the block Jacobi and vertex space preconditioners. it is based on a partition of the interface into faces and the wirebasket: B = F1 ∪ · · · ∪ Fq ∪ W.76) by the wirebasket coarse −1 space correction term IW SW B IW T where SW B ≡ IW SIW T :  q M −1 = RTFi SF−1 i Fi T −1 RFi + IW S W B IW . but involve sig- nificantly more unknowns. With the use of an appropriately chosen wirebasket coarse space. −1 However. Like the preconditioner (3. unlike (3. By definition. theoretical bounds for the latter two precondi- tioners deteriorate in the presence of large jump discontinuities in a(x). an efficient algebraic solver can be formulated to solve the resulting coarse problems. Let ∂Fl ⊂ W denote the boundary segment of face Fl and let n∂Fl denote the number of nodes on ∂Fl . Like traditional coarse spaces. to ensure that linear systems of the form MW B uW = rW can be solved efficiently within the wirebasket preconditioner. the extension IW T vW i equals the average nodal value of vW on ∂Fl when node xi ∈ Fl . if xi ∈ Fl . We shall describe IW T and MW B in the following.76) which employs a local correction term RTW SW W RW corresponding to the nodes on the wirebasket region W . the wirebasket preconditioner is obtained −1 by formally replacing the term RTW SW W RW in (3. (3.

To construct a heuristic approximation MW B of SW B . Indeed.84) yields:  p (i) SW B = RTW (i) W SW B RW (i) W . . and so will not be described further.83) n∂Fk {k:xi ∈∂Fk } {j:xj ∈Fk } which yields a weighted combination of the nodal values of vB on B.84) i=1 Using definition (3. (3.85) where EB (i) W (i) vW (i) is defined next. (3. if xk ∈ W (i) (EB (i) W (i) vW (i) )k ≡  1 n∂Fl j:xj ∈∂Fl (vW (i) )j . (vW (i) )k . it can be verified that the extension (interpolation) map IWT acts locally on each subdomain boundary. This expresses SW B as a sum of local (i) (i) contributions. Given a local approximation MW B of SW B . 3. (3. the matrices SFl Fl = RFl SRTFl and SW B ≡ IW SIWT must be approximated in practice. (3. A symmetric positive definite approximation MW B of SW B and an associated algebraic solver for linear systems of the form MW B vW = rW will be formulated in the remainder of this subsection. an approximation (i) (i) MW B of SW B can be constructed by replacing SW B by MW B in (3.82). Since the Schur complement S is not assembled in iterative substructuring. Remark 3. Symmetric positive definite approximations of the submatrices SFl Fl of S have already been described in the section on block Jacobi preconditioners. we consider the subassembly identity for the Schur complement matrix:  p (i)T (i) S= RB S (i) RB .52. i=1 Substituting this identity into SW B = IW SIW T yields:  p (i)T (i) SW B = IW RB S (i) RB IW T .86) i=1 (i) where SW B ≡ EB T (i) W (i) S (i) EB (i) W (i) .86). Substituting this into (3. if xk ∈ Fl ⊂ B (i) . Thus EB (i) W (i) RW (i) W = RB (i) IW T . yielding the following identity on each boundary B (i) : (i) EB (i) W (i) vW (i) = RB IW T vW .6 Preconditioners in Three Dimensions 171   (vB )j (IW vB )i = (vB )i + . the nodal values of IW vW on each subdomain boundary B (i) can be expressed solely in terms of T the nodal values of vW on the wirebasket W (i) .

However. when T c(. .) = 0 and a(x) ≡ a on Ωi and Ωi is immersed. . . . then the local Schur complement S (i) (and consequently SW B ) will scale in proportion to coefficient a(i) . (i) Employing these heuristic observations. In particular.87) zTW (i) D(i) zW (i) zTW (i) zW (i) corresponding to a D(i) -orthogonal projection onto the null space span(zW (i) ) (i) (i) of SW B . Then. This yields the choice of MW B as: (i) MW B = (I − Pi )T D(i) (I − Pi ) = β a(i) (I − Pi ).1). when c(. we will require that (i) (i) each MW B be spectrally equivalent to SW B independent of a(·). Combining the preceding observations yields a global approximation MW B ≈ SW B based on the local approximations (i) (i) MW B ≈ SW B as: . As a (i) consequence. . The fol- lowing heuristic observations will be employed when a(·) is piecewise con- stant. . then S (i) (and also SW B ) will be singular. T where vector (1.88) ωi where ωi is a parameter chosen to minimize the above expression. we shall post-multiply and pre-multiply (i) matrix D(i) and define MW B = (I − Pi )T D(i) (I − Pi ) where Pi is defined as: zW (i) zTW (i) D(i) zW (i) zTW (i) Pi ≡ = . (i) Matrix MW B may also be equivalently characterized by the requirement: (i) T vTW (i) MW B vW (i) = min (vW (i) − ωi zW (i) ) D(i) (vW (i) − ωi zW (i) ). (3. Secondly. 1) (i) (i) denote a vector of size nW (i) corresponding to the number of nodes on W . if Ωi is immersed in Ω. . Theoretical analysis [SM2. its extension EB (i) W (i) zW (i) of size nB (i) will satisfy: T EB (i) W (i) zW (i) = (1. . B (i) = ∂Ωi .) = 0 in elliptic equation (3. and a(x) ≡ a(i) on (i) each Ωi . (i) i. Thirdly. SW B will be singular when S (i) is singular. (3. 1) .. .172 3 Schur Complement and Iterative Substructuring Algorithms (i) (i) To construct an approximation MW B of SW B so that MW B is spectrally equivalent to SW B independent of the coefficient a(·). This can easily be verified. DR10] suggests choosing the scaling factor as β = h (1 + log(h0 /h)). . .e. since SW B will scale in proportion to coefficient a(i) . we may seek to approximate SW B by a scalar multiple D(i) = β a(i) I of the identity matrix of size nW (i) for a (i) (i) scaling factor β > 0 to be specified. and zW (i) will span (i) its null space. let zW (i) ≡ (1. . Firstly. to ensure that SW B and MW B also both have the same null spaces. it (i) will be necessary to choose MW B also proportional to a(i) to ensure spectral (i) (i) equivalence between MW B and SW B independent of {a(l) }. 1) of size nB (i) generates the null space of S (i) .

3.90) = min(ω1 . As a result. .ωp ) i=1 (vW (i) − ωi zW (i) ) D(i) (vW (i) − ωi zW (i) ).89) and (3.89) = i=1 RTW (i) W (I − Pi )T D(i) (I − Pi )RW (i) W . (vW .54. In this case zW (i) will need to be replaced by a matrix whose columns are restrictions to W (i) of a basis for the null space of S (i) .91) vW where J(uW ) ≡ 1 T 2 vW MW B vW − vTW rW is its associated energy. ωp ) p T ≡ 1 2 i=1 (vW (i) − ωi zW (i) ) D(i) (vW (i) − ωi zW (i) ) − vTW rW . Remark 3. . .. and will also be positive semidefinite since vTW MW B vW is a sum of nonnegative quadratic forms.88) as satisfying: vTW MW B vW p T (3. .55. We now describe an algebraic solver for MW B uW = rW . This can be verified to hold for nonzero αi only if SW B is singular. By construction. . . where vW (i) = RW (i) W vW . the solution uW to the linear system MW W uW = rW will also solve the following minimization problem: J(uW ) = min J(vW ). . Remark 3. ωp ). (3.ω1 . . .6 Preconditioners in Three Dimensions 173 p (i) MW B = i=1 RTW (i) W MW B RW (i) W p (3.87). where D(i) = h (1 + log(h0 /h)) a(i) I. matrix MW B is symmetric. . . Matrix MW B may also be equivalently characterized using (3. for 1 ≤ i ≤ p.53. .. Since MW B will be a symmetric and positive definite matrix. ω1 . This alternative expression will be useful in con- structing an efficient solver for linear systems of the form MW B vW = rW . the null space of S (i) may have several linearly independent vectors.. ωp∗ = min J˜ (vW . A vector vW will belong to the null space of MW B only if: MW B vW = 0 ⇔ RW (i) W vW = αi zW (i) . MW B will be positive definite whenever SW B is positive definite. .91) will thus also be equivalent to: J˜ uW . ω1∗ . and Pi is defined by (3.··· ..ωp ) where J˜ (vW . Remark 3. The minimization of (3. For elliptic systems such as the equations of linear elasticity. J(vW ) ≡ 12 vTW MW B vW − vTW rW p T = 12 i=1 minωi (RW (i) W vW − ωi zW (i) ) D(i) (RW (i) W vW − ωi zW (i) ) − vTW rW . ω1 .

i=1 with diagonal entries:  (DW B )ii = a(k) h (1 + log(h0 /h)). . . . We summarize the implementation of the parallel wirebasket preconditioner for S. p ∗ T (3. .92): ⎡ ⎤⎡ ∗⎤ ⎡ ⎤ K11 · · · K1p ω1 g1 ⎢ . (3. (i) i=1 A reduced system can thus be obtained for the parameters ω1∗ .174 3 Schur Complement and Iterative Substructuring Algorithms Applying the first order derivative conditions for a minimum (differentiating the above expression with respect to vW and ω1 . ⎥ = ⎢ .. ωp∗ .93) Matrix K can be verified to be symmetric and sparse.92). (i) where DW B is the following diagonal matrix of size nW :  p DW B ≡ RTW (i) W D(i) RW (i) W . T (i) ⎪ ⎪ ⎩ −1 gi ≡ zTW (i) D(i) RW (i) W DW B rW .. ⎦ ⎣ . For each choice of parameters ω1∗ . ⎦⎣ . . ⎥ ⎢ . for i = j ⎨ T −1 Kii ≡ zW (i) D zW (i) − zW (i) D RW (i) W DW (i) T (i) B RW (i) W D zW (j) . ⎣ .92): !  p −1 ∗ T uW = DW B rW + ωi RW (i) W D zW (i) . {k:vi ∈B (k) } An efficient solver for MW B uW = rW can be formulated by solving (3.92) DW B uW − i=1 ωi RW (i) W D zW (i) = rW . ωp∗ by substituting the preceding expression for uW into the first block row in (3. ⎧ −1 ⎪ ⎪ Kij ≡ −zTW (i) D(i) RW (i) W DW B RW (j) W D T (j) zW (j) .. and the preceding linear system can be solved using any suitable sparse direct solver. . ωp and requiring it to equal zero) yields the following system of equations: zTW (i) D(i) (RW (i) W uW − ωi∗ zW (i) ) = 0. . . . ⎦ K1p · · · Kpp ωp∗ gp where the entries Kij and gi are defined as follows. . . the vector unknown uW can be determined by solving the second block row in (3. ⎥. . for 1 ≤ i ≤ p. .. . .

The solution to MW B uW = IW rB can be computed as fol- lows. ⎥.6. DR10]..57. including an algorithm with condition number (1 + log(h0 /h)). . The following result concerns the convergence rate of the preceding parallel wirebasket algorithm. . 3. . solve for uW : ! p DW B uW = rW + ωi RTW (i) W D(i) zW (i) . See [SM2.1 (Wirebasket Preconditioner)  q M −1 rB ≡ RTFk SF−1 k Fk RFk rB + IW T −1 MW B IW rB . h and a(·) such that cond(M. Theorem 3.   Remark 3. ωp∗ : ⎡ ⎤⎡ ∗⎤ ⎡ ⎤ K11 · · · K1p ω1 g1 ⎢ . Alternate wirebasket algorithms are described in [BR15. .. these preconditioners solve a Neumann problem on each subdomain. ⎦ K1p · · · Kpp ωp∗ gp where the entries Kij and gj are defined in (3.93). ⎥ = ⎢ . ⎦ ⎣ . i=1 The terms SF−1 k Fk RFk rB can be computed as described for the block Jacobi preconditioner.. S) ≤ C(1 + log(h0 /h))2 . without the requirement that the subdomains be boxes or tetrahedra.7 Neumann-Neumann and Balancing Preconditioners Neumann-Neumann and balancing domain decomposition methods are a widely used family of preconditioners for multisubdomain Schur complement matrices in two and three dimensions. If the coefficient a(·) is constant within each subdomain. Furthermore. The heuristic approximation MW B of SW B assumed that the coefficient c(x) = 0. 3.. . ⎦⎣ . ⎣ .7 Neumann-Neumann and Balancing Preconditioners 175 Algorithm 3. omitting such terms will remove the mechanism for global transfer of information. Secondly. Firstly. Proof. ⎥ ⎢ . such preconditioners have an algebraic form that may be applied to arbitrary subdomain geometries in two or three dimen- sions. there exists C > 0 independent of h0 . In practice the same matrix MW B described above (based (i) on the vectors zW (i) ) can be employed even when c(x) = 0 though SW B will not be singular in such a case. DR10]. . i=1 This yields uW . and hence the name. MA12. From a computational viewpoint. Indeed. using rW ≡ IW rB solve for ω1∗ . .56.

independent of the jump discontinuities in the coefficient. MA17]. Given a decomposition Ω1 . If no coarse space is employed. a Neumann-Neumann preconditioner has the structure of an additive Schwarz preconditioner for S. . (3. Instead the following may be noted. Different coarse spaces facilitating global transfer of information are also employed in each preconditioner. -T  (i) (i) −1 . terms of the form S (i) rB (i) can be computed by solving the linear system S (i) wB (i) = rB (i) corresponding to a discrete Neumann problem on Ωi : . based on the decomposition of B into the overlapping boundary segments B (1) . the preconditioner has the form:  p † M −1 = RTB (i) S (i) RB (i) . DR14. SB (i) B (i) ≡ RB (i) SRTB (i) submatrix of S is approximated by the subdomain Schur complement S (i) . DE2. . Remark 3. defined in (3. .7.176 3 Schur Complement and Iterative Substructuring Algorithms Theoretical analysis indicates that these methods precondition effectively. . with restriction and extension matrices RB (i) and RTB (i) re- spectively. We also outline an algebraic preconditioner [CA33] based on the Neumann-Neumann preconditioner. while the balancing domain decomposition preconditioner has the structure of a hybrid Schwarz preconditioner for S. 3. - (i) −1 0 AII AIB 0 S rB (i) = (i)T (i) . In this case.95) i=1 † where S (i) denotes the Moore-Penrose pseudoinverse [GO4] of the local Schur complement matrix S (i) . where B (i) ≡ ∂Ωi \BD . . B (p) . It has the formal structure of an additive Schwarz subspace preconditioner for S. DR18. LE. 3. DE3.19). Ωp of Ω. and the balancing domain decomposition preconditioner [MA14. yielding condition number bounds which grow polylogarithmic in the mesh parameters.1 Neumann-Neumann Preconditioners Multi-subdomain Neumann-Neumann preconditioners are extensions of the two subdomain Neumann-Neumann preconditioner from Chap. both preconditioners decompose the interface B into the segments: B = B (1) ∪ · · · ∪ B (p) . From the viewpoint of Schwarz subspace methods. (3. . DR16. Our discussion will focus on the family of Neumann-Neumann preconditioners [BO7.94) Both preconditioners employ the subdomain Schur complement matrix S (i) to approximate the unassembled submatrix SB (i) B (i) = RB (i) SRTB (i) of S. then S (i) = S (i) . Since S is not assembled. the local Schur complement S (i) need not be assembled. I AIB ABB rB (i) . When matrix S (i) † −1 † is nonsingular. . unlike SB (i) B (i) . since S (i) can be singular.58.4. corresponding to the nodes on B (i) . LE5]. . In practical implementation.

1T 1 Alternatively. if a local problem is singular. . we omit a coarse space correction term. elegantly addresses the issue arising with singular local problems and its non-unique solution.7 Neumann-Neumann and Balancing Preconditioners 177 However. MA17]. p in parallel solve:    . As a result. We shall next describe a Neumann-Neumann preconditioner employing an algebraic partition of unity. then matrices A(i) and S (i) will be singular. When this compatibility condition is satisfied. As noted earlier. - (i) (i) (i) AII AIB wI 0 = . described later. For convenience. Algorithm 3. the linear system S (i) wB (i) = rB (i) will be solvable only if rB (i) satisfies the compatability condition: 1T rB (i) = 0. RB (i) rB T (i) (i) (i) AIB ABB wB 2. 1. However. as follows. as any scalar multiple of 1 may be added to it.7. the Neumann- Neumann preconditioner adds duplicates of the solution on the regions of .. 1) of appropriate sizes. . the null space of A(i) and T S (i) will be spanned by vectors of the form 1 = (1. zero or “small” pivots can be set to a prescribed nonzero number  > 0. Endfor p Output: M −1 rB ≡ (i) i=1 RTB (i) wB . 3. yielding an approximate solution satisfying 1T wB (i) = 0. a solution wB (i) will exist.e. To motivate this version of the preconditioner. the local Cholesky factorization can be modified. . For i = 1. If direct solvers are em- T ployed.1 (Neumann-Neumann Preconditioner-No Coarse Space) Given rB the vector M −1 rB is computed as follows. We summarize the algorithm below assuming nonsingular subproblems. the action of S (i) on a vector it typically approximated in Neumann-Neumann algorithms [DE3]. and this approximate factorization can be employed to formally † compute w ˜ B (i) ≈ S (i) rB (i) . though it will not be unique. then when the Cholesky factorization L(i) L(i) of A(i) is computed on each subdomain. this approximate solution w ˜ B (i) may then be projected onto the orthogonal complement of the null space:  T  1 w˜ B (i) wB (i) ≡ w ˜ B (i) − 1. the balancing domain decomposition preconditioner [MA14.59. . when c(x) = 0 and Ωi is immersed inside Ω (i. · · · . If desired. Remark 3. though it may be added. note that because of overlap between adjacent boundaries B (i) . In this case. a projected gradient method may be used to iteratively solve S (i) wB (i) = rB (i) . † When S (i) is singular. B (i) = ∂Ωi ).

Such duplication can be reduced by employing a discrete partition of unity on B subordinate to the subdomain boundaries B (1) . . B (p) .96) l=1 Various choices of such diagonal matrices exist. For each 1 ≤ l ≤ p (l) let D(l) denote a diagonal matrix of size nB (l) with nonnegative entries so that a discrete partition (decomposition) of the identity matrix is obtained:  p RTB (l) D(l) RB (l) = I. . . Accord- (l) ingly. (3. The diagonal entries of D(l) is also commonly defined based on the coefficient a(x): . let nB (l) denote the number of nodes on B (l) for 1 ≤ l ≤ p and let xi for 1 ≤ i ≤ nB (l) denote an ordering of the nodes on B .178 3 Schur Complement and Iterative Substructuring Algorithms overlap. .

then the following condition number bound will hold: cond(M.e.95) or in (3.97) (l) (a(j) )ρ ii j:x ∈B (j) i where 0 ≤ ρ ≤ 1 denotes some user chosen scaling factor and a(l) denotes some sample value of coefficient a(x) in Ωl . To ensure that the preconditioner is symmetric. i. (a(l) )ρ D(l) =  (3. where deg(xi ) denotes the degree of (l) node xi .   . See [DE3.98) i=1 where we have omitted a coarse space correction term. Lemma 3.98). the above def- (l) (l) inition yields D(l) ii = 1/deg(xi ). the number of distinct subdomain boundaries B (j) to which (l) node xi belongs to. The parti- tion of unity Neumann-Neumann preconditioner can now be formulated as:  p † M −1 rB = T RTB (i) D(i) S (i) D(i) RB (i) rB (3. The following bounds will hold for the standard and partition of unity versions of the Neumann-Neumann preconditioner without a coarse space. If M denotes the preconditioner in (3. Pre- conditioner (3. where C > 0 is independent of h and h0 . When a(x) ≡ 1. S) ≤ C h−2 0 1 + log(h0 /h)2 . Such a discrete partition of the identity on B can be p to distribute an interface load rB to the subdomain boundaries employed rB = i=1 RTB (i) D(i) RB (i) rB so that the load is not duplicated. each matrix D(i) has been employed twice.60. Proof.. DR18].98) corresponds to a matrix additive Schwarz preconditioner for T S based on the subspaces Range(RTB (i) D(i) ) for 1 ≤ i ≤ p with the matrices T S (i) approximating D(i) RB (i) SRTB (i) D(i) .

see also [GL14. S) ≤ C 1 + log(h0 /h)2 . . For brevity. The Neumann-Neumann preconditioner with coarse space correction can be implemented in parallel using (3. Ωp correspond to elements in a coarse triangulation (0) Th0 (Ω) of size h0 .7 Neumann-Neumann and Balancing Preconditioners 179 To improve the convergence rate of the preceding Neumann-Neumann algorithms as the subdomain size h0 decreases. .. a coarse space correction term can be included. . However.100) with the subdomain problems solved as in Alg.2 Balancing Domain Decomposition Preconditioner The balancing domain decomposition preconditioner [MA14. then the coarse space matrix RT0 is: ⎡ ⎤ (h ) (h ) φ1 0 (x1 ) · · · φn00 (x1 ) ⎢. and let (h0 ) (h ) (0) φl (x) denote the coarse space nodal basis satisfying φl 0 (yj ) = δij . FA16]. h0 and {a(l) }. T (3. let yl denote the coarse nodes for 1 ≤ l ≤ n0 . . 3. DR18]. the procedure eliminates arbitrariness in the output of the Neumann-Neumann preconditioner.1 may be employed. if the subdomains Ω1 . See [DE3. If x1 . thereby providing some global exchange of information. . . ⎦ (3.. Proof. the coarse ma- trix S0 may be approximated by the coarse grid discretization A0 of (3.99) (h0 ) (h ) φ1 (xnB ) · · · φn00 (xnB ) A coarse space version of the Neumann-Neumann preconditioner can now be obtained by including the correction term RT0 S0−1 R0 with S0 = R0 SRT0 :  p † M −1 rB = RTB (i) D(i) S (i) D(i) RB (i) rB + R0 S0−1 R0 rB . ⎥. Additionally.100) i=1 As with other matrix additive Schwarz preconditioners for S.1. . MA17] for the Schur complement S. . xnB denotes the nodes on B. employs an algebraic procedure referred to as balancing.1). . 2. arising from non-unique subdomain solutions. then the condition number of the partition of unity Neumann-Neumann pre- conditioner with coarse space correction will satisfy: cond(M.7. Any coarse space from Chap. 3. Lemma 3. where C > 0 is independent of h. which ensures that each singular subdomain problem arising in the Neumann- Neumann preconditioner is solvable. If coefficient a(x) satisfies a(x) = a(i) on each subdomain Ωi . . we shall not summarize the resulting algorithm.61. 3.7. in principle. . ⎥ RT0 = ⎢ ⎣ . and provides a natural coarse space which transfers information globally.

when vector rB is balanced. which will be described in the following. Kernel(S (l) ) = Range(N For instance when c(x) > 0.101) l=1 When c(x) = 0 and Ωl is floating in Ω.105) which describes a balanced vector can be . then N ˜l will be a matrix of size n(l) × d˜l .101) will be solvable. NlT D(l) RB (l) rB = 0. By construction. so that Range(N ˜l ) = Kernel(S (l) ). (3.102) will be solvable only if the following compatibility condition holds: ˜ T D(l) RB (l) rB = 0. The methodology will be illustrated for balancing the discrete partition of unity version of the Neumann-Neumann preconditioner:  p † M −1 rB = RTB (l) D(l) S (l) D(l) RB (l) rB . each system S (l) wB = D(l) RB (l) rB will be solvable. The balancing procedure will employ a more (l) (l) general matrix Nl of size nB × dl with dl ≥ d˜l such that: ˜l ) ⊂ Range(Nl ).105) (l) In this case. each subproblem (l) S wB = D(l) RB (l) rB in (3. If n(l) denotes the size of S (l) and d˜l the dimension B of the null space of S (l) . When the B (l) matrix S is singular. wB = vB + N (3.104) (l) ˜l αl represents a general term in the where vB is a particular solution. When rB is not balanced.62.103) l When (3.102) will be consistent (even if Nl = N Definition 3.102) will be: (l) (l) ˜l αl . Equation (3. and N d˜l null space of S for αl ∈ IR .180 3 Schur Complement and Iterative Substructuring Algorithms We shall heuristically motivate the balancing procedure. but it may be advantageous to choose Nl as the matrix whose columns span the null space of the local Schur complement associated with c(x) = 0. before outlining its implementation. the general solution to (3. (3. for 1 ≤ l ≤ p.103) holds. matrix S (l) will be singular. N (3. the subdomain problem: (l) S (l) wB = D(l) RB (l) rB . (l) it may be modified by subtracting a correction term P0 rB so that (I − P0 ) rB is balanced. (3. matrix S (l) will be nonsingular. where P0 is an S-orthogonal projection. Let N˜l denote a matrix whose columns form a basis for Kernel(S (l) ). A vector rB ∈ IRnB will be said to be balanced if: NlT D(l) RB (l) rB = 0. then system (3. if (l) ˜l ). By the preceding definition.

Motivated by the preceding. The correction term S C α may then be represented as: −1 T P0 rB ≡ S C α = S C C T SC C rB . so that (rB − S C α) is balanced. αTp ∈ IRd : T C SC α = C T rB . where P0 rB can be easily verified to be an S-orthogonal projection of rB onto the column space of C (with P0 P0 = P0 and P0 S = S P0T ). 3. the subproblems are solvable (but with non-unique solutions). T where the columns of C consists of the columns of RTB (l) D(l) Nl for 1 ≤ l ≤ p:   T T C = RTB (1) D(1) N1 · · · RTB (p) D(p) Np . equation (3. . the output of the Neumann-Neumann preconditioner is subsequently balanced by another application of the (I − P0 ) in a post-processing step. . . which is the S-orthogonal complement of the space Kernel(C T ) of balanced vectors. To ensure sym- metry. Computing the action M −1 rB of the inverse of the hybrid Schwarz pre- conditioner M in (3. Then. This yields the following linear system of T equations for determining α = αT1 . this system will be uniquely solvable by positive definiteness of S. When C T rB = 0.107) l=1 The first application of (I − P0 ) ensures that the residual is balanced so that when the partition of unity Neumann-Neumann preconditioner is applied. solve: (C T SC)α = C T rB .7 Neumann-Neumann and Balancing Preconditioners 181 compactly represented using a matrix C of size nB × d for d = (d1 + · · · + dp ). the balancing domain decomposition precon- ditioner M employs the structure of a hybrid Schwarz preconditioner: ! p † M −1 S = P0 + (I − P0 ) RTB (l) D(l) S (l) D(l) RB (l) rB (I − P0 ). (3. (3. In the first step.105) for a balanced vector rB becomes: C T rB = 0.107) involves three steps. the term P0 is employed to compute the projection of the solution onto the coarse space V0 = Kernel(C T )⊥ . a correction term (S C α) may be sought for α ∈ IRd : C T (rB − S C α) = 0. . Since this output will lie in the subspace Kernel(C T ) of balanced vectors. .106) When C is of full rank.

In most applications. γ p . matrix K can be singular. .. . (3. the partition of unity Neumann-Neumann preconditioner is formally applied to the balanced residual ˜rB : p  † vB = RTB (l) D(l) S (l) D(l) RB (l) ˜rB . ⎦⎣ . Remark 3. .63. . l=1 for some choice of coefficient vectors γ 1 . ⎥ = ⎢ . . ⎥ ⎣ . ⎦. In the second step. αp . . . . K will be symmetric and positive definite. αTp and the block structure of C into (3.106) to yield the following block partitioned linear system: ⎡ ⎤⎡ ⎤ ⎡ T (1) ⎤ K11 · · · K1p α1 N1 D RB (1) rB ⎢ . Below. .109) and αi ∈ IRdi . . j ≤ p. for 1 ≤ i. (3. . some care must be exercised when extending each matrix N ˜l to Nl . Using S C α. However.108) T K1p · · · Kpp αp NpT Dp RB (p) rB involving (d1 + · · · + dp ) unknowns corresponding to the subvectors α1 .. System (3. when C is not of full rank. Then M −1 rB ≡ (S C α + vB − S C β). If di = 0 for any index i. . l=1 ˜ B = (I − P0 ) vB requires solving the system: In the third step. . then the corresponding block rows and columns of K and α should be omitted. .64. .106) has a block structure which can be obtained by T substituting the block structure α = αT1 .182 3 Schur Complement and Iterative Substructuring Algorithms If rB = S uB . we summarize the action of the inverse of the balancing domain de- composition preconditioner. ⎥ ⎢ . a balanced residual ˜rB is constructed from rB by subtraction of the term S C α: ˜rB = rB − S C α. Here. In this case. this yields C α = P0 uB . and defining v Remark 3. ⎦ ⎣ . the columns of C will be linearly dependent with: p  RTB (l) D(l) Nl γ l = 0. the block submatrices Kij will be di × dj matrices defined by: T Kij ≡ NiT D(i) RB (i) SRTB (j) D(j) Nj . To avoid a singular matrix K.. to obtain v (C T SC)β = C T vB .. ˜ B = (vB − S C β).

. 3. j=1 Output: M −1 rB ≡ w∗B + wB + v∗B . ⎥ ⎢ . 1.7 Neumann-Neumann and Balancing Preconditioners 183 Algorithm 3. ⎦⎣ . For i = 1.. ⎦. ⎦. Define: p w∗B ≡ T j=1 RTB (j) D(j) Nj αj r∗B ≡ rB − Sw∗B . Motivated by this.2 (Balancing Domain Decomposition Preconditioner) Input: rB .. The following convergence bound will hold for the balanced domain decomposition preconditioner. ⎦⎣ . Endfor 5. then step 1 can be omitted in the preconditioner.. in prac- tice steps 1 and 2 are employed in a pre-processing stage to ensure that the initial residual is balanced. . Define:  p v∗B ≡ T RTB (j) D(j) Nj β j . K1p · · · Kpp T αp Np D RB (p) rB T (p) 2. ⎥ ⎣ . 3. the computational cost of each iteration will be proportional to the cost of two subdomain solves on each subdomain and the cost of balancing (which re- quires the solution of a coarse problem P0 ). . ⎥ = ⎢ .. Solve: ⎡ ⎤⎡ ⎤ ⎡ T (1) ⎤ K11 · · · K1p α1 N1 D RB (1) rB ⎢ . Solve: ⎡ ⎤ ⎡ ⎤ ⎡ T (1) ⎤ K11 · · · K1p β1 N1 D RB (1) tB ⎢ . Compute: p T wB = j=1 RTB (j) D(j) wB (j) tB = r∗B − SwB . In this case.. ⎥ ⎣ .. ⎦ ⎣ . ⎦ ⎣ . . Remark 3. · · · .. 6. .7. K1p · · · Kpp T βp Np D RB (p) tB T (p) 7. Then.65. p in parallel solve: S (i) wB (i) = D(i) RB (i) r∗B . steps 1 and 2 can be omitted in all subse- quent applications of M −1 in the CG algorithm.. Thus. ⎥ ⎢ . Each iteration will require one matrix multiplication with S and one multiplication by M −1 . yielding w∗B = 0. the output M −1 rB = wB + v∗B will also be balanced. If the input rB to the preconditioner is balanced. 4. ⎥ = ⎢ .

such that Kernel(Nl ) corresponds to the null space of S (l) when c(x) = 0 (typically with Nl = Span(1)).   Remark 3. there will be a constant C inde- pendent of h0 . However.3 An Algebraic Preconditioner We conclude this section by outlining an algebraic preconditioner of [CA33]. h and the {a } such that: (i) 2 cond(M.66.7.184 3 Schur Complement and Iterative Substructuring Algorithms Theorem 3. 3. In this case. S) ≤ C (1 + log(h0 /h)) . this can be remedied by choosing a nontrivial matrix Nl = N˜l on each subdomain. If c(x) > 0. . with SB (i) B (i) = RB (i) SRTB (i) and S0 = R0 SRT0 . . B (p) of B:  p −1 T −1 M −1 = RTB (i) SB (i) B (i) RB (i) + R0 S0 R0 . . then each subdomain problem will be nonsingular. Proof. An exact application of the preceding preconditioner requires assembly of the submatrices SB (i) B (i) and the coarse matrix S0 . and the convergence rate of the balancing domain decomposition preconditioner will deteriorate. See [MA14. It approximates the following additive Schwarz preconditioner for S. MA17. . an approximation S˜B (i) B (i) ≈ SB (i) B (i) can be (i) ˜ T of each subdomain ˜ (i) L constructed based on the ILU factorization AII ≈ L (i) −1 stiffness matrix A(i) . if each Nl = N ˜l . DR18]. where M denotes the balancing domain decomposition preconditioner. i=1 Here RB (i) denotes a restriction matrix with zero-one entries corresponding to nodes on B (i) .67. based on the segments B (1) . Suppose that c(x) = 0 and that coefficient a(x) = a(i) on each subdomain Ωi . Then. the coarse space V0 = Kernel(C T )⊥ will be trivial. with A(i) ≈ L ˜ −T L ˜ −1 : (i) (i)  p . However. and R0 denotes the coarse space weighted restriction matrix.

1). (l) (l)T ˜ −T ˜ −1 (l) S˜B (i) B (i) ≡ RB (i) RTB (l) ABB − AIB L (l) L(l) AIB RB (l) RB (i) . Numerical studies indicate attractive convergence properties for such preconditioners [CA33]. and its incomplete factoriza- tion can be found. The coarse matrix S0 may be approximated by a coarse grid discretization A0 of (3. and can be truncated to a sparse matrix. . Unlike the subdomain stiffness matrices S (i) . T l=1 Efficient algorithms for assembling such approximations are described in [CA33]. Matrix S˜B (i) B (i) will be dense. the algebraic approx- imations S˜B (i) B (i) of SB (i) B (i) will not be singular.

. local solvers. Condition number bounds for Schur complement preconditioners Algorithm Mild Coeff.). choice of subdomains.8. FA9.8 Implementational Issues Schur complement algorithms are generally more difficult to implement than Schwarz methods.).1 Choice of Subdomains Various factors influence the choice of a decomposition Ω1 .1. an automated strategy.) has large jumps. storage and communication costs. 3. The condition number bounds of several Schur complement preconditioners are summarized in Table 3. They include. For anisotropic coefficients. the subdomains should ideally be aligned with the discontinuities in a(. to re- duce the variation of a(. h0 and jumps in the coefficient a(. Estimates are presented for the case when the jumps in a(·) are mild. just as Schwarz algorithms. 3. . 2 2 2D BPS C 1+log (h0 /h) C 1+log (h0 /h) 2 −1 2 2D Vertex Space C(a) 1+log (β ) C(β) 1+log (h0 /h) 2 3D Vertex Space C(a) 1+log (β −1 ) C(β)(h 0 /h) 2 3D Wirebasket C 1+log (h0 /h) C 1+log2 (h0 /h) Neumann & Balancing C 1+log2 (h0 /h) C 1+log2 (h0 /h) . where the implementation.8 Implementational Issues 185 3. general boundary conditions. However. C(a) denotes a parameter independent of h0 and h but dependent on the coefficient a(·).1). since more geometric information is required about the subdomains and their boundaries (Neumann-Neumann and balancing pre- conditioners may be exceptions). an effectively preconditioned Schur complement algorithm can converge at almost optimal rates with respect to h. . parallel libraries. For instance. h and a(·). strip like subdomains may be chosen so that the elliptic equation is cou- pled more strongly within the strips. SI2. regularity of the solution. In this section. and to balance the loads [BE14. when a(. while C is independent of h0 . and time stepped problems. may be employed to mini- mize the communication between the subdomains. preconditioning S or A. and when the jumps are large. when the coefficient a(. Table 3. Ωp of Ω.1. may be reduced due to the lack of overlap between the subdomains. we remark on implementational issues in applications of Schur complement algorithms to solve a discretization of (3. 5. the geometry of the domain. Disc Coeff. PO3. BA20.1. availability of fast local solvers. These include. anisotropic problems. FO2. When a natural decomposition is not obvious. see Chap.) within each subdomain. and het- erogeneity of the coefficients. location of the essential and natural boundary.) is constant within each subdomain. and remarks on discontinuous coefficient problems. PO2]. For the vertex space algorithm C(β) depends on the overlap factor β. .

68. each uI will denote a vector of (l) unknowns in Ωl ∪ (∂Ωl ∩ BN ). will be nonsingular. Ωp of Ω. as before. for BD = ∂Ω. Care must be exercised in defining a coarse space when BN = ∅. and it may also be difficult to formulate a traditional coarse space. . . Schwarz subspace preconditioners can be formulated for (l) S. . and it can be decomposed into globs or overlapping segments. since the interface B will be identical to the interface for a Dirichlet problem. given a decomposition of B into globs or overlapping segments. . We then define uI = (uI . How- (l) ever. while B = ∪pl=1 (∂Ωl ∩ Ω) will not (1)T (p)T include the natural boundary BN . . However. .186 3 Schur Complement and Iterative Substructuring Algorithms 3.8. we shall indicate both block partitionings.1) with stiffness matrix A and load vector f . given a decomposition Ω1 . AII . vertex space and wirebasket preconditioners). a discretization of (3. but also in BN . (l) Thus. Since B will include the natural boundary BN . the subdomain matrix AII will involve natural boundary conditions on (∂Ωl ∩ BN ). .1) can be block partitioned as in (3. the nodal unknowns can in principle be block partitioned in two alternate ways. yielding two different Schur com- plement systems. However. In the first block partitioning. However. and remark on the construction of Schur complement preconditioners for a discretization of (3. i.5). (l) Second Case. and the solution will be unknown not only in Ω. and the subdomain matrix AII will only involve interior nodal unknowns in Ωl . Remark 3. . We shall define the “interface” as B = ∪pl=1 ∂Ωl ∩ (Ω ∪ BN ) and let uB denote the unknowns on B. In the following. yielding a Schur complement S = (ABB − ATIB A−1 II AIB ). the triangulation must ideally be chosen so that its elements are aligned with BN . while uB will denote unknowns on (∂Ωl ∩ Ω). and let uB denote the vector of nodal values on B.. uI ) . Then. When more general boundary con- ditions are imposed. . . Schur com- plement preconditioners can be constructed as for a Dirichlet problem. If BN = ∅ and BD = ∅. uI )T . AIB and ABB will have different sizes for each partition. since the coarse space must be a subspace of Vh ∩ HD 1 (Ω). it may be difficult to decompose it into standard globs if BN has an irregular shape. In this case. (l) First Case. each uI will denote un- T T (1) (p) T knowns in Ωl and uI = (uI . using the block vectors uI and uB of unknowns. then stiffness matrix A and the Schur complement matrix S . . In both of the above cases. In this case. This may complicate the formulation of glob based preconditioners (such as block Jacobi.e. if BN = ∂Ω and coeffi- cient c(x) = 0. the natural boundary BN = ∅. In the second block partitioning. .2 General Boundary Conditions Our discussion of Schur complement preconditioners has focused primarily on Dirichlet problems. then stiffness matrix A and the Schur complement matrix S. the unknowns on BN ∩ ∂Ωl will be included in uI though they do not strictly lie in the interior of the subdomain. Neumann-Neumann and balancing methods apply in both cases.

the coarse space matrix S0 = R0 SRT0 will also be singular. To obtain a unique solution. AII ) = 1. . . . This approach will require matrix-vector products with S computed exactly (to machine precision). In this case. then modify it to have mean value zero:  T  1 wB wB ← wB − 1.69.2.. the Schur complement system SuB = ˜f B and the coarse problem S0 w0 = R0 rB will be solvable only if 1T ˜f B = 0 and 1T R0 rB = 0. the solution to (3.1 and a CG algorithm with an appropriate preconditioner for S. . uI can be obtained at the cost of one subdomain solve.8.. the global stiffness matrix A is solved by a precon- ditioned CG algorithm. . see [BO4]. Computing the solution to Au ˜ = f formally requires computing the action of A˜−1 twice. and S˜−1 once.110) I −A˜−1 II AIB I 0 I 0 A˜−1 II 0 = . For instance. (i) (i) . Once uB has been determined. 0 I 0 S˜−1 − ATIB I 0 I where S˜ denotes a preconditioner for S and A˜II a preconditioner for AII . it is important that the submatrices A˜II and (i) A˜IB be scaled similar to AII and AIB . . −1 -. .3 Preconditioning S or A Given subdomains Ω1 . . 1) . where the action of the inverse of the preconditioner A˜ for A has the following block matrix structure: . . 3.5) may in principle be sought in two alternate ways. the Schur comple- ment system may be solved for uB using Alg. . . T respectively. - ˜−1 I −A˜−1 II AIB A˜II 0 I 0 A = 0 I 0 S˜−1 − ATIB A˜−1 II I . respectively.. (3. . If a pre- (i) conditioner A˜ is employed for A. or the convergence rate can deteriorate significantly to O(h−2 ) even if cond(A˜II . In the second approach. . Ωp of Ω. 3. Remark 3. 3. The advantage is that an exact solver is not required II (l) for AII . each iterate should be normalized to have zero mean value.. if wB ∈ IRnB de- notes the output of the preconditioned system in the k’th iterate. but the disadvantage is that the inexact solver must be applied twice. for 1 = (1. The second approach has not be studied extensively. and (l) (l) (l) so require solving systems of the form AII wI = rI exactly (to machine (l) precision) each iteration. In the first approach. A sparse direct solver may be used for AII . As a result. 1T 1 Such normalizations will not be needed when c(x) = 0.8 Implementational Issues 187 will be singular.

Remark 3. such bounds can deteriorate to O (h0 /h)(1 + log(h0 /h))2 when a traditional coarse based on a coarse triangulation is employed. SA11. 3. see Chap. provided the coefficient a(. Other coarse spaces include wirebasket and partition of unity spaces.) is constant within each subdomain. see [BR15. SA8. see [CO8. if Γ denotes the curve or surface along which the coefficient a(.) has large jump discontinuities. on a two dimensional domain. WI6.97). MA12.P is defined as follows. Then. see [NE5]. Parallelization and Libraries Typically. paralleliza- tion. Another consideration is the choice of a coarse space. For instance. If an initial decomposition of Ω yields subdomains on which a(.) is discontinuous. Let Ni denote a matrix of size nB (i) whose columns form a basis for the null space of the local Schur complement matrix S (i) when c(x) = 0. In the Schur complement method.4 for additional comments on local solvers. V0. but improve 2 to O (1 + log(h0 /h)) when a piecewise constant coarse space is employed (see Remark 3. Importantly. sparse direct solvers are employed for solving the subdomain prob- lems arising in a Schur complement algorithm. 1) .e. Choosing subdomains with reduced variation in a(. In some applications. the subdomains must align with the discontinuities of a(. which require synchronization between the processors assigned to different subdomains.8. . then larger subdomains may be further decomposed to improve load balancing. SA7. 2. Such a coarse space will be defined even when the subdomains do not form a coarse triangulation of Ω. and the MPI and PETSc libraries. or the rate of con- vergence of a Schur preconditioned algorithm can deteriorate. Ideally. MA15]. are better when a coarse space is included. For 2nd order scalar T elliptic equations Ni = (1. i. For a Schur complement preconditioner with optimal order complexity. FFT based solvers and iterative solvers are used for subdomain problems. Theoretical bounds for Schur complement preconditioners.8. For a three dimensional domain.5 Remarks on Discontinuous Coefficient Problems When a(. . MA14. SA12].70 below). the action of A−1 II and the action of pre- conditioners typically involve parallel tasks. respectively. then Γ ⊂ B = ∪pi=1 ∂Ωi .P ≡ Range(RT0 ) where:   T T RT0 = RTB (1) D(1) N1 · · · RTB (p) D(p) Np .188 3 Schur Complement and Iterative Substructuring Algorithms 3. . care must be exercised in the choice of a subdomain decomposition and a coarse problem.70. .) is smooth. typical bounds are 2 O (1 + log(h0 /h)) when a traditional coarse space is employed. The “piecewise constant coarse space” V0. . however.).4 Local Solvers. Let D(i) be a diagonal matrix of size nB (i) with nonnegative entries defined by (3. Let nB and nB (i) denote the number of nodes on B and B (i) . the PETSc library contains parallel codes implementing most Schur complement algorithms.) also yields better conditioned local problems. DR10..

The limiting cases above indicate that the square root of the discrete Laplace-Beltrami operator on the interface B will generally not be an effective preconditioner for S in the strongly anisotropic case. If α1 = 1 and α2 → 0+ . on ∂Ω. This problem will be strongly anisotropic when (α1 /α2 ) 1 or (α1 /α2 )  1. yielding an ill-conditioned matrix.111) u = 0. Formally.8. AIB will be proportional to α1 . This suggests ABB as a heuristic preconditioner for S. The diagonal blocks of S approach a discretization of −(∂ 2 /∂x22 ).6 Remarks on Anisotropic Problems To motivate Schur complement algorithms for anisotropic problems. and its eigendecomposition may be obtained exactly. Then. The following special limiting cases may be noted. each diagonal block of S will formally approach a scalar multiple of the identity (and will be well conditioned). in Ω (3. In particular. 3. with increasing indices as x2 increases and as x1 increases.111) may be of singular perturbation type with boundary layers in the solution [KE5. Suppose that the unknowns are ordered consecutively along each vertical line x1 = c. and the off diagonal blocks of S will formally approach zero. and instead heuristically motivate issues for consideration when formulating a Schur complement preconditioner. L2 ) × (0.). As a result. elliptic equation (3.8 Implementational Issues 189 3. posed on a domain Ω ≡ (−L1 . The traditional norm T equivalence between the subdomain Schur complement energy u(i) S (i) u(i) and the fractional Sobolev energy |uh |21/2. the coefficient matrix A will have a block tridiagonal structure. When this holds. it will result in deteriorated convergence rates. yielding that S = (ABB − ATIB A−1 II AIB ) → ABB as α1 → 0+ . When α1 = 1 and α2 → 0+ . as in Chap. then the linear system will be strongly coupled along the x2 -axis. with the ratio (c2 /c1 ) increasing in proportion to the anisotropy in a(. However. If α2 = 1 and α1 → 0+ . When α1 → 0+ and α2 = 1. and AII will remain nonsingular as α1 → 0+ . consider the following model equation: −α1 ux1 x1 − α2 ux2 x2 = f. if the off diagonal blocks in S are neglected when a preconditioner is formulated.3.∂Ωi on the subdomain boundary: T c1 |uh |21/2. then the linear system will be strongly coupled along the x1 -axis. but weakly coupled along the x2 axis.∂Ωi . LA5]. 3. but S will still have a block tridiagonal structure in this limiting case. Consider a discretization of the above equation on a uniform grid and suppose that Ω is partitioned into vertical strip subdomains. will deteriorate for an anisotropic problem. . but weakly coupled along the x1 axis.∂Ωi ≤ u(i) S (i) u(i) ≤ c2 |uh |21/2. we shall assume that the boundary layer need not be captured. 1) ⊂ IR2 with parameters α1 > 0 and α2 > 0 which determine the degree of anisotropy in the equation.

Then.7 Remarks on Time Stepped Problems In time stepped problems. . T ) ⎪ ⎩ u(x. then a coarse space will be required. while for the latter. if the strips were chosen with its sides perpendicular to an axis of strong coupling. then the elliptic equation will be strongly coupled on planes perpendicular to the eigen- vector of a(. if a(.). uTB )T and ˜ f = (˜f TI .8.) is a constant matrix having only one relatively small eigenvalue. if a preconditioner is employed. Depending on the alignment of the sides of the subdomains relative to direction of weak coupling. When a(. provided its sides are perpendicular to the eigenvector associated with the smallest eigenvalue of a(. based on the subdomain . where Lu ≡ −∇ · (a∇u). in Ω. . as when α2 → 0+ . for the former. When a(. then the elliptic equation will be strongly coupled along rays (lines) parallel to the eigenvector associated with the largest eigenvalue. .) is a constant (or mildly varying) but strongly anisotropic matrix function on a domain Ω (not necessarily rectangular). each subdomain problem may have similar anisotropic limits. However. strip subdomains may be chosen so that the equation is strongly coupled within each strip. Then. or one based on al- gebraic approximation may be employed. S will formally approach ABB in the limit. Matrix ABB may then be employed as a heuristic algebraic preconditioner for S (without coarse space correction). at each time step.) corresponding to the smallest eigenvalue. For instance. Heuristically. an algebraic approximation may be constructed to have the same anisotropic limits. T ) u = 0. In three dimensions. care must be exercised. on ∂Ω × (0. and by analogy with the model problem as α1 → 0+ . a coarse space may be required. Given a nonoverlapping decomposition Ω1 . ˜ f TB )T . 0) = u0 (x). . where 0 < τ denotes the time step and (I + τ A) corresponds to a finite difference discretization of the elliptic operator (I + τ L). 3. We consider an implicit scheme in time and a finite difference discretization in space for the parabolic equation: ⎧ ⎪ ⎨ ut + L u = f. block partition u = (uTI . strip subdomains may still be em- ployed. Heuristically.) is a constant matrix having two relatively small eigenvalues. in Ω × (0. This will yield a linear system (I + τ A) u = ˜ f .190 3 Schur Complement and Iterative Substructuring Algorithms However. the Schur complement matrix will have a block tridiagonal structure. a preconditioner based on subdomain Schur complements (such as the Neumann-Neumann or balancing preconditioner). Ωp of Ω with interface B. the condition number of an unpreconditioned Schur complement matrix improves with decreasing time step. However. the coefficient matrix a(x) will have three eigenvalues for each x ∈ Ω and the elliptic equation will be strongly anisotropic if either one or two eigenvalues of a(x) are very small relative to the others. with sides perpendicular to the direction in which the equation is weakly coupled.

(3. The time stepped system (I + τ A) u = ˜ f.112) τ AIB I + τ ABB uB ˜ fB The Schur complement system will be: f B − τ ATIB (I + τ AII )−1˜ S(τ ) uB = (˜ f I ).8 Implementational Issues 191 interiors ∪pi=1 Ωi and interface B. will have the following block structure:      I + τ AII τ AIB uI ˜ fI T = . 3. where the Schur complement matrix S(τ ) satisfies: .

since formally. S(τ ) → I as τ → 0+ . FFT based preconditioners M (τ ) can be constructed to adapt to the τ and h dependent terms. we outline a heuristic approach based on a subassembly identity for the time stepped Schur complement S(τ ) in terms of S (l) (τ ):  p S(τ ) = RTB (l) S (l) (τ )RB (l) . see [DA4. Due to τ and h dependent terms. such as the square root of the discrete Laplace-Beltrami matrix for a two subdomain decompo- sition. DA5. we may split S(τ ) as:  p . it may also be of interest to formulate a stable one iteration algorithm which computes the discrete solution at each time step to within the local truncation error. such as would be heuristically expected for the Neumann-Neumann and balancing preconditioners. Below. Given such a decomposition. AIB and AII grow as O(h−2 ) as h → 0+ . In time stepped problems. for a fixed h. LA3. −1 S(τ ) = I + τ ABB − τ 2 ATIB (I + τ AII ) AIB . The entries of ABB . and I = l=1 RTB (l) I (l) RB (l) forms an algebraic partition of the identity. will not perform uniformly.71. However. by heuristic analogy with Schwarz algorithms. ZH5]. Remark 3. For such preconditioners. In the strip or two subdomain case. DR5. a preconditioner M (τ ) must ideally adapt to both parameters uniformly. l=1 (l)T ABB ) − τ 2 AIB (I + τ AII )−1 AIB is a subdomain (l) (l) (l) where each S (l) (τ ) = (I (l) + τ p Schur complement. using a fixed FFT based based preconditioner M for S(τ ). we expect that a coarse space may not be required if some time step constraint of the form τ ≤ c h20 holds. LA4.

LA3. VA. (l) (l) (l) S(τ ) = I + τ l=1 and apply a generalized ADI (alternating directions implicit) method to con- struct an approximate solution [DR5. . VA2]. 9. Time step constraints may apply. For an alternative scheme. (l)T RTB (l) ABB − τ AIB (I + τ AII )−1 AIB RB (l) . see [ZH5]. LA4. see Chap.

and in some cases on the jumps in the coefficients a(·). DR10. Our discussion will be organized as follows. In most applications. 3. We omit theoretical discussion of wirebasket preconditioners. In Chap.5. DR2.6. 3.9. In Chap. we describe theoretical methods for estimating the condition number of selected Schur complement preconditioners.1. and employ theoretical properties of elliptic equations and Sobolev norms to estimate the dependence of partition parameters on the mesh size h. we estimate the condition number of several two sub- domain preconditioners.1) on a quasiuniform triangulation Th (Ω) of Ω. We focus primarily on the dependence of the condition numbers on the mesh parameter h and subdomain size h0 . we describe estimates for the condition number of the balancing domain decom- position preconditioner. subdomain size h0 and jumps in the coefficient a(.). Chap. MA17]: . LI4. 3.9. The coefficient c(.9. we will employ the abstract Schwarz convergence theory described in Chap. . for 1 ≤ i ≤ p c(x) = 0.1 Background Results We will consider a finite element discretization of elliptic equation (3. .3. we describe theoretical properties of the traditional and piecewise constant coarse spaces. Poincar´e-Freidrich’s inequalities. GR8. we assume that the finite element space consists of continuous piecewise linear finite elements.1) will be assumed to be zero. . 2. 3.) will be assumed to be constant on each subdomain: a(x) = ρi .2 describes discrete Sobolev inequalities for finite element spaces and uses them to prove a result referred to as the glob theorem (our proof will hold only in two dimensions). we introduce scaled Sobolev norms. useful in estimating partition parameters for glob based algorithms. We use these background results to derive an equivalence be- tween the energy associated with the Schur complement matrix and a scaled sum of fractional Sobolev norm energies.5. . Ωp which forms a quasiuniform triangulation Th0 (Ω) of Ω of diameter h0 . We will denote the finite element space defined on Ω as Vh (Ω) and by Vh (D) the space of finite element functions restricted to D.9.) in (3. in Ω. BR15. BPS and vertex space preconditioners. 3. 3. In Chap. To obtain such bounds. 3. we estimate the condition number of multisubdomain block Jacobi.9. The domain Ω will be assumed to be partitioned into nonoverlapping subdomains Ω1 . for x ∈ Ωi .9. and trace and exten- sion theorems.192 3 Schur Complement and Iterative Substructuring Algorithms 3.4. for any subregion D ⊂ Ω (including D ⊂ B). In Chap. while coefficient a(. In Chap. The following scaled norms and seminorms will be employed throughout this section [NE.9 Theoretical Results In this section.9. DR14.

9 Theoretical Results 193 ⎧ 2  ⎪ ⎪ |u|1.B (i) ≡ B (i) B (i) |u(x)−u(y)| |x−y|d dxdy + h0 B (i) |u| dx. 1 2 By construction. As a consequence. the above norms and seminorms will scale similarly under dilations of the underlying domain in IRd . H00 (Di ) Definition 3.∂Ωi will be stronger than v1/2. extension.Di + .∂Ωi when the func- tion v(·) ∈ H 1/2 (∂Ωi ) is zero outside some subregion Di ⊂ ∂Ωi .Ωi ≡ |∇u|2 dx ⎪ ⎪  Ωi  ⎪ ⎪ ⎨ u21. h0 When Ωi ⊂ IR2 . Ωi ⊂ IRd ⎪ ⎪ ⎪ ⎩ u2   2  1/2. We will frequently encounter norms of the form v21/2. Importantly. the norm v1/2. We define an extension by zero map E0 as: v on Di E0 v = 0 in ∂Ωi \Di .114) ⎩ v 1/2 ≡ E0 v1/2. H00 (Di ) Substitution of the above definition into the integral form of the fractional Sobolev norm on H 1/2 (∂Ωi ) yields:     |v(x) − v(y)|2 |v(x)|2 v2H 1/2 (D ) ≡ dxdy + 2 dydx 00 i Di Di |x − y|d Di ∂Ωi \Di |x − y| d u20. ∂Ωi \Di ) denotes the distance of x to ∂Ωi \Di . 1/2 the fractional Sobolev space H00 (Di ) may also be defined equivalently as . and map the results back to the original domain and obtain estimates of the norms in the trace and extension theorems independent of the width h0 of the subdomain.72.Di and will be de- noted v2 1/2 as formalized below. Let Di ⊂ ∂Ωi .113) ⎪ ≡ B (i) B (i) |u(x)−u(y)| 2 ⎪ ⎪ |u|1/2. 3. this is easily verified to be equivalent to:    |v(x) − v(y)|2 1 |u(x)|2 v2H 1/2 (D ) ≡ dxdy + dx. we may map a subdomain Ωi of width h0 to a reference domain of width 1.∂Ωi .B (i) |x−y|d dxdy. ∂Ωi \Di ) Here dist(x. and Poincar´e-Freidrich’s type inequalities are independent of h0 when scaled norms are employed. apply trace or extension theorems on the reference domain.Ωi ≡ |∇u|2 dx + h12 Ωi |u|2 dx Ωi 0   2 (3. 1/2 and define a Sobolev space H00 (Di ) and its norm by: ⎧   ⎨ H00 1/2 (Di ) ≡ v ∈ H 1/2 (Di ) : E0 v ∈ H 1/2 (∂Ωi ) (3. We will thus assume heuristically that the bounds in the trace. In such cases. 00 i Di Di |x − y| d h0 Di dist(x.

Lemma 3. Proof.Di ≤ v2H 1/2 (D ) ≤ C λl Pl v20.114) will satisfy: $ % 1/2 1/2 H0 (Di ) = v ∈ L2 (Di ) : −∆Di v ∈ L2 (Di ) . This equivalence enables an alternate formal representation of fractional Sobolev spaces and their norm using fractional powers of eigenvalues in the spectral expansion of a Laplace-Beltrami operator associated with the underlying spaces. See [LI4. see [LI4. l=1 denote its spectral representation where each Pl denotes an L2 (Di )- orthogonal projection onto the null space of −∆Di associated with eigen- value λl > 0. provided the function either has zero mean value on the underlying domain. Formally define the fractional power (−∆Di ) of operator −∆Di as: ∞  1/2 1/2 (−∆Di ) = λl Pl . while its fractional Sobolev norm will satisfy: ∞  ∞  1/2 1/2 c λl Pl v20.194 3 Schur Complement and Iterative Substructuring Algorithms an interpolation space of the form [L2 (Di ). BA3. ∀u ∈ H01 (Di ) ⊂ L2 (Di ). Suppose the following assumptions hold.   1. BA3. the fractional Sobolev space H00 (Di ) defined by (3. l=1 1/2 Then. BE16]. BE16]. as guaranteed by the Riesz representation theorem. 1/2 2.   We next describe a result referred to as the Poincar´e-Freidrich’s inequality. H01 (Di )]1/2 using interpolation between embedded Hilbert spaces. or is zero on a segment of the boundary. 00 i l=1 l=1 for some 0 < c < C.Di . Let H01 (Di ) = v : E0 v ∈ H 1 (∂Ωi ) and let −∆Di formally denote a self adjoint coercive operator which generates the Dirichlet form: (−∆Di u.73. which establishes a bound for the L2 (Ωi ) norm of a function in terms of one of its Sobolev seminorms. . Let: ∞  −∆Di = λl Pl . as de- scribed below. u)L2 (Di ) ≡ u21.Di .

Ωi ≤ C|v|21. The parameter C will be independent of h0 be- cause of the scaled norms employed. If v ∈ H (Ωi ) satisfies Ωi vdx = 0 then: 1 v20. If v ∈ H 1 (Ωi ) satisfies v = 0 on Di ⊂ ∂Ωi where measure(Di ) > 0. If v ∈ H 1 (Ωi ).Ωi ≤ (1 + C) |v|21.113).B (i) . and states that when Ωi ⊂ IRd for d = 2. then: g20. by the closed graph theorem this mapping will have a bounded right inverse. GR8]. 2.75 (Trace Theorem).Ωi .B (i) .Ωi v21.∂Ωi ≤ Cv1. then: |g20. 3.Ωi .   The linear mapping of v ∈ H 1 (Ωi ) to its boundary value v ∈ H 1/2 (∂Ωi ) is not only bounded.B (i) |g21/2. 3.B (i) ≤ (1 + C) |g|21/2. The following bounds will hold. As a consequence. 4. respectively. The next result we describe is referred to as a trace theorem.Ωi .Ωi v21. If g ∈ H 1/2 (B (i) ) satisfies B (i) gds = 0.B (i) ≤ C |g|21/2. This result is stated below. functions in H 1 (Ω) will have boundary values (or trace) of some regularity (smoothness). See [NE]. Theorem 3. and referred to as an extension theorem. where C > 0 is independent of v and h0 . for some C > 0 independent  of v and h0 .B (i) ≤ (1 + C) |g|21/2. LI4. . see [NE. GR8]. See [NE.  1.74 (Poincar´ e-Freidrich’s). with v ∈ H 1/2 (∂Ωi ) ⊂ L2 (∂Ωi ).B (i) ≤ C |g|21/2.Ωi ≤ C|v|21.Ωi ≤ (1 + C) |v|21. 3. for some C > 0 independent of v and h0 . Proof. If g ∈ H 1/2 (B (i) ) satisfies g = 0 on Di ⊂ ∂Ωi where measure(Di ) > 0. Additionally. it is surjective. and: v1/2. LI4.B (i) g21/2. the first and third Poincar´e-Freidrich’s inequalities may equivalently be stated in the quotient space H 1 (Ωi )/IR or H 1/2 (B (i) )/IR. since the seminorms are invariant under shifts by constants. for some C > 0 independent of g and h0 . then its restriction to the boundary ∂Ωi will be well defined. then: v20.   For the choice of scaled Sobolev norms and seminorms defined in (3. Proof. for some C > 0 independent of g and h0 .9 Theoretical Results 195 Lemma 3. the parameter C will be independent of h0 .

LI4.∂Ωi . Proof. NE6]. The independence of C from h0 is a consequence of the scaled norms employed. Let Ωi be a polygonal domain of size h0 triangulated by a grid Th (Ωi ) quasiuniform of size h. Furthermore. it can easily be shown that Hgh ∈ H 1 (Ωi ). BR11.Ωi ≤ C gh 1/2. such that for gh ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ) Eh gh = gh . in Ωi Hgh = gh . More general results are described in [AS4. in which case the solution to Laplace’s equation has sufficiently regular solutions. we will require a discrete version of the preceding extension theorem in which the extended function is a finite element function. where C > 0 is independent of gh . with the following bound holding: Eh gh 1. GR8]. We refer to such a result as a discrete extension theorem. given gh ∈ Vh (∂Ωi ).  As we will be working with finite element functions.∂Ωi for C > 0 independent of g and h0 . Hgh will satisfy the a priori bound: Hgh 1. BJ9. To construct a finite element extension Eh gh ∈ Vh (Ωi ).∂Ωi . Lemma 3.76 (Extension Theorem).Ωi ≤ C g1/2.196 3 Schur Complement and Iterative Substructuring Algorithms Theorem 3. we will first extend gh to the interior of the subdomain as a harmonic function Hgh :  −∆(Hgh ) = 0. Applying the continuous extension theorem and using the weak formulation of Laplace’s equation on Ωi . Proof. We will outline a proof when Ωi ⊂ IR2 . satisfying the following bound: Eg1. WI. on ∂Ωi . h and h0 . See [ST7. Then there exists a bounded linear map: Eh : Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ) → Vh (Ωi ) ∩ H 1 (Ωi ). on ∂Ωi .Ωi ≤ Cgh 1/2. There exists a bounded linear map E : H 1/2 (∂Ωi ) → H 1 (Ωi ) such that for each g ∈ H 1/2 (∂Ωi ): Eg=g on ∂Ωi .77.

Ωi ≤ C h |Hgh |1+.∂Ωi .∂Ωi .Ωi ≤ C |gh |1/2. The harmonic extension Hgh will. 3. Substituting an inverse inequality [CI2. EV]: |Hgh |1+. so that using a quotient space and applying Poincar´e- Freidrich’s inequality yields: Ih Hgh 1. the harmonic extension Hgh will be H 1+ (Ωi ) regular on the polygonal domain [GR8] and satisfy the following a priori bound.Ωi + |Hgh |1. JO2] of the form: |gh |1/2+. We now verify that the discrete extension map Eh is bounded.∂Ωi where C is independent of h0 and h. Consequently. GI.Ωi ≤ C h |gh |1/2+. not be a finite element function. see [NE.9 Theoretical Results 197 where C > 0 is independent of gh and h.∂Ωi . Applying standard error bounds [CI2. Since Hgh is a harmonic function.∂Ωi .Ωi ≤ C |gh |1/2+. Since gh is continuous and piecewise polynomial it will hold that gh ∈ H 1 (∂Ωi ). Ih Hgh will be well defined in Vh (Ω). JO2] for the interpolation map yields: |Ih Hgh − Hgh |1. GR8.∂Ωi ≤ C h− |gh |1/2. it will be continuous in the interior and so the interpolant Ih Hgh will be well defined on the interior nodes of Ωi .   We shall next state and prove a basic norm equivalence between the energy associated with the Schur complement matrix on a subdomain and a weighted fractional Sobolev norm energy on the boundary of the subdomain. By construction we obtain Ih Hgh = α when gh (x) = α ∈ IR. Since ρi is not involved in this construction. So define Eh gh as the interpolant Ih Hgh of Hgh onto the finite element space V h (Ωi ): Eh gh ≡ Ih Hgh .∂Ωi . Thus. into the preceding yields: |Ih Hgh − Hgh |1. C will be independent of ρi .Ωi ≤ |Ih Hgh − Hgh |1.Ωi ≤ C h h− |gh |1/2.Ωi ≤ C gh 1/2. . however. Applying the triangle inequality and employing the preceding bounds yields: |Ih Hgh |1. the interpolant Ih Hgh is well defined on the boundary ∂Ωi since Hgh = gh is continuous and piecewise polynomial on ∂Ωi . By construction.

Suppose the following assumptions hold. on ∂Ωi . Let a(·) in (3. given data gi ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ) let αi denote the mean value of ui = Hih gi on Ωi . ∀v ∈ Vh (Ωi ) ∩ H01 (Ωi ) ui = gi .∂Ωi = |gi − αi |21/2. v) ≡ ρi ∇u · ∇vdx. ui ) . C2 > 0 denote generic constants independent of h. h0 . To prove the upper bound. 3.∂Ωi ≤ C1 ui − αi 21.∂Ωi is replaced by gi 2 1/2 (i) and provided the appropriate version of the H00 (B ) Poincar´e-Freidrich’s inequality is employed.Ωi = Cρi2 Ai (ui . will be analogous provided the boundary norm gi 21/2.Ωi = C2 |ui |21.198 3 Schur Complement and Iterative Substructuring Algorithms Lemma 3. and that Hih (gi − γi ) = ui − γi . Let ui ∈ Vh (Ωi ) ∩ H 1 (Ωi ) satisfy: Ai (ui . To prove the lower bound. h0 and ρi . represent the extension ui = Hih gi in the form: Hih gi = Ei gi + wi . 2. Apply the invariance of seminorms under shifts by constants and the trace theorem to obtain: |gi |21/2. Here C1 .78. We will employ the notation ui = Hih gi ∈ Vh (Ωi ) ∩ H 1 (Ωi ) to denote a discrete harmonic function with boundary values gi ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ).∂Ωi . Proof. v) = 0. for 0 < c < C independent of h. 1. The proof when ∂Ωi ∩ BD = ∅.∂Ωi ≤ Ai (ui .Ωi ≤ C2 |ui − αi |21. where  Ai (u. Ωi Then the following norm equivalence will hold: c ρi |gi |21/2. . ρi and gi . it will follow that if γi is a constant then Hih γi = γi . Let gi ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ) satisfy: gi = 0 on BD ∩ ∂Ωi . where the third line above follows by Poincar´e-Freidrich’s inequality since αi corresponds to the mean value of ui on Ωi .1) satisfy a(x) = ρi on Ωi for 1 ≤ i ≤ p and c(·) ≡ 0 on Ω. Since c(x) = 0. ui ) ≤ C ρi |gi |21/2. We will describe the proof for the case ∂Ωi ∩ BD = ∅.

∂Ωi .115) Eih gi 1. by construction. where Hih (gi − γi ) = ui − γi . .∂Ωi as given by the discrete extension theorem (Lemma 3.77). wi ) = −Ai (Eih gi .115) yields: |ui |1. on ∂Ωi (3. such as its aspect ratio. The parameters Ci in the preceding estimates will be indepen- dent of h and ρi . Ci may depend on other geometrical properties of Ωi .Ωi ≤ |Eih gi |1.Ωi ≤ C2 gi 1/2.Ωi . which is the desired upper bound. 3. they will be independent of h0 due to the scale invariance of the seminorms. In general. The same bound will hold if gi is replaced by gi − αi for any constant αi : |ui − γi |1. If we choose γi as the mean value of gi on ∂Ωi . Applying the triangle inequality to ui = Eih gi + wi . ui ) = ρi |ui |21.Ωi |wi |1. and wi is defined by wi ≡ Hih gi − Eih gi ∈ Vh (Ωi ) ∩ H01 (Ωi ). It will thus hold that: Ai (ui .Ωi ≤ C1 gi 1/2.∂Ωi .∂Ωi ≤ C3 |gi − γi |1/2.Ωi = Ai (wi . then gi − γi will have zero mean value on ∂Ωi . and choose v = wi ∈ Vh (Ωi ) ∩ H01 (Ωi ) to obtain: ρi |wi |21.∂Ωi . v) = 0 ∀v ∈ Vh (Ωi ) ∩ H01 (Ωi ).∂Ωi .79. wi ) ≤ ρi |Eih gi |1.Ωi ≤ C2 gi − γi 1/2. In addition.   Remark 3. It thus follows that: |wi |1.9 Theoretical Results 199 where Eih gi ∈ Vh (Ωi ) ∩ H 1 (Ωi ) is an extension of gi satisfying the following: Eih gi = gi . We substitute the above repre- sentation into the equation satisfied by Hih gi : Ai (Ei gi + wi . and an application of the Poincar´e-Freidrich’s inequality will yield: |ui − γi |1.Ωi = ρi |ui − γi |21.∂Ωi .Ωi ≤ C gi 1/2. and using the preceding bound and equation (3.Ωi ≤ C2 gi − γi 1/2.Ωi ≤ C3 ρi |gi |21/2.∂Ωi = C3 |gi |1/2.

T 2. Given a vector uB of nodal values on interface B.80. . Let Vh (Ωi ) denote a finite element space defined on a domain Ωi ⊂ IR2 of diameter h0 triangulated by a quasiuniform grid of size h. where C > 0 is independent of h0 and h.2 Discrete Sobolev Inequalities We next describe a discrete Sobolev inequality [BR12] which holds for finite element functions on Ω ⊂ IR2 .   Remark 3. uh ) ≤ C ρi |uh |21/2. v) ≡ ρi ∇u · ∇vdx. 1.Ωi ≤ C (1 + log(h0 /h)) h−2 0 uh 0. Ωi Then. . Let the coefficient a(x) = ρi on Ωi and c(x) = 0 on Ω with:  Ai (u.82. i=1 The result now follows by an application of the preceding lemma on each subdomain. Let uh denote the discrete harmonic finite ele- ment function corresponding the nodal vector u. MA17]. Lemma 3. DR10. 3. Let Ω1 . and summing over all subdomains using that gi = uh on ∂Ωi . uTB where uI = −A−1 II AIB uB .Ωi + |uh |1. Ωp form a quasiuniform triangulation of Ω of width h0 . Suppose the following assumptions hold.∂Ωi ≤ uTB SuB = A(uh . .9. see also [DR2. define u = uTI .81.Ωi . Proof.200 3 Schur Complement and Iterative Substructuring Algorithms Applying the preceding equivalence on each subdomain and summing over all the subdomains yields a global equivalence between the Schur complement energy and a weighted sum of the subdomain fractional Sobolev energies. . Then the following bound will hold for the maximum norm on Vh (Ωi ) ⊂ H 1 (Ωi ): uh 2∞. uh ) ≡ T Ai (uh . h0 and ρi . In view of the preceding result.∂Ωi . uh ). 2 2 ∀uh ∈ Vh (Ωi ). 3. . Theorem 3. MA14. i=1 i=1 for 0 < c < C independent of h. the following estimate will hold:  p  p c ρi |uh |21/2. a preconditioner M for S must ideally be chosen so that its interface energy uTB M uB approximates the above weighted sum of fractional Sobolev energies on its subdomain boundaries. it will satisfy:  p uTB SuB = u Au = A(uh . Since uh is piecewise discrete harmonic by assumption.

Let C ⊂ Ω i denote a cone of radius R and angle α at vertex x∗ . Let x∗ ∈ Ω i denote a point where the finite element function uh attains it maximum modulus |uh (x∗ )| = uh ∞. θ) + (r. θ) dr. Apply the fundamental theorem of calculus along a ray within the cone:  R ∂uh uh (0. We follow the proof in [BR12]. 0) = uh (R.9 Theoretical Results 201 Proof. θ) within the cone so that (0.Ωi . Introduce polar coordinates (r. 0 ∂r Split the integral using the intervals 0 ≤ r ≤ . 3. 0) corresponds to x∗ and so that the cone is specified in polar coordinates by 0 ≤ r ≤ R and 0 ≤ θ ≤ α.

h and .

h ≤ r ≤ R for some 0 < .

and employ the inverse inequality −1 dr ∞.Ωi ≤ uh ∞.Ωi h  du h within the interval 0 ≤ r ≤ . take absolute values of all terms. 1.

θ) | ∂uh∂r dr| + h (r. θ)| + . θ)| + 0 (r. h (which holds trivially for piecewise linear finite elements) to obtain:  h R |uh (0.θ) | ∂uh∂r dr| R ≤ |uh (R. 0)| ≤ |uh (R.

θ) = |uh (R.θ) dr|  R ∂uh (r. h uh ∞. θ)| + .Ωi h−1 +  h | ∂uh∂r (r.

bringing back the term .Ωi +  h | ∂r dr| Since |uh (0. uh ∞. 0)| = uh ∞.Ωi .

Ωi yields: R (1 − . uh ∞.

Ωi ≤ |uh (R. α) yields: α αR (1 − .) uh ∞.θ) | ∂uh∂r dr|. θ)| + h (r. Integrating the above expression as θ ranges in (0.

Squaring both sides. applying the triangle inequality and the Cauchy-Schwartz inequality to the terms on the right side yields: α α uh 2∞.) αuh ∞.Ωi ≤ 0 |uh (R. θ)|2 dθ)( 0 dθ) . θ)| dθ + 0 h (r.θ) = 0 |uh (R.θ) | ∂uh∂r |dr dθ α αR 1 ∂uh (r. θ)| dθ + 0 h r | ∂r |r dr dθ.Ωi ≤ α2 (1−) 2 2 ( 0 |uh (R.

  αR α R 2 + α2 (1−) 2 (r. Simplifying the expression yields the bound: .θ) 2 ( 0  h | ∂uh∂r | r dr dθ)( 0  h r12 r dr dθ) .

Ωi ≤ α2 (1−) 2α 2 0 (r. θ)|2 dθ + log(R/.θ) 2 |uh (R. αR α uh 2∞.

h) 0 h | ∂uh∂r | r dr dθ .Ωi ≤ α(1−) 2 2 0 |uh (R. we obtain: Since ( ∂u h 2 ∂uh 2 1 ∂uh 2 2  α uh 2∞. ∂r ) ≤ ( ∂r ) + r 2 ( ∂θ ) = |∇uh | . θ)|2 dθ + (log(1/.

Ωi .) + log(R/h) |uh |21. . θ)|2 dθ + C(1 + log(h0 /h) |uh |21.C  α ≤ α(1−) 2 2 0 |uh (R.

202 3 Schur Complement and Iterative Substructuring Algorithms Multiplying both sides by R dR and integrating over 0 ≤ R ≤ β h0 (assuming that the cone C can be extended within Ωi to have diameter β h0 . for some 0 β ≤ 1) yields the estimate: β 2 h20 2 uh.

Ωi 2  βh0  α Cβ 2 h20 ≤ 2 α(1−)2 0 0 |uh (R.∞. θ)|2 R dR dθ + 2 + log(h0 /h)) |uh |21.Ωi (1 .

Ωi + 2 0 (1 + log(h0 /h)) |uh |21.Ωi .   As a corollary of the preceding result. the following bound will hold: . Then. Cβ 2 h2 ≤ 2 α(1−)2 uh 20. Lemma 3.83. Dividing both sides by the factor β 2 h20 /2 yields the desired result. we obtain a discrete Sobolev in- equality holding on the boundary ∂Ωi of a two dimensional domain. Let Ωi ⊂ IR2 be of diameter h0 and triangulated by a quasiu- niform grid of size h.

  We shall now present an alternate proof of the preceding discrete Sobolev inequality based on Fourier series [BR29]. k=−∞ . for C > 0 independent of h0 and h. Given vh ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ) let Hih vh ∈ Vh (Ωi ) ∩ H 1 (Ωi ) denote the discrete harmonic extension of vh into Ωi .∂Ωi ≤ C (1 + log(h0 /h)) |vh |21/2..Ωi ≤ Cvh 21/2.Ωi ⎪ ⎩ ≤ C (1 + log(h0 /h)) vh 21/2. under a Lip- schitz continuous parameterization).∂Ωi . where C > 0 is independent of h0 and h.∂Ωi + h0 vh 20.∂Ωi (3.∂Ωi ≤ Hi vh ∞.Ωi 2 h 2 ⎪ ≤ C (1 + log(h0 /h)) Hih vh 21.∂Ωi . Proof. This proof will use the property that the boundary ∂Ωi of a simply connected polygonal domain Ωi ⊂ IR2 will be Lipschitz homeomorphic to the unit circle S 1 (i. satisfying: Hih vh 21. Given such a parameterization x(θ) of the boundary ∂Ωi by a 2π periodic function x(θ) with arclength measure ds(x(θ)) = |x (θ)| dθ defined along the curve. for C > 0 independent of h0 and h. we may represent any function u(·) ∈ L2 (∂Ωi ) by a Fourier series expansion of the form: ∞  u(x(θ)) = ck eikθ . vh 2∞. there will be a one to one correspondence between ∂Ωi and the unit circle S 1 .e.116) for vh ∈ Vh (∂Ωi ) ∩ H 1/2 (∂Ωi ). Applying the preceding lemma to Hih vh and using the boundedness of Hih yields: ⎧ ⎨ vh ∞.

the following bound will hold:   v2L∞ (0. 2β H β (∂Ωi ) = 2π where h0 = |∂Ωi | denotes the length of ∂Ωi . The alternate proof of the discrete Sobolev inequality will be obtained based on the following continuous Sobolev inequality for 2π periodic functions.2π) + . 2π] with Fourier expansion: ∞  v(x) = ck eikx .84. 3. for 0 < β < 1. k=−∞ Then. 1+ Lemma 3.2π) ≤ C v2L2 (0. 2 ⎩ |u|2 h01−2β ∞ k=−∞ 2π |k| |ck |2 . 2π) denote a real periodic function on [0. Let v(x) ∈ H 2 (0. the following equivalences will hold [BR29]: ⎧ h 0 ∞ ⎨ u2L2 (∂Ωi ) = 2π k=−∞ 2π|ck | .9 Theoretical Results 203 When Ωi is shape regular.

(3.−1 v2 1+ .2π) for 0 < .117) H 2 (0.

< 1 and C independent of .

. Proof. To prove the bound take absolute values of the Fourier expansion and apply the Cauchy-Schwartz inequality to obtain: .

∞ 2 v2L∞ (0.2π) ≤ |c0 | + k=−∞. k=0 |ck | .

k=0 (|k| 2 |ck |) |k|− 2 . ∞ 1+ 1+ = |c0 | + k=−∞.

 .

k=0 |k| . ∞ ∞ −1− ≤ 2π|c0 |2 + 2π k=−∞. k=0 |k| 1+ |ck |2 k=−∞.

2π) Using the integral test.2π) + |v|2 1+ k=−∞. ∞ −1− ≤ v2L2 (0. k=0 |k| . H 2 (0. we may bound: ∞    ∞    dx 1 |k|−1− ≤ 2 1 + =2 1+ ≤ 4.

−1 . 1 x1+ .

k=−∞. k=0 for 0 < .

2π) + 4 .2π) ≤ v2L2 (0. Substituting this into the preceding bound yields: v2L∞ (0. < 1.

−1 |v|2 1+ H 2 (0.117) by choosing .116) can now be obtained from (3.2π) which is the desired result.   The discrete Sobolev inequality (3.

appropriately and using an inverse inequality for finite elements. .

Let vh ∈ Vh (∂Ωi ) ∩ H 2 (∂Ωi ) be 2π-periodic.85. Proof. and employ norm equiva- lences to obtain the bound: vh 2∞.∂Ωi ≤ C h−1 0 vh 0. Then.∂Ωi + C . Apply the preceding continuous Sobolev inequality to the 2π-periodic representation of vh .204 3 Schur Complement and Iterative Substructuring Algorithms 1+ Lemma 3. We follow the proof in [BR29]. the fol- lowing bound will hold: vh 2L∞ (∂Ωi ) ≤ C (1 + log(h0 /h)) vh 2H 1/2 (∂Ωi ) for C > 0 independent of h and h0 .

2 −1  h0 |vh |2 1+ . ∀vh ∈ Vh (∂Ωi ) H 2 (∂Ωi ) H 2 (∂Ωi ) in the preceding bound. to obtain: vh 2∞ ≤ C h−1 0 vh 0.∂Ωi + C . with C > 0 independent of h. H 2 (∂Ωi ) Substitute the following inverse inequality: |vh |2 1+ ≤ C h− |vh |2 1 .

H 2 (∂Ωi ) Importantly. 2 −1  − h0 h |vh |2 1 . the parameter .

> 0 may be chosen small enough so that .

This will hold provided: 1 4.−1 (h0 /h) ≤ (1 + log(1/h)). if (h/h0 ) ≥ e−4 .

if (h/h0 ) < e−4 .≡ −1 log(h0 /h) . and can be verified by an application of the derivative test for a maximum in the parameter .

86. Then.∂Ωi = |wh |2H 1/2 (D ) ≤ C (1 + log(h0 /h)) wh 2∞. Let Di ⊂ ∂Ωi denote a connected subset of length di ≤ h0 . Lemma 3. 2.   We now apply the discrete Sobolev inequalities to derive results useful for estimating the condition number of Schur complement preconditioners. Suppose the following assumptions hold. 1.118) for C > 0 independent of h0 and h. Let wh ∈ Vh (∂Ωi ) satisfy: wh (x) = 0 for x ∈ ∂Ωi \Di . the following results will hold: |wh |21/2.Di + |wh |21/2. .. The desired result follows immediately.Di 00 i (3.

Substituting these yields:  di /2 |wh (x(s))|2 h 2  d /2 2 0 s ds = 0 |wh (x(s))| s ds + h i |wh (x(s))| s ds  h 2  d /2 ≤ wh 2∞.Di (1 + log(h0 /h)) . for nodes x ∈ G In particular. Definition 3. h] and [h. the following decomposition of the identity will hold:  I= IG . we may bound |wh (x(s))| ≤ wh ∞. ∂Ωi \Di ) where ds(x) denotes the arclength measure along ∂Ωi for s ∈ (0. G∈G . For 0 ≤ s ≤ h. 3. di ) and dist(x. If G ⊂ B is a glob. di /2 di − s Combining bounds and substituting them into (3. We may similarly bound:  di |wh (x(s))|2 ds ≤ C wh 2∞. Let G denote all globs of B. define the map IG : Vh (B) → Vh (B) which assigns zero nodal values at all nodes in B outside G: vh (x).87. 0.Di 2h2 + wh 2∞.120) Since wh (x) is zero when s = 0 and linear for 0 ≤ s ≤ h.Di + 2 2 2 ds(x). Since the arclength distance satisfies: dist(x.Di (s/h) since wh (x(s)) is linear on the interval and wh (x(0)) = 0.∂Ωi ≤ |wh |1/2. ∂Ωi \Di ) denotes the arclength distance between x and ∂Ωi \Di .Di 0 hs2 s ds + wh 2∞. we may bound |wh (x(s))| ≤ wh ∞. for nodes x ∈ G IG vh (x) ≡ for vh ∈ Vh (B). (3.Di .   We now describe estimates of finite element decompositions based on globs.120) yields the result. di − s}. ∂Ωi \Di ) 0 s di /2 di − s (3. Since wh (x) is zero outside Di . Di dist(x. di /2].119) Di dist(x. ∂Ωi \Di ) = min{s. and a face.Di h i 1s ds h2 = wh 2∞.Di (1 + log(h0 /h)) .Di log(di /2h) ≤ C wh 2∞. Recall that a glob is either an edge or a vertex of B when Ω ⊂ IR2 .9 Theoretical Results 205 Proof. we may employ the equivalent integral expression for the fractional Sobolev seminorm:  |wh (x)|2 |wh |1/2. For h ≤ s ≤ di /2. The first integral may further be split over the intervals [0. the above integral can be split as:   di /2  di |wh (x)|2 |wh (x(s))|2 |wh (x(s))|2 ds(x) = ds + ds. an edge or a vertex of B when Ω ⊂ IR3 .

the following bound will hold: 2 |IG wh |21/2. The following preliminary result establishes a bound for IG when G is a vertex glob in two dimensions.86 to ψG (x) which has support on BG to obtain: |ψG |1/2. and let ψG h (x) ∈ Vh (B) denote a finite element nodal basis function centered at vertex G on B: h 1. (3. It will be useful for estimating par- tition parameters in abstract Schwarz algorithms.∂Ωi |ψG h |1/2.88. Definition 3.123) . the parameters K and L will be bounded independent of h.122) i For typical subdomain decompositions arising from a coarse triangulation. h Proof. Let BG ⊂ ∂Ωi denote the union of elements adjacent to G on which ψG h (x) h has support. Apply Lemma 3. if xj = G ψG (xj ) = 0. Given wh ∈ Vh (B) let IG wh ∈ Vh (B) denote the finite element function: IG wh (x) ≡ wh (G) ψG h (x). by linearity we obtain: |IG wh |1/2. Let L > 0 denote the maximum number of globs on any shared interface of the form ∂Ωi ∩ Ωj : L ≡ max |{G : G ⊂ ∂Ωi ∩ ∂Ωj }| . Then. if xj = G. See [MA17].121) i. h 2 (3.∂Ωi ≤ C (1 + log(h0 /h)) wh 21/2.∂Ωi ≤ C (1 + log(h0 /h)) ψG h 2 ∞.∂Ωi ≤ wh ∞.j Let K > 0 denote the maximum number of neighboring subdomains: K ≡ max |{j : ∂Ωi ∩ ∂Ωj = ∅}| . Let G ∈ ∂Ωi denote a vertex glob. Since IG wh (x) = wh (G)ψG (x).∂Ωi ∀wh ∈ Vh (B) for some C > 0 independent of h0 and h. We shall now outline an important theoretical result referred to as a glob theorem. (3.∂Ωi .BG + |ψG h 2 |1/2.89. h0 and ρi . The glob theorem provides a bound for the H 1/2 (∂Ωi ) seminorm of the finite element interpolation map IG . 1.BG . where each xj denotes a node in B. 2.∂Ωi = |wh (G)| |ψG h |1/2. Let Ω ⊂ IR2 and suppose the following assumptions hold.206 3 Schur Complement and Iterative Substructuring Algorithms We associate the following parameters with a subdomain decomposition. Lemma 3.

123) yields: .BG = h 2 ds(x) ds(y) ≤ 4.9 Theoretical Results 207 for C > 0 independent of h0 and h.BG by substituting h 2 |x−y| that |ψG h (x) − ψG G (y)| ≤ h for x. −h −h |x − y|2 Substituting the preceding bound and using ψG h ∞. y ∈ BG to obtain:   h h |ψG h (x) − ψGh (y)|2 |ψG |1/2.BG = 1 in (3. 3. We estimate |ψG |1/2.

Let G ∈ ∂Ωi denote a edge glob. the following bound will hold: 2 |IG vh |21/2.∂Ωi (C (1 + log(h0 /h)) + 4) ≤ C (1 + log(h0 /h)) wh 2∞. 2. where we employed the discrete Sobolev inequality in the last step. if xj ∈ ∂Ωi \G. Lemma 3.∂Ωi ≤ C vh 0.89 yields an estimate of the form: 2 IG wh 21/2. By construction. if xj ∈ G IG vh (xj ) ≡ 0. Given an edge glob G ⊂ ∂Ωi . |IG wh |21/2. Proof.∂Ωi . let GL .90. See [MA17].∂Ωi will hold triv- ially since the mass matrix on B is spectrally equivalent to an identity matrix.∂Ωi .∂Ωi C (1 + log(h0 /h)) ψG ∞. and given vh ∈ Vh (B) let IG vh ∈ Vh (B) denote the finite element function defined by: vh (xj ).∂Ωi ≤ wh 2∞. Then. Let Ω ⊂ IR2 and suppose the following assumptions hold.∂Ωi .∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2. 1.∂Ωi ≤ C (1 + log(h0 /h)) (1 + log(h0 /h)) wh 21/2.∂Ωi ≤ C (1 + log(h0 /h)) wh 21/2. Let BG denote the union of all elements of ∂Ωi intersecting the glob G.91. GR ∈ ∂Ωi denote its endpoints. for some C > 0 independent of h0 and h. The next result bounds the H 1/2 (∂Ωi ) seminorm of IG when G corresponds to an edge glob in ∂Ωi ⊂ B on a two dimensional domain Ωi . corresponding to vertex globs.BG + |ψG h 2 |1/2. A bound of the form IG vh 0.BG h 2 ≤ wh 2∞.   Remark 3. Combining such a bound with Lemma 3. ∀wh ∈ Vh (B) for C > 0 independent of h0 and h. the finite element function wh (x) will be zero at these endpoints GL and GR and outside the . where xj denotes nodes on ∂Ωi .

BG as follows: ⎧ .∂Ωi ≤ C (1 + log(h0 /h)) vh 2∞.BG + |wh |21/2. Substituting that wh ∞. Since wh (x) = vh (x) − IGL vh (x) − IGR vh (x) on BG . we may apply the gener- alized triangle inequality to estimate the seminorm |wh |21/2.∂Ωi ≤ C (1 + log(h0 /h)) wh 2∞. Applying bound (3.BG + |wh |21/2.208 3 Schur Complement and Iterative Substructuring Algorithms glob.BG 2 ≤ C (1 + log(h0 /h)) vh 21/2. and estimating the latter term by the discrete Sobolev inequality yields: |wh |21/2.BG = vh ∞. and may alternatively be expressed as: vh (x) − IGL vh (x) − IGR vh (x) for x ∈ BG wh (x) ≡ IG vh (x) = 0 for x ∈ ∂Ωi \BG .118) to wh (x) on BG yields: |wh |21/2.∂Ωi + |wh |21/2.BG for C > 0 independent of h0 and h.BG .BG (which holds by construction).

B + |IGR vh |21/2.B .B ≤ 3 |vh |21/2. ⎨ |wh |21/2. .B + |IGL vh |21/2.

∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2.∂Ωi .93. where the latter expression was obtained using |IGL vh |1/2. . Combining the above estimate with the trivial bound |vh |21/2.∂Ωi . ∀vh ∈ Vh (B). the following bound will hold for vh ∈ Vh (B): 2 IG vh 21/2.BG ≤ |IGL vh |1/2. 1.92. Let Vh (Ω) be a finite element space on a quasiuniform triangulation.∂Ωi . Combining such a bound with the preceding lemma will yield an estimate of the form: 2 IG wh 21/2. Let Ω ⊂ IR2 and suppose the following assumptions hold.∂Ωi and employing bounds for the vertex glob interpolants.∂Ωi .BG ≤ |vh |21/2. we obtain: 2 |wh |21/2.∂Ωi ≤ C (1 + log(h0 /h)) wh 21/2.BG + (1 + log(h0 /h)) |vh |21/2.∂Ωi will also hold trivially for edge globs since the mass matrix on B is spectrally equivalent to an identity matrix. ∀wh ∈ Vh (B) for C > 0 independent of h0 and h. 2.∂Ωi which is the desired estimate. a bound IG vh 0. G G G G ⎩ 2 ≤ C |vh |21/2.∂Ωi ≤ C vh 0. Lemma 3.BG . Combining the preceding results yields the two dimensional glob theorem.   Remark 3. Similarly for the term |IGR vh |1/2. As for vertex globs. Let IG denote the glob interpolation map for vertex or edge globs G ⊂ ∂Ωi .∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2. Then.

95. There exists C > 0 independent of h. When the subdomains Ω1 . DR10. 1. 3. h0 and ρi such that:  2 IG vh 21/2. 2. the traditional coarse space defined based on an underlying coarse triangulation of the domain. . . Theorem 3.3 Properties of Coarse Spaces We shall now summarize theoretical properties of two types of coarse spaces employed in Schur complement algorithms. j:∂Ωj ∩G=∅ Proof. .∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2. . π0 or πh0 .   We now state the general glob theorem [MA17] in two or three dimensions. MA17]. . . . the following results will hold. the traditional coarse space V0. See [BR12.   3. 2. Then.∂Ωi in the preceding lemmas. 3. which will hold for some C > 0 independent of h0 and h for any glob G because of the spectral equivalence between the mass matrix and a scaled identity matrix on ∂Ωi . . ψnh00 (x) which satisfy ψih0 (yj ) = δij (where δij is the Kronecker delta). Ωp of size h0 form a coarse triangulation Th0 (Ω) of Ω.∂Ωi ≤ Cvh 20. . DR17. . Let Th (Ω) be a quasiuniform triangulation of Ω ⊂ IRd for d = 2. 3. with estimates for IG in the L2 (∂Ωi ) norm: IG vh 20. . SM2]. . Suppose the following assumptions hold. BR15. . see [BR15. Let Ω1 .T vh (x) = vh (yi ) ψih0 (x). .94 (Glob Theorem). If y1 . Definition 3.T (B) ⊂ Vh (B) cor- responds to the restriction to B of the finite element space defined on the coarse triangulation. then the coarse space interpolation map I0. 1. i=1 The traditional interpolation map I0.B ≤ C (1 + log(h0 /h)) uh 21/2.T : Vh (B) → V0. based on a decomposition of the interface into globs. I0 . Let G ⊂ ∂Ωi be a glob within B and let vh ∈ Vh (B).9. h0 and ρi such that: 2 IG vh 21/2. yn0 denote the coarse vertices with associated coarse space nodal basis functions ψ1h0 (x). There exists C > 0 independent of h. . The proof follows by combining the seminorm bounds for |IG vh |21/2. We shall omit discussion of wirebasket coarse spaces.T is also denoted Ih0 .∂Ωi . . .∂Ωi . Ωp form a coarse triangulation Th0 (Ω) of Ω of size h0 .9 Theoretical Results 209 Proof.T (B) ⊂ Vh (B) is defined by:  n0 I0.∂Ωj . and the piecewise constant coarse space employed in the balancing domain decomposition pre- conditioner.

T vh |21.j without a h0 scaling factor.T in two dimensions. Proof. JO2]. if Ω ⊂ IR2 |I0.T vh |1/2. We shall only outline the proof of boundedness of I0. The interpolation error will satisfy: |vh − I0.T vh |20. .96. Under the same assumptions as Lemma 3.∂Ωi .Ωi which in turn can be estimated by the discrete Sobolev inequality as bounded by C(1 + log(h0 /h))vh 21/2. . Let Th (Ω) be a quasiuniform triangulation of Ω of size h. CI2. we may multiply them by a factor ρi on each subdomain.Ωi when Hh is the discrete harmonic extension map into the subdomains. .97. Then. . Lemma 3. if Ω ⊂ IR2 ρi |I0.   Since each of the bounds in Lemma 3. Ωp form a quasiuniform coarse triangulation Th0 (Ω) of Ω of size h0 . if Ω ⊂ IR3 for C > 0 independent of h. Lemma 3.∂Ωi ≤ 2 (3.T (B) = Vh0 (B) ⊂ Vh (B).T vh |1/2. The interpolation error: |vh − I0.T vh |21/2.T vh |21. see [BR15.∂Ωi . and sum over all subdomains to obtain global estimates involving weighted terms.∂Ωi and |Hh I0. the following bound will hold for vh ∈ Vh (B): p p C (1 + log(h0 /h)) i=1 ρi vh 21/2.∂Ωi ≤ Ch0 |vh |21/2.124) C (h0 /h) vh 21/2.∂Ωi is standard [ST14.Ωi will involve the difference quotients:  (i) 2 alj (vh (xl ) − vh (xj )) l.T vh will be linear on each triangular subdomain Ωi . Employ the equivalence between |I0. Let Ω1 . the term |Hh I0. h0 and ρi . ∀vh ∈ Vh (B) (3. This yields the desired bound for Ω ⊂ IR2 . The following bound will hold locally on each ∂Ωi for vh ∈ Vh (B): C (1 + log(h0 /h)) vh 21/2. 2. in two dimensions.96. h0 and ρi .∂Ωi .T vh |21.Ωi can thus be estimated by C |vh |2∞.∂Ωi .∂Ωi . for 1 ≤ i ≤ p the following results will hold.T onto the standard coarse space V0. For the general proof. DR10].125) for C > 0 independent of h. 1.∂Ωi ≤ Ch0 |vh |21/2. if Ω ⊂ IR3 for C > 0 independent of h.T vh |20. we summarize known bounds for the coarse grid interpo- lation map I0. as indicated below. .210 3 Schur Complement and Iterative Substructuring Algorithms In the following.96 are local.∂Ωi .∂Ωi ≤ 2 p i=1 C (h0 /h) i=1 ρi vh 21/2. h0 and ρi . The term |Hh I0. Since Hh I0.

by carefully choosing the partition of unity parameters dj (G).P vh = IG dj (G) Qj vh . Definition 3. Remark 3. denoted V0. We shall now turn to theoretical estimates of the interpolation map I0. if G ∩ ∂Ωj = ∅  {j:G⊂∂Ωj } dj (G) = 1. Fortunately.99.  We next consider the piecewise constant coarse space V0.126) is referred to as piecewise constant since the finite element functions within this space have constant values on nodes within each glob of B.P : V0. and its values on a glob G depend on the mean value of the function on the boundaries of adjacent subdomains. 3. ∂Ωi ds The piecewise constant coarse space. with t ≥ .P (B) ⊂ Vh (B).100.9 Theoretical Results 211 Proof. {l:G⊂∂Ωl } ρl 2 For 1 ≤ j ≤ p we define a map Qj : Vh (B) → IR by:  u ds Qj u = ∂Ωi .P .P (B) ≡ Range(I0. Unlike the traditional interpolation map I0. we let 0 ≤ dj (G) ≤ 1 denote non- negative partition of unity parameters which satisfy: dj (G) = 0.96 by the factor ρi and sum over all the subdomains.P (B) defined by (3.P is not local. We define an interpolation map I0.P is also denoted Q0 elsewhere in these notes. (3. .126) The interpolation map I0.P ). Given a glob G ∈ G. Definition 3. G⊂G {j:G⊂∂Ωi } where Qj and dj (G) are as defined in the preceding. The piecewise constant coarse space is then defined as the range of the interpolation map I0. The space V0.P (B) used in the balancing domain decomposition preconditioner. global norm bounds can be obtained which do not depend on {ρi } and furthermore do not deteriorate in three dimensions. The quantities dj (G) are typically defined by: ρtj 1 dj (G) =  t . Multiply the local bounds in Lemma 3.P : Vh (B) → Vh (B):   I0. and describe analogous estimates. is formally defined next as the range of an associated interpolation map I0.98.P .T the map I0.

∂Ωj . We follow the proof in [MA14.128) in the expression for IG (vh − I0.∂Ωi .P denote the operator defined earlier based on the globs G ∈ G:   I0. {j:G⊂∂Ωj } Applying the generalized triangle inequality to the above expression yields:  |IG (vh − I0. we obtain:  IG (vh − I0. 2 2 2.∂Ωi ≤ L dj (G)2 IG (I − Qj )vh 21/2. Proof.P vh ) and using that IG1 IG2 = 0 whenever G1 and G2 are distinct globs.129) ≤ C (1 + log(h0 /h)) {j:G⊂∂Ωj } dj (G) |vh |1/2. If glob G ⊂ ∂Ωi . then: p ρi |(I − I0. Then.∂Ωi 2 (3.127) G∈G {j:G⊂∂Ωj } where  I = G∈G IG  (3. Let I0. 1.101. If the partition parameters dj (G) based on the globs are defined by: ρtj dj (G) =  . Suppose the following assumptions hold. . . 1.P vh ) = dj (G)IG (I − Qj )vh . MA17.P vh )|21/2. MA15]. .∂Ωi ≤ IG (I − Qj )vh 21/2. for G ⊂ G {l:G⊂∂Ωl } ρtl for t ≥ 12 . the following bounds will hold for vh ∈ Vh (B). then: IG (I − I0.∂Ωi i=1 2 p (3. Let Ω1 .∂Ωj ≤ c2 IG (I − Qj )vh 21/2.128) 1 = {j:G⊂∂Ωj } dj (G).P )vh 21/2. 2.127) and (3.130) ≤ C L2 K 2 (1 + log(h0 /h)) i=1 ρi |vh |1/2. . C > 0 is independent of h.P vh ≡ IG dj (G)(Qj vh ). h0 and {ρj }.212 3 Schur Complement and Iterative Substructuring Algorithms Lemma 3. Substituting (3. {j:G⊂∂Ωj } Since G ⊂ ∂Ωi and G ⊂ ∂Ωj the following norms will be equivalent: c1 IG (I − Qj )vh 21/2. Ωp form a quasiuniform triangulation of Ω of size h0 . 2 In both of the above. (3.∂Ωi .∂Ωi . .P )vh |21/2.

∂Ωj ⎪ . The latter will in turn be equivalent to IG w21/2.∂Ωi .9 Theoretical Results 213 with 0 < c1 < c2 independent of h.∂Ωj and IG w2 1/2 will be H00 (G) 1/2 equivalent by definition of H00 (G). h0 and {ρl }.P vh )|21/2. {j:G⊂∂Ωj } Applying the glob theorem to the above yields: ⎧ ⎪ ⎪ |IG (vh − I0.∂Ωi ≤ c2 L dj (G)2 IG (I − Qj )vh 21/2.P vh )|21/2. Applying this norm equivalence yields:  |IG (vh − I0.∂Ωj . This will hold because of the compact support of IG w so that IG w21/2. 3.∂Ωi ⎨  ≤ q(h/h0 ) {j:G⊂∂Ωj } dj (G)2 (I − Qj )vh 21/2.

129). Since the seminorms are invariant under shifts by constants.∂Ωj 2 1 0 {j:G⊂∂Ωj } d j (G) 2 where q(h/h0 ) ≡ C c2 L (1 + log(h0 /h)) . using that Qj preserves constants and employing the scaling of seminorms under dilation) we obtain: (I − Qj )vh 20. Using a quotient space argument [CI2] (mapping ∂Ωj to a reference domain.130). we multiply (3.∂Ωj .∂Ωj :  |IG (vh − I0. h0 and {ρl }.∂Ωi ⎪ ⎨ 2  ≤ C c2 L (1 + log(h0 /h)) {j:G⊂∂Ωj } ρi dj (G) |vh |1/2.∂Ωj 2 2 (3.P vh )|21/2.∂Ωj ≤ C h0 |vh |21/2.P vh )|21/2. {j:G⊂∂Ωj } 2 where q(h/h0 ) ≡ C c2 L (1 + log(h0 /h)) .∂Ωj . for c3 > 0 independent of h.∂Ωj by |vh |21/2. ⎩ = q(h/h )  ⎪ |(I − Qj )vh |21/2. This yields (3.∂Ωj + h0 (I − Qj )vh 20. To obtain (3.129) by the factor ρi and rearrange terms: ⎧ ⎪ ⎪ ρi |IG (vh − I0. we may replace |(I − Qj )vh |21/2.∂Ωi ≤ q(h/h0 ) dj (G)2 |vh |21/2.131) ⎪ ⎪  ⎪ ⎩ = C c2 L (1 + log(h0 /h)) 2 ρ d (G) 2 {j:G⊂∂Ωj } i j ρj ρj |vh |21/2. When G ⊂ (∂Ωi ∩ ∂Ωj ) the following bound may be obtained for dj (G): ρ2t j ρ2t j ρ2t j dj (G)2 = .∂Ωj .

 2 ≤ ≤ t 2 2t . ρj ρj ρ2t i + ρj 1 + (ρj /ρi )2t . ρti + ρj ρ2t i + ρj {l:G⊂∂Ωl } ρtl which yields the following estimate for ρi dj (G)2 /ρj : ρi dj (G)2 ρi ρ2t j (ρj /ρi )2t−1 ≤ 1+2t = .

78. Lemma 3. 2 {G⊂∂Ωi } 0 {j:G⊂∂Ωj } Summing over all subdomain boundaries ∂Ωi yields:  p 2  p ρi |vh − I0. SM2] for theoretical estimates of the wirebasket interpolation map I0. Let the assumptions in Lemma 3. Neumann-Neumann [BO7]. the preceding expression will be uniformly bounded when 2t ≥ 1.4 Two Subdomain Preconditioners for S As an application of the preceding theoretical results.∂Ωj . Proof. .214 3 Schur Complement and Iterative Substructuring Algorithms Since the factor (ρj /ρi ) is positive.∂Ωi ≤ C c2 K L (1 + log(h0 /h)) 2 2 ρj |vh |21/2. 3.∂Ωi .P )vh .∂Ωi ⎪ ⎪ ⎩≤Cc K L (1 + log(h /h)) 2 ρj |vh |21/2.P vh |21/2. {G⊂G} This yields the bound: ⎧ ⎪ ⎪ ρi |vh − I0.P vh |21/2.∂Ωj . CH2.P vh = IG (vh − I0.131) yields: ρi |IG (vh − I0. Substituting this upper bound into (3. DR10. we estimate the con- dition number of the two subdomain Dirichlet-Neumann [BJ9.101 and the triangle inequality to I0.9.∂Ωi ⎨  ≤ ρi K {G⊂∂Ωi } |IG (vh − I0.P vh ). Follows immediately by an application of Lemma 3. with an upper bound of one.∂Ωi we employ the property of IG on ∂Ωi :  vh − I0.P vh )|21/2. GO3.132) i=1 i=1 for C > 0 independent of h. and the fractional Sobolev norm precon- ditioners [DR. (3. for the Schur complement. on ∂Ωi .P vh )|21/2. FU.∂Ωj . The reader is referred to [BR15.W .∂Ωi 2  ≤ C c2 L (1 + log(h0 /h)) {j:G⊂∂Ωj } ρj |vh |1/2. 2 To estimate |vh − I0. i=1 j=1 which is the desired bound (3.101 hold. Then:  p 2  p ρi |I0.P vh = vh − (I − I0.∂Ωi ≤ C L K (1 + log(h0 /h)) 2 2 ρi |vh |21/2.P vh .   As an immediate corollary. h0 and {ρj }.130).102. BR11. MA29]. BJ9.103.   Remark 3.P vh |21/2. Such esti- mates can be obtained by applying Lemma 3. BR11]. we obtain the following bounds for I0.P vh |21/2.

∀vB = 0. Suppose F represents the fractional Sobolev norm energy: vTB F vB = |vh |2H 1/2 . M corresponds to a Neumann-Neumann preconditioner.104.9 Theoretical Results 215 Lemma 3. for some β3 > 0 independent of h and {ρ1 . The desired result follows immediately. −1 −1 3. Follows from Lemma 3. Let the coefficient a(x) = ρi in Ωi for i = 1. BR11] from (3. the following bound will hold for vh ∈ Vh (Ω) with associated nodal vector vB on B = ∂Ω1 ∩ ∂Ω2 : ci ρi |vh |2H 1/2 (B) ≤ vTB S (i) vB ≤ Ci ρi |vh |2H 1/2 (B) . 00 then the preceding lemma yields that for vB = 0: vTB S (i) vB ci ρi ≤ ≤ Ci ρi . 00 00 for 0 < ci < Ci independent of h and {ρ1 . If M = S (i) . vTB F vB where S (i) denotes the subdomain Schur complement matrix. ρ2 }.   H00 (B) Remark 3. M is a Dirichlet-Neumann preconditioner. BJ9. 2. for some β2 > 0 independent of h and {ρ1 . By Lemma 3.105. ρ2 }. then M will be spectrally equivalent to F and satisfy: cond(M. If M −1 = α S (1) + (1−α)S (2) for some 0 < α < 1. Then. i.106.104 hold. ρ2 }. then: cond(M. Lemma 3.104. Proof..   . GO3.e. 2 and c(x) = 0 on Ω. ρ2 }. then: cond(M. Suppose the assumptions from Lemma 3. S) ≤ β3 .∂Ωi is norm equivalent to 1/2 |vh |2 1/2 by definition of H00 (B).64). S) ≤ β2 .. Proof. 1. ∀vh ∈ Vh (B). We have the following condition number estimates. Suppose that Th (Ω) is a quasiuniform triangulation of Ω and that neither Ω1 nor Ω2 is immersed. i. CH2. we obtain that S = (S (1) + S (2) )  (ρ1 + ρ2 ) F.78 since |vh |21/2. Let M denote any of the preconditioners [DR. S) ≤ β1 . 3.e. for some β1 > 0 independent of h and {ρ1 .

(3.∂Ωi ≤ S(uh .) as:  T    EuB AII AIB EvB S(uh . The additive Schwarz subspace preconditioners we shall consider will be based on the subspaces Vh (Gi ) ≡ Range(IGi ) ⊂ Vh (B) corresponding to globs Gi ∈ G.∂Ωl . 2. BR15.2 can then be obtained by applying the glob theorem and other theoretical tools described in this section. To study such convergence. ·) on Vh (Gi ) as defined below: Si (uh . and a coarse space V0 (B) ⊂ Vh (B). vh ) ≡ S(uh . The local bilinear forms A˜i (·. . 3. vB . the following equivalence will hold:  p  p c ρi |uh |21/2. vh ).5.. uB ATIB ABB vB where E ≡ −A−1 II AIB . ∀uh ∈ Vh (B). improved bounds may be ob- tained in some cases by employing other tools. . ∀uh .∂Ωi . vh ∈ Vh (B) defined on B with associated nodal vectors uB . uh ) ≤ C ρi |uh |21/2. . Gn for some n. .) ≡ S(.216 3 Schur Complement and Iterative Substructuring Algorithms 3. DR17. ·) : Vh (B) × Vh (B) → IR. We shall consider only the traditional coarse space V0.B ≤ S(uh . . Our estimates will be applicable when the coefficients {ρi } have large variation across subdomains.80. we shall employ the Schwarz subspace framework from Chap. TO10]. and the reader is referred to [BR12. 2.P ⊂ Vh (B). ·) in the abstract Schwarz framework of Chap.B ≡ ρl |uh |21/2. vh ) ≡ uTB SvB = .5. we shall use the notation:  p uh 21/2. . Given finite element functions uh .B . for the coarse space. when the variation in the coefficients is mild. DR10. The inner produce S(·.. we define the bilinear form S(.2 will be denoted Si (·.T (B) ⊂ Vh (B) and the piecewise constant space V0. Similarly.) defined later. .. endowed with the inner product A(.5. and a coarse space V0 ⊂ Vh (B). Estimates for the parameters K0 and K1 from Chap. uh ) ≤ C uh 21/2. 2.133) l=1 so that c uh 21/2. However. We also omit wirebasket preconditioners. For convenience.2 with the linear space V = Vh (B). By Thm. . that we shall employ in Vh (B) will be generated by the Schur complement matrix S. Subspaces Vi ⊂ V will be chosen as subspaces of the form Vh (G) ⊂ Vh (B) based on globs G ⊂ G. vh ∈ Vh (Gi ). i=1 i=1 For notational convenience.9. the globs in G shall be enumerated as G1 .5 Multi-Subdomain Preconditioners for S We now estimate the condition number of several multisubdomain Schwarz subspace preconditioners for the Schur complement matrix S.

we shall assume that exact solvers are employed for the submatrices so that the parameters ω0 = ω1 = 1 and K0 = C0 . we define an n × n matrix . 3.9 Theoretical Results 217 To simplify our discussion. If a coarse space is not employed.

= (.

ij ) of strengthened Cauchy-Schwartz parameters such that: S(vi . vj ) ≤ .

It is easily verified that the spectral radius ρ(. vi )1/2 S(vj .ij S(vi . vj )1/2 ∀ vi ∈ Vh (Gi ) and ∀ vj ∈ Vh (Gj ).

) of matrix .

is bounded by K L. matrix . When a coarse space is employed.

Then. Gn denote an enumeration of all the distinct globs in G so that the following decomposition of identity property holds:  n I= IGi . Due to the decomposition of unity property for the IGi . . ∀vh ∈ Vh (B). We shall estimate the partition parameter C0 in the weighted boundary norm (3. MA17]. By the abstract theory of Chap.133) instead of S(·. vi ) ≤ C0 S(vh . will be of size (n + 1) and its spectral radius will be bounded by (K L + 1) regardless of the choice of coarse space. . Lemma 3. the condition number of additive Schwarz subspace preconditioner for S will satisfy: C0 K L. vh ). and IGi : Vh (B) → Vh (Gi ) for 1 ≤ i ≤ n. 2. Let G1 . Since K and L are typically independent of h. With Coarse Space. given vh ∈ Vh (B) there exists vi ∈ Vh (Gi ) for 1 ≤ i ≤ n satisfying vh = v1 + · · · + vn and  p S(vi . h0 and {ρi }. ρmin with C independent of h. S) ≤ C0 (K L + 1). . Proof.107. Given vh ∈ Vh (B) define vi = IGi vh ∈ Vh (Gi ) for 1 ≤ i ≤ n. i=1 where I : Vh (B) → Vh (B).2. it will hold that: v1 + · · · + vn = vh . i=1 where   ρmax C0 ≤ C L (1 + log(h0 /h)) h−2 2 0 . . No Coarse Space cond(M. The next result yields an estimate for C0 when there is no coarse space. See [TO10.5. . ·) since both are equivalent. h0 and {ρi }. we only need to focus on the partition parameter C0 .

∂Ωi ≤ C Hh vh 21. i=1 Ωi Then. (3.Ωi .218 3 Schur Complement and Iterative Substructuring Algorithms If Gl ⊂ ∂Ωi .∂Ωi . ∀vi ∈ Vh (Ωi ) ∩ H01 (Ωi ). then by the glob theorem. an application of the trace theorem to Hh vh on ∂Ωi yields: ⎧ ⎨ vh 21/2. v) ≡ ρi ∇u · ∇v dx. where   p A(u.∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2. vi ) = 0. we obtain that: 2 |IGl vh |21/2.134) Define Hh vh as the discrete harmonic extension of the the interface value vh into the subdomain interiors Ωi for 1 ≤ i ≤ p A(Hh vh .

⎩ = C |Hh vh |21.Ωi .∂Ωi .Ωi + h12 Hh vh 20. and summing over all adjacent subdomains yields the following: p IGl vh |21/2.B = i=1 ρi |IGl vh |21/2.134). 0 Substituting the preceding bound into (3. multiplying by the factor ρi .

2 ≤ C (1 + log(h0 /h)) i:Gl ⊂∂Ωi ρ i |H h vh | 2 1.Ωi  .Ωi + 1 2 h0 H h vh 2 0.

Ωi . 2 1 0 Summing over all globs yields the estimate: n n p l=1 IGl vh |1/2.B = i=1 ρi |IGl vh |1/2.Ωi + h2 Hh vh 20. 2 ≤ C (1 + log(h0 /h)) ρmax i:Gl ⊂∂Ωi |Hh vh |1.∂Ωi 2 2 l=1 .

Ωi 2 1 2 2 p .Ωi + h20 Hh vh 0. 2 n  ≤ C (1 + log(h0 /h)) l=1 i:Gl ⊂∂Ωi ρi |Hh vh |1.

≤ C (1 + log(h0 /h)) ρmax L i=1 |Hh vh |21.Ωi .Ωi + h12 Hh vh 20.

and substitute it in the preceding bound to obtain: n n p l=1 IGl vh 1/2.Ωi . Thm.B = i=1 ρi |IGl vh |1/2. 0 Since Hh vh is piecewise discrete harmonic.Ω .80 yields the equivalence: . we apply Poincar´e-Freidrich’s inequality: Hh vh 20.Ω ≤ C |Hh vh |21.∂Ωi 2 2 l=1 2 p (3. 2 1 0 Since Hh vh is zero on BD .Ω . 3.135) ≤ C (1 + log(h0 /h)) ρmax L (1 + h12 ) i=1 ρi |Hh vh |21. 0 2 = C (1 + log(h0 /h)) ρmax L |Hh vh |1.Ω + h2 Hh vh 20.

2. This upper bound may be unduly pessimistic when the factor (ρmax /ρmin ) is large. However. Let Ω ⊂ IRd for d = 2. h0 and {ρi }.135) yields:  n   2 ρmax 1 S(IGl vh . 3. i=1 for c. if V0 = V0. 3 C0 ≤ C (1 + log(h0 /h))2 . vh ). As an immediate corollary. h0 and {ρi }. we obtain the following condition number estimate for the block Jacobi Schur complement preconditioner in two or three dimensions:    2 ρmax 1 cond(M. 3. Let coarse space V0. Gn denote an enumeration of all the distinct globs in G so that the following decomposition of identity property holds:  n I= IGi .9 Theoretical Results 219  p c vh 21/2. ρmin h0 for some C > 0 independent of h. these bounds can be improved significantly. . i=0 where ⎧ 2 ⎪ ⎨ C (1 + log(h0 /h)) . . MA17]. 1. if V0 = V0.B ≤ S(vh . Let G1 . 3. . Let the following conditions hold. C independent of h. i=1 where I : Vh (B) → Vh (B).109. IGl vh ) ≤ C L (1 + log(h0 /h)) 1 + 2 S(vh . Our next estimate is for the Schur complement additive Schwarz precon- ditioner when a coarse space is included [TO10. given vh ∈ Vh (B) there exists v0 ∈ V0 and vi ∈ Vh (Gi ) for 1 ≤ i ≤ n with vh = v0 + v1 + · · · + vn satisfying:  p S(vi . .   Remark 3. h0 and {ρi }.B . vh ) = ρi |Hh vh |21.T and d = 3. if a suitable coarse space V0 is employed. S) ≤ C L (1 + log(h0 /h)) 1+ 2 .P (B) ⊂ Vh (B) be employed.T (B) ⊂ Vh (B) or V0.108.Ωi ≤ C vh 21/2. with C independent of h. Lemma 3. . vi ) ≤ C0 S(vh .T and d = 2 ⎪ ⎩ C (h0 /h).P for d = 2. Let the partition parameters dj (G) be defined for t ≥ 12 by: ρtj dj (G) =  . Substituting this into (3. if V0 = V0. and IGi : Vh (B) → Vh (Gi ) for 1 ≤ i ≤ n. vh ) ρmin h0 l=1 which is the desired result. {l:G⊂∂Ωl } ρtl Then.

∂Ωj .T : Vh (B) → V0.T (B) will be analogous. {j:Gl ⊂∂Ωj } where C > 0 is independent of h. for t ≥ 1/2.T depending on whether d = 2 or d = 3.B .∂Ωj 2 = C (1 + log(h0 /h)) L2 vh 21/2.P )vh 21/2.B 2 2 n   ≤ C (1 + log(h0 /h)) l=1 {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj |vh |1/2. Multiply the above expression by ρi and sum over all subdomains containing Gl to obtain:  {i:Gl ⊂∂Ωi } ρi IGl (I − I0.P (B).∂Ωj 2 2 p ≤ C (1 + log(h0 /h)) L2 j=1 ρj |vh |21/2. with differences arising from the bound for I0.∂Ωi {i:Gl ⊂∂Ωi } 2  ≤ C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj |vh |1/2. We shall only outline the proof for the choice V0 = V0.P vh 21/2. the following bound can be obtained.P )vh 1/2. ρj ρj ρ i + ρ j Substitution of the above into (3. h0 and {ρj }.P vh ) when Gl ⊂ ∂Ωi .B .129) yields: 2  IGl (I − I0. By construction. Given vh ∈ Vh (B). bound (3.P )vh 1/2.{ρi }.∂Ωi ≤ C (1 + log(h0 /h)) dj (Gl )2 |vh |21/2.∂Ωj .220 3 Schur Complement and Iterative Substructuring Algorithms Proof. 2 Summing over all globs Gl yields: n l=1 IGl (I − I0.132) from the preceding section yields that: 2 I0. as before: ρi dj (Gl )2 ρi ρ2t ≤ 2t j 2t ≤ 1.P vh where I0. it will hold that: vh = v0 + v1 + · · · + vn . Bound (3.136) When Gl ⊂ (∂Ωi ∩ ∂Ωj ).B ≤ C (1 + log(h0 /h)) vh 21/2.P )vh 21/2.∂Ωj . .P (B). define v0 ≡ I0.∂Ωi 2 2   ≤ C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρi dj (Gl ) |vh |1/2. The choice V0 = V0. (3. For 1 ≤ i ≤ n define vi ≡ IGi (vh − I0 vh ).B  = ρi IGl (I − I0.P is the interpolation onto V0. To estimate vl = IGl (vh − I0.136) yields the bound: IGl (I − I0.P )vh 21/2.∂Ωj 2 2 2  ρi dj (Gl )2 = C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj ρj |vh |21/2.

. if V0 = V0. MA17. Since cond(M. ∂Ωp of B and either coarse space V0.P vh and vl = IGl (I − I0. .P will yield similar bounds as the vertex space preconditioner.132) yields: n I0. .B + l=1 IGl (I − I0. Lemma 3. depending only on the amount β of overlap.P . The preceding result yields logarithmic bounds. Let the assumptions in Lemma 3. while C (1 + (h0 /h)) will hold in three dimensions. .B . WI6. then the bound C (1 + log(h0 /h)) will hold in two dimensions. BR12. edge and face globs (in three dimensions) or their extensions. DR10. if V0 = V0.T or V0. • The BPS preconditioner in two dimensions is an additive Schwarz subspace preconditioner based on the edge and vertex globs. we combine the upper bound K1 ≤ M L with preceding bounds for K0 = C0 to obtain the desired result. DR10]. Proof. based on the vertex. is also an additive Schwarz subspace preconditioner. S) ≤ C K L (1 + log(h0 /h))2 . .T and d = 2 ⎪ ⎩ C K L (h0 /h) . wh ) and wh 21/2. 3. Readers are referred to [BJ8. • The Schwarz subspace preconditioner for S based on the overlapping sub- regions ∂Ω1 . . the desired result follows by equivalence between S(wh . DR17.P . BR15. BR13. TO10] for additional theory. then the bound C (1 + log(h0 /h)) will hold in two and three dimensions.P vh 21/2. and a coarse space V0. Improved bounds independent of h0 and h can be proved.110. BR14. 3 cond(M. .P )vh 21/2. Then: ⎧ 2 ⎪ ⎨ C K L (1 + log(h0 /h)) .P is 2 employed. . when the coefficient a(x) is smooth.P and d = 2.T or V0.T and d = 3.109 hold. KL8. we estimate the condition number of the additive Schwarz preconditioner for S based on Vh (G1 ). and V0.{ρi }.9 Theoretical Results 221 Combining the above bound with (3. Vh (Gn ). .T or V0. if V0 = V0. see [SM. BJ9. • The vertex space preconditioner in two or three dimensions.T is 2 employed.B .   The preceding lemma may be applied to estimate the condition number of several Schur complement preconditioners in two and three dimensions.P )vh for 1 ≤ l ≤ n. If coarse space V0. Since v0 = I0. DR14] and [DE3. S) ≤ K0 K1 .B 2 ≤ C (1 + log(h0 /h)) 1 + L2 vh 21/2.   As a corollary. If coarse space V0. XU10. BR11.

v) = uT Sv. v) ≡ (Su. The number of nodes on B will be denoted n and the number of nodes on B (i) = ∂Ωi \BD will be denoted ni . MA17]. We shall employ the following notation in our discussion. ∀ u. ∀ u.222 3 Schur Complement and Iterative Substructuring Algorithms 3. we define a semi-norm: . v ∈ IRn . DE3. The Euclidean inner product on IRn will be denoted: (u. DR18. TO10] for general convergence estimates on Neumann-Neumann preconditioners. v) = uT v. v ∈ IRn . We refer the reader to [DR14. by estimating the condition number of the balancing domain decomposition preconditioner using an algebraic framework introduced in [MA14. We also employ the inner product generated by the Schur complement S: S(u.9.6 Balancing Domain Decomposition Preconditioner We conclude our discussion on bounds for Schur complement preconditioners. On each non-Dirichlet boundary segment B (i) .

|wi |2S (i) ≡ S (i) wi . wi . The following Cauchy-Schwartz inequality will hold: . for wi ∈ IRni .

.

1/2 .

even when the Schur complement matrices S (i) is singular. Indeed. For each subdomain. ui S (i) vi . vi for all ui . vi ≤ S (i) ui . When the subdomain stiffness matrix S (i) is singular. (S (i) )1/2 ui (S ) vi . such a Cauchy-Schwartz inequality follows from the Euclidean Cauchy-Schwartz in- equality since the fractional powers (S (i) )α are well defined for α ≥ 0 because S (i) is symmetric positive semidefinite: (i) S ui . 3. (S (i) )1/2 vi 1/2 (i) 1/2 = S (i) ui . vi = (S (i) )1/2 ui . For each subdomain.7) denote the ni × n matrix which restricts a nodal vector on B to its subvector corresponding to nodes on B (i) . ui S vi . let Ri (same as RB (i) in Chap. vi . vi ∈ IRni . 3. let Di denote a diagonal matrix of size ni with positive diagonal entries such that the following identity holds: . whose column space (i. 1/2 S (i) ui .7) an ni × di matrix: Kernel(S (i) ) ⊂ Range(Zi ). range) contains the null space of S (i) . we shall denote by Zi (identical to Ni in Chap. ∀ui . vi ∈ IRni .e. (S (i) )1/2 vi 1/2 (i) 1/2 1/2 ≤ (S (i) )1/2 ui ..

111. Lemma 3.   . 1. ∀v ∈ Range(N ). ∀u. u). GO4]. 3. By definition. v ∈ IRn S(P0 u. we express the matrix form of the balancing domain decomposition preconditioner for S.9 Theoretical Results 223  p I= RTi Di Ri . The inverse M −1 of the balanced domain decomposition preconditioner is: M −1 = P˜0 + (I − P˜0 S)T (I − S P˜0 ). Proof. i=1 † where S (i) denotes the Moore-Penrose pseudoinverse of matrix S (i) . i=1 which we refer to as a decomposition of unity. Follows from the hybrid Schwarz description of the balancing domain decomposition preconditioner. 3. Consequently.7. Define N (identical to matrix C in Chap. 3. v). P0 v).137) = P0 + (I − P0 )T S(I − P0 ). ∀u. Employing the above notation. u ∈ IRn S(P0 u. The following properties will hold. 2. We define P˜0 as the following n × n symmetric matrix: −1 T P˜0 = N N T SN N and P0 = P˜0 S. the following properties will hold: P0 P0 = P0 P0 (I − P0 ) = 0 S(P0 u. v) = S(u. P0 = P˜0 S corresponds to the S-orthogonal projection onto Range(N ). v) = S(u. We define T as the following n × n matrix:  p † T = RTi DiT S (i) Di Ri . P0 u) ≤ S(u.7) as the following n × d matrix. The preconditioned Schur complement matrix M −1 S will have the form: M −1 S = P˜0 S + (I − P˜0 S)T S(I − P˜0 S) (3. in Chap. where d ≡ (d1 + · · · + dp ): " # N ≡ RT1 D1T Z1 · · · RTp DpT Zp . where M −1 S will be symmetric in the S-inner product. see [ST13.

139) S (u. (3. ∀u ∈ IRn \{0}. By substituting (3. We will derive bounds for cond(M. It is then proved that γm = 1.140) S ((I − P0 )u. u = S (P0 u + (I − P0 )T S(I − P0 )u. u) = S(P0 u. γM } cond(M.138) λm where λm and λM denote the minimum and maximum values of the general- ized Rayleigh quotient associated with M −1 S in the S-inner product: S M −1 Su. γM }.224 3 Schur Complement and Iterative Substructuring Algorithms Since the preconditioned matrix M −1 S is symmetric in the S-inner prod- uct. u λm ≤ ≤ λM . u) (3. S (P0 u. (I − P0 )u) and substituting the bounds in (3. we obtain the following equivalent expression: S M −1 Su. P0 u) + S(T S(I − P0 )u. u . as the follow- ing result shows. P0 u) + S (T S(I − P0 )u.142) yields the estimates: S (P0 u.138) and (3.140) into (3. Employing the Pythagorean theorem: S(u. ∀u ∈ IRn \{0}. Lemma 3. u) = S (P0 u. (3. (3. P0 u) + S ((I − P0 )u. (3. (I − P0 )u) γm ≤ ≤ γM . . the following bound will hold: max{1. S) by estimating the extreme values of the generalized Rayleigh quotient of M −1 S in the S-inner product.137) into S M −1 Su. P0 u) + S((I − P0 )u. u) + S((I − P0 )T S(I − P0 )u. its condition number can be estimated as: λM cond(M. S) = .   Next an alternative expression is derived for S (T S(I − P0 )u. Following that. (I − P0 )u) for u = 0. as described in (3. (I − P0 )u) for parameters 0 < γm ≤ γM Then.141) min{1. (I − P0 )u) . S) ≤ . Readers are referred to [MA17] for additional details. u).139).112. γm } Proof. γm } ≤ ≤ max{1.142) = S (P0 u. λM may be simplified using the S-orthogonality of the decomposition u = P0 u + (I − P0 )u.114 proves a bound for γM . u) Estimation of the parameters λm . Lemma 3. (I − P0 )u) min{1. Suppose the following condition holds: S (T S(I − P0 )u.

Given u ∈ IRn define ui ∈ IRni as follows: † ui ≡ S (i) Dj Ri (I − P0 )u for 1 ≤ i ≤ p. 3.113. and define .9 Theoretical Results 225 Lemma 3.

ui . (I − P0 )u) . S(I − P0 )u) . (3. (I − P0 )u) ≤ S (T S(I − P0 )u. the lower bound γm = 1 will hold. substitute T = i=1 RTi DiT S (i) Di Ri and simplify as follows: (T S(I − P0 )u. |ui |2S (i) ≡ S (i) ui . S ((I − P0 )u. the following identity will hold:  p S (T S(I − P0 )u.144) Proof.. Then.143) i=1 Furthermore. (I − P0 )u) = |ui |2S (i) . for 1 ≤ i ≤ p. (3. i. (I − P0 )u) in the Euclidean p † inner product.143).e. express S (T S(I − P0 )u. ∀u ∈ IRn . To derive (3.

S(I − P0 )u p . p T T (i)† = i=1 Ri Di S Di Ri S(I − P0 )u.

† = i=1 S (i) Di Ri S(I − P0 )u. Di Ri S(I − P0 )u p .

insert I = i=1 RT i D i i in S ((I − P0 )u. for 1 ≤ i ≤ p. and applying the Cauchy-Schwartz inequality yields: . Ri (I − P0 )u). (I − P0 )) and expand to obtain: R S ((I − P0 )u. (I − P0 )u T = p (3. † † = i=1 S (i) S (i) Di Ri S(I − P0 )u. the vector S(I − P0 )u will be balanced. S (i) Di Ri S(I − P0 )u p = i=1 S (i) ui .145) and expressing the result in terms of ui . (I − P0 )u) p i=1 Ri Di Ri S(I − P0 )u. Therefore it will hold that Di Ri S(I − P0 )u ⊥ Kernel(S (i) ).145) = i=1 RTi Di Ri S(I − P0 )u. Substituting this into (3. To p derive a lower bound for S (T S(I − P0 )u. (I − P0 )u). (I − P0 )) = (S(I − P0 )u. (I − P0 )u p = i=1 (Di Ri S(I − P0 )u. so a property of the pseudoinverse † yields Di Ri S(I −P0 )u = S (i) S (i) Di Ri S(I −P0 )u. By definition of P0 . ui p = i=1 |ui |2S (i) .

(I − P0 )) p = i=1 (Di Ri S(I − P0 )u. Ri (I − P0 )u) p .226 3 Schur Complement and Iterative Substructuring Algorithms S ((I − P0 )u.

Ri (I − P0 )u p 1/2 p 1/2 ≤ (i) i=1 (S ui . (3. up ) : ui ∈ IRni . Ri (I − P0 )u) (i) p 1/2 p 1/2 i=1 |ui |S (i) ( i=1 RTi S (i) Ri (I − P0 )u. Ri (I − P0 )u p 1/2 (i) 1/2 ≤ i=1 S (i) ui .). S (i) ui ⊥ Range(Zi ) . 2. . 2 = Canceling common terms and squaring the resulting expression yields:  p S ((I − P0 )u. and simplify the resulting expression: .   Lemma 3. Ri (I − P0 )u p = i=1 S (i) ui . This yields the bound γm = 1.146) i=1 |ui |S (i) 2 (u1 . (3.. .143)..147) Proof.143) was employed.up )∈K\0 Then.. (I − P0 )) ≤ |ui |2S (i) = S (T S(I − P0 )u. . . S(I − P0 )u) employing (3. † = i=1 S (i) S (i) Di Ri S(I − P0 )u. (I − P0 )u) .. (I − P0 )u) 2 = p 1/2 1/2 i=1 |ui |S (i) (S(I − P0 )u. 1. Suppose the following assumptions hold. . the following estimate will hold: γM ≤ C. Then To estimate γM .. ui S Ri (I − P0 )u.114. γM corresponds to the maximum of the generalized Rayleigh quotient associated with T S on the subspace Range(I −P0 ) in the inner product S(. ui ⊥ Kernel(S (i) ). (I − P0 )u) i=1 where bound (3. Let K denote the following set: $ % K ≡ (u1 . expand substitute that S = j=1 RTj S (j) Rj . ui ) i=1 (S Ri (I − P0 )u. Let C > 0 be as defined below: p p i=1 |Ri j=1 RjT DiT uj |2S (i) C= sup p . p (T S(I − P0 )u.

9 Theoretical Results 227 p (T S(I − P0 )u. ui p . 3. S(I − P0 )u) = i=1 S (i) ui .

† = i=1 S (i) S (i) Di Ri S(I − P0 )u. ui p = i=1 S(I − P0 )u. RTi DiT ui p .

Rj ( i=1 RTi DiT ui ) . Rj RTi DiT ui p p = j=1 S (j) Rj (I − P0 )u. Rj ( i=1 RTi DiT ui ) p 1/2 ≤ j=1 S (j) Rj (I − P0 )u. RTi DiT ui p p = i=1 j=1 S (j) Rj (I − P0 )u. Rj (I − P0 )u (j) p p 1/2 S Rj ( i=1 RTi DiT ui ). p = i=1 ( j=1 RTj S (j) Rj )(I − P0 )u.

R j (I − P 0 )u) . 1/2 p ≤ j=1 (S (j) R j (I − P0 )u.

Rj ( i=1 RTi DiT ui )) T T . p p 1/2 p j=1 (S (j) Rj ( i=1 Ri Di ui ).

(I − P0 )u) j=1 |R j i=1 R T T i D u | 2 i i S (j) p . 1/2 1/2 p p = (S(I − P0 )u.

Given glob G ∈ (∂Ωi ∩ ∂Ωj ) define IG as the matrix of size ni × nj ji IG ≡ Rj IG RTi . let yj for 1 ≤ j ≤ ni denote the nodes on B (i) in the local ji ordering. S(I − P0 )u) ≤ | RTi DiT ui |2S (j) . S) ≤ C.144) and (3.  1/2 1/2 p p ≤ i=1 |ui |S (i) 2 j=1 |Rj i=1 Ri Di ui |S (j) T T 2 . . j=1 i=1 Applying (3. where C is defined in (3. . .141) with bounds (3. The following notation will be employed. which yields an upper bound with γM ≤ C. yields:  p  p (T S(I − P0 )u. we shall estimate C for a finite element discretization. Let x1 . it will hold that:  IG = I. if xi ∈ G (IG )ii = 0. we obtain the condition number estimate cond(M.146). See [MA17]. Next. By construction. (3.148) {G⊂G} (i) On each ∂Ωi .144) was applied to obtain the last line. where (3. let IG denote the following n × n diagonal matrix: 1.   By combining bound (3. . (I − P0 )u). S(I − P0 )u) ≤ C S((I − P0 )u. Canceling the common terms and squaring the resulting expression.146) yields (T S(I − P0 )u. if xi ∈ G. .147). For each glob G. xn denote the nodes on B.

228 3 Schur Complement and Iterative Substructuring Algorithms Expressing Rj RTi = Rj IRTi and substituting for I using (3.115.up )=0 i=1 |u i | S (i) yielding cond(M. ρti + ρtj Additionally:  Rj RTi Di ui = {G⊂∂Ωi ∩∂Ωj } Rj IG RTi Di ui  ji = {G⊂∂Ωi ∩∂Ωj } IG Di ui  ji ii (3. Let ui ∈ Vh (∂Ωi ) denote a finite element function with associated nodal vector ui ∈ IRni on B (i) .. Suppose the following assumptions hold. For K and L defined by (3.150) ρj ρj for ui ⊥ Kernel(S (i) ) and S (i) ui ⊥ Range(Zi )..122) and (3. (u1 .. {G⊂(∂Ωi ∩∂Ωj )} Diagonal matrix Di of size ni has the following representation:  Di ≡ ii di (G)IG .149) = {G⊂∂Ωi ∩∂Ωj } IG di (G)IG ui  ji = {G⊂∂Ωi ∩∂Ωj } di (G)IG ui . for t ≥ .. Let R > 0 be the bound in the following discrete harmonic extension: 1 ji 2 1 |IG ui |S (j) ≤ R |ui |2S (i) . then the following will hold: ρti di (G) ≤ . 2. {G⊂∂Ωi } where the scalars di (G) are defined by: ρti 1 di (G) ≡  t. (3. {l:G⊂∂Ωl } ρl 2 When G ⊂ (∂Ωi ∩ ∂Ωj ).148) yields:  ji Rj RTi = IG .121). Lemma 3. the following estimate will hold: p p i=1 |Ri RjT DiT uj |2S (i) sup p j=1 2 ≤ K 2 L2 R. 1. . S) ≤ K 2 L2 R.

150) yields:  ji |Rj RTi Di ui |S (j) ≤ {G⊂∂Ωi ∩∂Ωj } di (G)|IG ui |S (j)  ρti ji ≤ {G⊂∂Ωi ∩∂Ωj } ρti +ρtj |IG ui |S (j) t− 1 1  ρi 2 ρj2 ≤ {G⊂∂Ωi ∩∂Ωj } ρti +ρtj R1/2 |ui |S (i) ρ1/2 ≤ L R1/2 supρ>0 1+ρt |ui |S (i) ≤ L R1/2 |ui |S (i) . The parameters K and L are generally independent of h. h0 and {ρj }. We follow the proof in [MA14. {G:G⊂∂Ωi ∩∂Ωj } Applying the triangle inequality and employing assumption (3. If the assumptions in Lemma 3.   The condition number of the balancing domain decomposition system now follows immediately from the preceding result. MA17]. (3.115 hold. 3. S) ≤ C K 2 L2 (1 + log(h0 /h)) . then the balancing domain decomposition preconditioned system will satisfy: 2 cond(M. S) ≤ K 2 L2 R. when t ≥ 12 .116.149) it holds that:  ji Rj RTi Di ui = di (G)IG ui . T 2 Summing over the indices j yields:  p  p  p | Rj RTi Di ui |2S (j) ≤ K 2 max |Rj RTi Di ui |2S (j) . By Lemma 3. Substituting the above in (3. h and coefficients {ρj }. for some C > 0 independent of h0 .122) so that at most K terms of the form Rj RTi Di ui will be nonzero: p p 2 | i=1 Rj RTi Di ui |2S (j) ≤ i=1 |Rj Ri Di ui |S (j) T p ≤K i=1 |Rj Ri Di ui |S (j) . where the parameters K. depending only on .9 Theoretical Results 229 Proof.115. p Apply the generalized trian- gle inequality to estimate the term |Rj i=1 RTi Di ui |2S (j) and use assump- tion (3. L and R are as defined earlier.151) j j=1 i=1 i=1 By property (3. Theorem 3. Proof. the condition number of the balancing domain decomposition preconditioned system will satisfy the bound: cond(M.151) yields the desired result.

. 2 where we have employed the Poincar´e-Freidrich’s inequality in the last line. 2 due to the constraint ui ⊥ Kernel(S (i) ). 1) ∈ Kernel(S (i) ) and so a Poincar´e-Freidrich’s inequality of the following form will hold for ui 1 |ui |21/2.∂Ωi ≤ C |ui |2S (i) . KL8].∂Ωj and |IG ui |21/2.230 3 Schur Complement and Iterative Substructuring Algorithms the spatial dimension and the shape regularity properties of the subdomain decomposition.∂Ωj .  For additional details. . In either case. and subsequently applying the glob theorem. . Thus. A similar Poincar´e-Freidrich’s inequality will hold for ui if S (i) is not singular (since c(x) = 0 and due to zero ji Dirichlet values on a segment of ∂Ωi ).∂Ωi 2 ≤ C (1 + log(h0 /h)) |ui |21/2. ρi for some C > 0 independent of h0 .∂Ωi 2 1 ≤ C (1 + log(h0 /h)) ρi |ui |S (i) . h and {ρj }. IG vi will correspond to the finite element function IG ui restricted to ∂Ωj . Applying the equivalence between the scaled Schur complement ji energy ρ1j |IG ui |2S (j) and the fractional Sobolev boundary energy |IG ui |21/2. . h and {ρi }. MA17. . the equivalence between |IG ui |21/2. we only need to estimate parameter R.∂Ωj 1 2 ≤ C|IG ui |21/2. let ui denote a finite element function on ∂Ωi with associ- ated vector of nodal values ui ∈ IRni satisfying ui ⊥ Kernel(S (i) ). the reader is referred to [MA14.∂Ωi (due to support of IG ui on ∂Ωi ∩ ∂Ωj ). Accordingly. then (1. we arrive at the following estimates: ji ρj |IG ui |S (j) = |IG ui |21/2. with ∂Ωi ∩ ∂Ωj as the support. If matrix T S (i) is singular. Thus R ≤ C (1 + log(h0 /h)) for some C > 0 independent of h0 .

4. subject to the constraint that the local displacements be continuous across the subdomains. It is a Lagrange multiplier based iterative substructuring method for solving a finite element discretization of a self adjoint and coercive elliptic equation. FA16. 4. Chap.3 describes a projected gradient algorithm for determining the Lagrange multiplier flux variables in the FETI method. each subdomain solution is parameterized by a Lagrange multiplier flux variable which represents the Neumann data of each subdomain solution on the subdomain boundary. FA14. MA25. 4. In traditional substructuring. we describe the FETI method (the Finite Element Tearing and Interconnecting method) [FA2. Chap. . 4. FA16. each subdomain solution is parameterized by its Dirichlet value on the bound- ary of the subdomain. The Lagrange multiplier variables correspond to flux or Neumann data on the subdomain boundaries. resulting in a highly parallel algorithm with Neumann subproblems. Chap. in Lagrange multiplier substructur- ing. KL8]. Applications include elasticity. Our discussion is organized as follows. Both methods are based on the PCG method with a special coarse space and with local problems that impose constraints on the globs. Given a non-overlapping decomposition. shell and plate problems [FA2. the FETI method employs an extended energy functional asso- ciated with the self adjoint and coercive elliptic.4 describes the FETI-DP and BDDC variants of the FETI algorithm. It is obtained by weakening the continuity of the displacements across the subdomain boundaries. The FETI method then minimizes this extended energy.2 describes the Lagrange multiplier formulation associated with this constrained minimization problem.4 Lagrange Multiplier Based Substructuring: FETI Method In this chapter. By contrast. The global solution is sought by solving a reduced Schur complement system for determining the unknown Dirichlet boundary values of each subdomain solution.1 describes the constrained minimization problem underlying the FETI method. by solving a saddle point problem. Several preconditioners are outlined. based on a non-overlapping decomposition of its domain. FA15]. Chap. They yield identical convergence rates and provide advantages in parallelizability. The global solution is then sought by determin- ing the unknown Lagrange multiplier flux variable. FA15.

subject to the constraint that the local solutions match across the sub- domain boundaries. ⎪ ∀v ∈ H01 (Ω). . local solutions are sought on each subdomain which minimize an extended global energy.3) 2 l=1 By construction. Ωp of Ω. We also define the following subdomain forms and spaces: ⎧  ⎨ AΩl (ul . we define its internal boundary segments B (l) = ∂Ωl ∩ Ω and external boundary segments B[l] = ∂Ωl ∩ ∂Ω. and common interfaces Blj = ∂Ωl ∩ ∂Ωj .1) seeks u ∈ H01 (Ω) satisfying: ⎧ ⎨ A(u. vl ) − FΩl (vl ) . v∈H01 (Ω) where J(v) ≡ 21 A(v. where A(u. v) = F (v). v) ≡ Ω (a ∇u · ∇v + c u v) dx (4.1. we define an extended energy functional JE (·) as: p    1 JE (vE ) = AΩl (vl . .1 Constrained Minimization Problem: Continuous Case Consider the following self adjoint and coercive elliptic equation: −∇ · (a(x)∇u) + c(x)u = f (x). . 4. then it can be verified that J(v) = JE (vE ). The weak formulation of (4. Given a non-overlapping decomposition Ω1 . in Ω (4. (4. . In this section. . ∀vl ∈ HB 1 (Ωl ) where ⎪ ⎩ H 1 (Ω ) ≡ {v ∈ H 1 (Ω ) : v = 0 on B }. vp ) where each local function vl (·) ∈ HB 1 [l] (Ωl ). . if v ∈ H01 (Ω) and vl (·) ≡ v(·) on Ωl for 1 ≤ l ≤ p. on ∂Ω.1) seeks u ∈ H01 (Ω): J(u) = min J(v). [l] B[l] l l [l] Given a collection of subdomain functions vE = (v1 . . vl = vj need not match across common interfaces Blj = ∂Ωl ∩ ∂Ωj . vl ) ≡ Ωl (a ∇ul · ∇vl + c ul vl ) dx.1) u(x) = 0. we describe the constrained minimization formulation of a self adjoint and coercive elliptic equation. Given a non-overlapping decomposition of a domain. ∀ul . v) − F (v). based on a non- overlapping decomposition.2) ⎪ ⎩  F (v) ≡ Ω f v dx.1 Constrained Minimization Formulation The FETI method is based on a constrained minimization formulation of an elliptic equation. Generally. .232 4 Lagrange Multiplier Based Substructuring: FETI Method 4. . The minimization formulation of (4. We also describe the finite element discretization of the elliptic equation and its constrained minimization formulation. vl ∈ HB[l] (Ωl ) 1 ⎪ FΩl (vl ) ≡ Ωl f vl dx. yet JE (vE ) is well defined.

we expect that minimizing JE (vE ) in V0 will yield: JE (uE ) = min JE (vE ) (4. . define I∗ (l) ⊂ I(l) as the subindex set of interface segments of dimension (d − 1) when Ω ⊂ IRd . (4. . . we define the following constraint set of local functions: V0 ≡ {vE : vl = vj on Blj if j ∈ I(l).1. φj ).5) where A is symmetric positive definite with entries n Aij = A(φi . 1 ≤ l ≤ p}. Heuris- tically. If {φ1 . . Note that j ∈ I ∗ (l) if and only if l ∈ I ∗ (j). such that if Blj = ∅. . φn } denotes a nodal basis for Vh ∩ H01 (Ω). . 4. as in Fig.1 Constrained Minimization Formulation 233 Define I ∗ (l) ≡ {j : Blj = ∅}. while nodes on B (i) Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω9 Ω10 Ω11 Ω12 Ω13 Ω14 Ω15 Ω16 Fig. .2 Constrained Minimization Problem: Discrete Case Let Th (Ω) denote a quasiuniform triangulation of Ω with n nodes in Ω. 4. vh ) = F (vh ). but not both. and u denotes the displacement vector with uh (x) = i=1 (u)i φi (x) and f denotes the load vector. Let Vh denote a space of finite element functions on the triangulation Th (Ω) of Ω. 4.1. . . 4. up ) will satisfy ul = u on Ωl for 1 ≤ l ≤ p. A non-overlapping decomposition . we shall block partition the nodal unknowns on each subdomain as follows. .1. A finite element discretization of (4. for the desired solution u(.4) vE ∈V0 where uE = (u1 .1) will seek uh ∈ Vh ∩ H01 (Ω) such that: A(uh . for a chosen ordering of the nodes.). then either j ∈ I(l) or l ∈ I(j). . The FETI method employs a Lagrange multiplier formulation of a discrete version of this problem. Ωp of Ω. Additionally. Nodes in Ωi will be regarded as “interior” nodes in Ωi . the resulting discretization will yield the linear system: A u = f. with (f )i = F (φi ). ∀vh ∈ Vh ∩ H01 (Ω). Then V0 will consist of local functions which match across subdomains. choose I(l) ⊂ I ∗ (l) as a subindex set. then. and iteratively solves the resulting saddle point system using a preconditioned projected gradient method. . Heuristically. Given a nonoverlapping decomposition Ω1 . For 1 ≤ l ≤ p.

respectively.10): p A = i=1 RiT A(i) Ri p (4. The FETI algorithm solves (4. . The local displacements. The common interface will be denoted as B = ∪pi=1 B (i) . we obtain 0 = ∇J (u) = Au − f .5) by a constrained minimization reformulation of (4. We shall denote by (i) (i) (i) (i) uI ∈ IRnI and uB ∈ IRnB vectors of finite element nodal values on Ωi and B (i) . (4. Its transpose RiT will extend by zero a nodal vector on Ωi ∪ B (i) to the rest of Ω. In terms of nodal vectors. For discretizations of more general elliptic equations. Suppose A = AT > 0 and let u solve the linear system (4.  The constrained minimization problem employed in the FETI method is obtained by weakening the requirement that the subdomain finite element functions be continuous across the interface B.. The next result describes a minimization problem equivalent to (4. . i.e. . f i = I(i) . relating the local and global stiffness matrices and load vectors.7) f = i=1 RiT f i .234 4 Lagrange Multiplier Based Substructuring: FETI Method as “subdomain boundary” nodes. we shall let Z (i) denote an ni × di matrix whose columns form a basis for the kernel of A(i) : Range(Z (i) ) = Kernel(A(i) ).8) When A(i) is nonsingular. local stiffness matrices and load vectors will be denoted as:  (i)   (i) (i)   (i)  uI A II AIB f ui = (i) . .. When matrix A(i) is singular. A(i) = (i)T (i) . . At the critical point of J (·). and the number of nodes in Ωi and B (i) will be denoted (i) (i) (i) (i) as nI and nB . (4.5).9) v∈IR 2 Proof. we define Z (i) = 0 and set di = 0. Since A = AT > 0. (4. Ω i ⊂ Ω.) and F (. and by subsequently enforcing continuity across B as a constraint.1. for 1 ≤ i ≤ p. each local . such as the equations of linear elasticity.) based on the subdomains. Lemma 4. for v ∈ IRn . where J(v) ≡ v Av − vT f . Kernel A(i) may have dimension di up to six (for Ω ⊂ IR3 ).6) uB AIB ABB fB given a local ordering of the nodes. the critical point u will correspond to a minimum. We shall denote by Ri the restriction map which maps a nodal vector u ∈ IRn of nodal values on Ω onto its subvector ui = Ri u of size ni of nodal values on Ωi ∪ B (i) . When coefficient c(x) = 0 in (4. will yield the subassembly identity (3.5). Then u will minimize the associated energy functional: 1 T J(u) = minn J(v). then T the local stiffness matrix A(i) will be singular with 1 ≡ (1.1) and Ωi is floating. respectively. for the chosen local ordering of the nodes. with ni ≡ (nI + nB ). 1) spanning Kernel A(i) . Decomposing A(.5).

f B )T denote local loads. . .. . . where (i)T (i)T f i = (f I .1 Constrained Minimization Formulation 235 (i)T (i)T displacement vector vi = (vI . .. In the following. To determine an extended displacement vE whose components match on the interface B. By construction. a block matrix Z of size nE × d: ⎡ (1) ⎤ Z 0 ⎢ . . The FETI method also employs extended loads f E ≡ (f T1 . wTp ∈ IRnE . ⎥ Z≡⎣ . . . displacement and load vectors will have the following block structure: ⎡ (1) ⎤ ⎡ ⎤ ⎡ ⎤ A 0 v1 f1 ⎢ . ⎦ . . the extended stiffness matrices. Define wi = Ri v ∈ IRni and wE = wT1 . . 4. . ⎥ ⎢ . we introduce the extended energy functional JE (wE ) that corresponds to the sum of the local displacement energies. d1 } + · · · + min{1.. vTp )T of size nE = (n1 +· · ·+np ). where d = min{1.10) 0 A(p) vp fp Given matrices Z (i) of size ni × min{1. ⎦ .11) 0 Z (p) will also be employed. ⎥ ⎢ . . A(p) of size nE . f E ≡ ⎣ . based on the local stiffness matrices A(i) . and an extended block diagonal stiffness matrix AEE ≡ blockdiag A(1) . . . . ⎥ AEE ≡ ⎣ . Lemma 4. constraints are imposed on vE and an extended energy functional is minimized subject to these constraints. di } whose columns span the null space of A(i) with Range(Z (i) ) = Kernel(A(i) ).. (4. . . need not match with adjacent displacements on B (i) ∩B (j) . ⎦ . ⎦ (4.2. f Tp )T . vB )T in an extended global displacement vE ≡ (vT1 . Suppose the following assumptions hold for v ∈ IRn : T 1. vE ≡ ⎣ . dp }.

i=1 3. f B ∈ IRni define:  p T f= RiT f i ∈ IRn and f E = f T1 . for J(v) defined by (4. i=1 .12) 2 E Then. . . (4. f Tp ∈ IRnE . Let JE (wE ) denote the following extended energy functional: 1 T JE (wE ) ≡ w AEE wE − wTE f E . Given local load vectors f i = f I . . .7) for the stiffness matrix yields:  p T v Av = vT RiT A(i) Ri v = wTE AEE wE . The subassembly identity (4. T T (i) (i)T 2.9). Proof. it will hold that J(v) = JE (wE ).

Lemma 4. for i = 1.9). . The matrix M will be chosen so that the equation M wE = 0 enforces each admis- (i) (j) sible pair of local displacement vectors wB and wB to match on the nodes in B (i) ∩ B (j) . . By construction.9) can be expressed as a constrained minimiza- tion of the extended energy functional JE (wE ) within the constraint set V0 .13). . . By definition of subspace V0 .3. 3. . . A constrained minimization formulation of (4. the parametric representation wi = Ri v in terms (i) (j) of v ∈ IRn ensures that the nodal values of wB match with those of wB for nodes on B (i) ∩ B (j) . . . provided a matrix M can be constructed. p. We let nB denote the number of nodes on B = ∪pi=1 B (i) . . corresponding to nodal values of v at the specific nodes. T 4. .4. Let V0 and matrix M of size m × nE be as in (4.14) vE ∈V0 Then. Let wE = wT1 . An application of the preceding lemma will yield the desired result. (R1 v)T . . .13). the following results will hold: wi = Ri u. .13) Here M will be a matrix of size m × nE . . (Rp v)T (4. . Proof. . .5) can now be obtained. wTp : $ % T V0 ≡ : v ∈ IRn = {wE ∈ IRnE : M wE = 0}. the minimization problem (4. Suppose the following assumptions hold. i=1 i=1 i=1 Substituting these into J(v) and JE (wE ) yields the desired result. wTp denote the constrained minimum: JE (wE ) = min JE (vE ). . Let u denote the minimum of (4. . p T 1. f Tp . . When matrix M can be constructed. 2.236 4 Lagrange Multiplier Based Substructuring: FETI Method since wi = Ri v. (4. . We shall now describe how to construct a matrix M so that the representation V0 = Kernel(M ) holds in (4.  Construction of Matrix M .   p Remark 4. The subassembly identity for load vectors yields: ! p  p T p T T T v f =v Ri f i = (Ri v) f i = wTi f i = wTE f . the above equivalence between J(v) and JE (wE ) will hold only when the constraints wi = Ri v for 1 ≤ i ≤ p are satisfied. When f = i=1 RiT f i . Let f = i=1 RiT f i and f E = f T1 . such that wi = Ri v for 1 ≤ i ≤ p T if and only if M wE = 0 for wE = wT1 . the following parametric representation wi = Ri v will hold for 1 ≤ i ≤ p and for some v ∈ IRn .

we may require matching of nodal values of vl and vj at node xi for each pair of indices l.1 Constrained Minimization Formulation 237 Definition 4. xnB on interface B.16) so that M vE = M (1) v1 + · · · + M (p) vp where M (i) is of size m × ni . We describe two alternate choices of matrix M (not necessarily full rank). j ∈ W (xi ). Each row of matrix M must be chosen to enforce a constraint which matches two nodal values. . . 4. and the degree of a node xi denotes the number of distinct subdomain boundaries to which it belongs. it will be sufficient to select a subset of linearly dependent constraints so that all such matching conditions can be derived from the selected few constraints. having the block structure: " # M = M (1) · · · M (p) .15) ⎪ ⎪ ⎩ index xl . . j ∈ W (xi ). In principle. Given nodes x1 . However. this will typically yield redundant equations when degree (xi ) ≥ 3. Since . Each node xi ∈ B will belong to degree (xi ) distinct subdomain boundaries. In practice. This can be done by requiring that the difference between the nodal value of vl and vj be zero at node xi . B (j) ≡ local index of xl in B (j) . Here W (xi ) denotes the indices of all subdomains whose boundaries contain xi . There is much arbitrariness in the choice of matrix M . . for each pair of indices l.5. (4. we define: ⎧ ⎪ ⎪ W (xi ) ≡ {j : xi ∈ ∂Ωj } ⎨ degree (xi ) ≡ |W (xi )| (4.

if l. (4. T (i)T (i)T each vi = vI . each M (i) may further be partitioned as: (i) (i) (i) M (i) = [MI MB ] = [0 MB ]. Corresponding to each node xi ∈ B. vB corresponds to interior and boundary nodal values. Specifically. The matrices we shall construct will have their entries Mij chosen from {−1. +1}. selected based on the following observations. Then the continuity of vl and vj at node xi can be enforced as follows: . j ∈ W(xi ) let ˜li = index(xi . j ∈ W(xi ) we will require that the difference of the entries of vl and vj be zero at xi . There is arbitrariness in the choice of entries of M . there will be 12 degree(xi ) (degree(xi ) − 1) distinct pairs of subdomains which contain node xi . B (j) ). B (l) ) and ˜ji = index(xi .17) (i) The submatrix MI will be zero since the matching of boundary values does not involve interior nodal values. 0. For each l.

.

j ∈ W(xi ). and in the following. we shall require l < j. j are selected from W(xi ). if l. (l) (j) vB − vB = 0. By convention. describe two different choices of matrices M depending on how many index pairs l. . 0.18) ˜ li ˜ ji This will yield entries of M to be from {−1. (4. +1}.

for each such node xi and consecutive indices l < j from W(xi ) define the entries of M as: ⎧ (l) ⎪ ⎪ MB k. l. choice 2 is preferable for parallel implementation [FA14]. For each node xi .r i All other entries in the k’th row of M are defined to be zero. j) denote the numbering (between 1 and m) assigned to the constraint involving node xi and subvectors vl and vj .238 4 Lagrange Multiplier Based Substructuring: FETI Method Choice 1. For a two subdomain decomposition.19).r = 0. Choice 2. If l < j are consecu- tive indices in W(xi ). impose one constraint corresponding to each distinct pair l < j of indices in W(xi ).6. noting that l. i=1 By construction. . Consequently. In this case. An alternative choice of matrix M may be obtained as follows. if r = ˜j . . j ∈ W(xi ) need not be consecutive indices. .r ⎩ M (l) = 0. however. as may be verified by the reader. Remark 4. all such constraints will be linearly independent. matrix (1) (2) M will have the following block structure with MB = I and MB = −I: " # M = 0 I 0 −I . The entries of matrix M can be defined as in (4. choices 1 and 2 will coincide. matrix M will not be of full rank if degree(xi ) ≥ 3 for at least one node xi . However. . For each consecutive pair of such indices. and the total number m of constraints:  nB m= (degree(xi ) − 1) . i=1 2 In this case. For each node xi . if r = ˜ji ⎪ ⎪ B k. Consequently. In both cases. The actual entries of matrix M will depend on the ordering of the constraints used. all nodes on B will have degree two. . impose one constraint.19) ⎪ ⎪ M = −1. due to it being of full rank. Then. Choice 1 for M is easier to analyze than choice 2. the constraints will not be redundant. the constraint set V0 will satisfy: $ T % V0 = (R1 v)T . if r = ˜li (j) (4. and matrix M will be of full rank (with rank equal to m). arrange all the indices in W(xi ) in increasing order. In this case. (Rp v)T : v ∈ IRn = Kernel (M ) . Since there are degree(xi ) such indices. let k(i. there will be 12 degree(xi ) (degree(xi ) − 1) such constraints. provided all nodes on B are ordered identically in both subdomains. several of the constraints will be redundant if degree(xi ) ≥ 3. if r = li ˜ ⎪ ⎪ ⎪ ⎨ M (l) B k. ⎪ B k. so that:  nB 1 m= degree(xi ) (degree(xi ) − 1) . yield- ing a total of degree(xi ) − 1 constraints corresponding to node xi .r = 1.

(4. ∀vE ∈ V0 t=0 ⇔ ∇JE (uE ) ⊥ V0 ⇔ ∇JE (uE ) ∈ Kernel(M )⊥ ⇔ ∇JE (uE ) ∈ Range(M T ). we describe the saddle point system associated with the constrained minimization problem (4.22) M 0 λ 0 Proof. We may represent any vector in Range(M T ) in the form −M T λ for λ ∈ IRm . At the saddle point of the Lagrangian function. referred to as Lagrange multipliers. Suppose the following assumptions hold. By construction. for each choice of nonzero vector vE ∈ V0 consider the line x(t) = uE + t vE ∈ V0 for t ∈ IR.14). the FETI method reformulates (4. Applying the derivative test yields: + dJE (x(t)) + dt + = 0. Further. Let uE = uT1 . In the following result. Let M be a matrix of size m × nE of full rank m. dt +t=0 Since uE corresponds to the minimum of JE (·) in V0 . and since x(t) ⊂ V0 with x(0) = uE the function JE (x(t)) will attain a minimum along the line at t = 0. there will exist a vector λ ∈ IRm such that:      AEE M T uE fE = . whose saddle point (a critical point which is neither a local maximum nor a local minimum) yields the constrained minimum from its components.2 Lagrange Multiplier Formulation 239 4.21) 2. one for each constraint. Lemma 4.14). uTp ∈ IRnE denote the solution of: JE (uE ) = min JE (wE ) (4.14) as a saddle point problem (saddle point or Lagrange multiplier methodology is described in [CI4. GI3] and Chap. To verify the first block row of (4. It introduces new variables. . Then.20) wE ∈V0 where V0 ≡ {wE ∈ IRnE : M wE = 0}. referred to as the Lagrangian function.7. it associates a function. and the resulting system of equations can be solved to determine the constrained minimum. 10). (4.2 Lagrange Multiplier Formulation To determine the solution to the constrained minimization problem (4. · · · . T 1. its gradient with respect to the original and Lagrange multiplier variables will be zero. ∀vE ∈ V0 ⇔ ∇JE (uE ) · vE = 0.22). 4. it passes through uE when t = 0 with: + dx(t) ++ = vE .

22). We shall associate the following dual function with the Lagrangian function. Each λi in λ = (λ1 .9. .240 4 Lagrange Multiplier Based Substructuring: FETI Method Choosing −M T λ (the negative sign here is for convenience). we may define the class G of admissible Lagrange multipliers as: G ≡ {µ : Z T (f E − M T µ) = 0}.12. λm ) is referred to as a Lagrange multi- plier. it will represent an inter-subdomain flux.10. Definition 4.22). which yields the first block row of (4.22). . To verify the second block row of (4. This latter requirement can be equivalently stated as Kernel(M ) ∩ Kernel(AEE ) = {0}. Recall that Z of rank d satisfies: Range(Z) = Kernel(AEE ). Since matrix AEE may be singular. the above infimum could be −∞ if (f E − M T µ) ∈ Range(AEE ). we associate a Lagrangian function L(vE . µ). Using Z. and to require that matrix ATEE = AEE ≥ 0 be coercive on the null space V0 of M . µ . Remark 4. λ will not be uniquely determined. . then D(µ) > −∞. vE Remark 4. By definition. if µ ∈ G. To ensure solvability of (4. 11. for some λ ∈ IRm . Definition 4.11. There will be m Lagrange multipliers. Given µ ∈ IRm of Lagrange multipliers. µ) ≡ JE (vE ) + µT M vE (4. it is sufficient to require that M is an m×nE matrix of full rank m. note that since uE ∈ V0 . µ). the derivative test for the critical point of L(·. When M is not of full rank. we obtain: AEE uE − f E = ∇JE (uE ) = −M T λ. By construction. For µ ∈ IRm define the dual function D(µ): D(µ) ≡ inf L(vE . .2). µ) with the constrained minimization problem (4. ·) yields (4.8. one corresponding to each row of M which enforces one of the m constraints.23) = 12 vTE AEE vE − vTE f E + µT M vE .   T Remark 4. For vE ∈ IRnE define a function E(vE ): E(vE ) ≡ sup L(vE .22).20): L(vE . Since each λi is a dual variable to the Dirichlet data (see Chap. we obtain M uE = 0.

It is easily verified that since L(·. We say that (uE . if M vE = 0 E(vE ) = JE (vE ) . following [FA14] we describe an iterative algorithm for obtain- ing the solution uE and λ to saddle point system (4. Remark 4. Due to the block structure of matrices M and Z. 10 need to be modified. Definition 4. The term “saddle point” is motivated by the following property. λ) = D(λ) ≤ L(vE .15. FA14].14. λ) as vE is varied. λ) yields system (4. Let uET . λ) corresponds to a minimum of L(vE .22):  AEE uE + M T λ = f E (4. λ).25) M uE = 0. the saddle point (uE .16. µ) ≤ E(uE ) = L(uE . Suppose the following assumptions hold. ∀vE . and subsequently uE . So we define a class of admissible displacements vE as V0 : V0 ≡ {vE : M vE = 0} . By definition. that Z has rank d. traditional saddle point iterative algorithms from Chap.3 Projected Gradient Algorithm In this section.13.22).3 Projected Gradient Algorithm 241 Remark 4. ·) is linear in µ: +∞. 4. µ. . . the first order derivative test (differentiation with respect to vE and µ) for a critical point of L(vE . µ) as µ is varied. if M vE = 0.) if the following conditions are satisfied: L(uE . (4.24) When local stiffness matrix A(i) is nonsingular. T 1. and we discuss these modifications [FA15. Thus. The next result describes a system for determining λ. 4. In the next section. As mentioned before. Since matrix AEE may be singular. Z (i) = 0 and M (i) Z (i) = 0. λ) is a saddle point of the Lagrangian functional L(. We assume that if AEE is singular. we describe an algorithm for determining uE and λ. and to a maximum of L(uE .22). We define G ≡ M Z as a matrix of size m × d. we obtain: " # G = M Z = M (1) Z (1) · · · M (p) Z (p) . µ) at (uE . if vE ∈ V0 then we will have E(vE ) = JE (vE ) < ∞.. Lemma 4. λT denote the solution to the saddle point system (4.

the following results will hold for G = M Z defined by (4.24): 1.26) GT λ = g. where . and A†EE and (GT G)† denote Moore-Penrose pseudoinverses. where P0 ≡ I − G(GT G)† GT . 2. Given λ. The Lagrange multiplier λ will solve the following reduced system: P0 K λ = P0 e (4. e ≡ M A†EE f E . g ≡ Z T f E . the displacement uE can be determined as follows: ⎧ ⎨ uE = A†EE f E − M T λ + Zα.242 4 Lagrange Multiplier Based Substructuring: FETI Method Then. K ≡ M A†EE M T .

When the compatability condition GT λ = g is satisfied.25) yields: ⎧ ⎪ ⎪ AEE uE = f E − M T λ ⇐⇒ f E − M T λ ∈ Range(AEE ) ⎪ ⎨ ⇐⇒ f E − M T λ ⊥ Kernel(AEE ) ⎪ ⎪ ⇐⇒ Z T f E − M T λ = 0 ⎪ ⎩ ⇐⇒ GT λ = g. Since AEE is singular.28) GT λ = g. it follows that Rank(G) = d and P0 is an orthogo- nal projection onto a space of dimension m − d. This effectively constitutes . and A†EE is the Moore- Penrose pseudoinverse of AEE . where g ≡ Z T f E . the general solution to the singular system AEE uE = f E − M T λ will be: uE = A†EE f E − M T λ + Zα. Proof. which corresponds to the Euclidean orthogonal projection onto Range(G)⊥ : P0 Kλ = P0 e GT λ = g.27) ⎩ α = (GT G)† GT Kλ − M A†EE f E . GO4]. Applying the constraint M uE = 0 to the above expression for uE yields: M A†EE f E − M T λ + M Zα = 0.28) can be eliminated by ap- plying P0 = I − G(GT G)† GT . The term Gα in the first block equation in (4. Combining the compatability condition with the preceding yields the system: K λ − Gα = e (4. This corresponds to K λ − Gα = e. the first block row in (4. Here α ∈ IRd is arbitrary. for K ≡ M A†EE M T and e ≡ M A†EE f E . Since d = Rank(Z). since matrix Z has rank d. which constitutes m + d equations for the m + d unknown entries of λ and α. see [ST13. (4.

When matrix AEE is nonsingular. it will be solvable by a conjugate gradient method in that subspace. the unknown coefficient vector α can be determined using (4. However.3. the FETI method seeks the Lagrange multiplier variables λ ∈ IRm by solving: P0 K λ = P0 e (4. and g = 0. 4.   Remark 4.30) are identical to the first and second block equations in (4. In this case. e = M AEE f E .29) with weights based on matrix C.30): C T P0 K λ∗ = C T P0 e (4.31) GT λ∗ = g. The first and third block equations in (4. . the coefficient matrix in the first block equation in (4. as outlined below. P0 = I.17. the reduced system (4.29) GT λ = g.30) ⎪ ⎩ GT λ = g. Remark 4. while the second block equation in (4.19 it is shown that this system is symmetric and positive definite within a certain subspace G∗ of IRm . Typically. Furthermore.29). both will be included for generality [FA14].29) as follows: ⎧ ⎪ ⎨ P0 K λ = P0 e C P0 K λ = C T P0 e T (4. Suppose λ∗ ∈ IRm can be found satisfying the 2nd and 3rd block equations in (4.1 Projected Gradient Algorithm to Solve (4. In Lemma 4.27) once λ is determined.3 Projected Gradient Algorithm 243 m = (m − d) + d equations for the unknown λ ∈ IRm . If Z (and hence G = M Z) has rank d. G = 0. 4. third block equation will have rank d. while the second block equation will be redundant consisting of q linear combinations of rows of the first block equation.30) is redundant. We now motivate a projected gradient algorithm to solve (4. Thus. K = M A−1 T −1 EE M .26) will correspond to the stationarity condition for a maximum of the dual function D(µ) associated with the Lagrange multiplier variables. then the orthogonal projection matrix P0 will have rank (m − d).18.25) can be obtained using (4. corresponding to linear combinations of the first block in (4.26) Since the solution to (4. however.30) will have rank (m − d). to include global transfer of information within the algorithm.30). and consequently.29). matrix Z will have zero rank and vector α can be omitted. Once λ is deter- mined by solving the above problem.28) as α ≡ (GT G)† GT Kλ − GT e . either matrix G = 0 or C = 0. Let C denote an m × q matrix having rank q where q < m. the FETI method solves a modified linear system equivalent to (4. Employing matrix C we modify system (4.

30) as λ = λ∗ + λ ˜ solves: ⎧ ⎪ ˜ = P0 (e − K λ∗ ) P0 K λ ⎨ C P0 K λ T ˜=0 (4. ˜ within the subspace G0 ⊂ IRm defined by: If we seek the correction λ   G0 ≡ µ ∈ IRm : C T P0 K µ = 0.32) ⎪ ⎩ T˜ G λ = 0.244 4 Lagrange Multiplier Based Substructuring: FETI Method ˜ provided λ Then.19 below. Importantly. we may seek the solution to (4.33) then. (4. ·): . GT µ = 0 . the second and third block equations in (4.30) will automatically hold. matrix P0 K will be symmetric and posi- tive definite in subspace G0 equipped with the Euclidean inner product (·. by Lemma 4.

.

˜ P0 K µ ˜ . for some c > 0. ˜ λ˜ ∈ G0 (4. ˜ µ) ˜ . ˜ = λ. Consequently. ˜ µ P0 K λ. ∀µ ˜ ∈ G0 . ∀µ.34) (P0 K µ. a projected conjugate gradient iterative method may be applied to determine λ ˜ within G0 so that: . ˜ µ) ˜ ≥ c (µ.

By applying the constraints (4. ∀µ ˜ ∈ G0 .36) GT G β ∗ + GT C γ ∗ = g. γ ∗ and δ ∗ : ⎧ T ⎪ T ⎨ G K (Gα∗ + Cβ ∗ ) + G Gµ∗ = G P0 e T C T K (Gα∗ + Cβ ∗ ) + C T Gµ∗ = C T P0 e ⎪ ⎩ GT (Gα∗ + Cβ ∗ ) = GT g. seek it as λ∗ = Gβ ∗ + Cγ ∗ where the coefficient vectors β ∗ ∈ IRd and γ ∗ ∈ IRq are to be determined. . ˜ µ P0 K λ.37) yields the following block system for β ∗ . we obtain the following block equations for β ∗ and γ ∗ : C T P0 KGβ ∗ + C T P0 KCγ ∗ = C T P0 e (4. represent P0 Kλ∗ as Kλ∗ + Gδ ∗ where δ ∗ ∈ IRd denotes an unknown ⊥ coefficient vector to be selected so that Kλ∗ + Gδ ∗ ∈ Range (G) : GT (Kλ∗ + Gδ ∗ ) = 0. (4. it will be advanta- geous to combine the computation of P0 Kλ∗ into the above system.31) holds. µ) ˜ . (4.37) Substituting P0 Kλ∗ = Kλ∗ + Gδ ∗ into (4.36) and applying the constraint (4. ˜ = (P0 (e − K λ∗ ). Accord- ingly.35) To determine λ∗ ∈ IRm such that (4. Rather than solve this system involving d + q unknowns.31) to Gβ ∗ + Cγ ∗ .

1. by definition P0 λ = λ and P0 µ = µ. The matrix P0 K will be symmetric in the subspace G∗ with: (P0 Kλ. Suppose the following assumptions hold. Also. (4. P0 Kµ) .35). choose µ. Define   G∗ ≡ µ ∈ IRm : GT λ = 0 . µ) = (Kλ. Let M be of full rank. the following results will hold. P0 = I − G GT G GT and Range(Z) = Kernel(AEE ). 3. if GT µ = 0. satisfying: (P0 Kµ. P0 µ) ⎪ ⎪ ⎪ ⎪ = (Kλ. ∀wE such that Z T wE = 0. We next verify that P0 K is symmetric positive definite in G∗ . Let σ∗ (AEE ) be the smallest nonzero singular value of matrix AEE : wTE AEE wE ≥ σ∗ (AEE ) wTE wE . G = M Z. Then. The solution λ of (4. Lemma 4. P0 Kµ) . Let σ∗ (M ) be the smallest singular value of M . 2. Matrix P0 K will be positive definite.30) may now be expressed as λ = λ∗ + λ ˜ where λ ˜ solves (4. µ) ⎨ = (λ. Kµ) ⎪ ⎪ ⎩ = (λ. ·) denote the Euclidean inner product. µ) = (λ. Proof. λ ∈ G∗ . µ ∈ G∗ .3 Projected Gradient Algorithm 245 This system has the following block matrix form: ⎡ T ⎤⎡ ⎤ ⎡ T ⎤ G KG GT KC GT G β∗ G P0 e ⎣ C T KG C T KC C T G ⎦ ⎣ γ ∗ ⎦ = ⎣ C T P0 e ⎦ . Kµ) ⎪ ⎪ ⎪ ⎪ = (P0 λ. let (·. ∀λ. the solution λ∗ to problem (4. we obtain: ⎧ ⎪ ⎪ (P0 Kλ. µ) . µ) ≥ σ∗ (AEE ) σ∗ (M ) (µ. † 1. K = M A†EE M T .19.38) is solved. To show that P0 K is symmetric in G∗ . Since P0T = P0 and K T = K. . 2. 4.38) GT G GT C 0 δ∗ GT g Once system (4. 4. Then.31) is: λ∗ = Gα∗ + Cβ ∗ .

i. µ) = (Kµ. µ) ⎪ ⎪ ..246 4 Lagrange Multiplier Based Substructuring: FETI Method To verify positive definiteness. suppose that µ ∈ IRm \{0} and that GT µ = 0. Z T M T µ = 0.e. Then. applying the hypothesis yields: ⎧ ⎪ ⎪ (P0 Kµ.

µ . ⎪ ⎪ † ⎪ ⎨ = M A EE M T µ.

we shall seek its projection Qλ ∈ G0 in the form: Qλ ≡ λ + Gβ + Cγ.40) in the two special cases of interest. In the following.40) 0 GT G GT C 0 GT A pseudoinverse was employed in the above since in the cases of interest. . see (4. satisfying Q2 = Q) will be employed to project residuals or preconditioned updates onto the subspace G0 each iteration. (4.   Lemma 4. a projection matrix Q (possibly oblique. the PCG method may be employed to determine λ ˜ ∈ G0 . M T µ ⎪ ⎪ T ⎪ ⎪ ≥ σ (A EE ) M µ. µ) . γ ∈ IRq are chosen to satisfy: ⎧ T ⎨ G K (Gβ + Cγ) + G Gδ = −G Kλ ⎪ T T C T K (Gβ + Cγ) + C T Gδ = −C T Kλ ⎪ ⎩ GT (Gβ + Cγ) = −GT λ. The resulting projection Q will thus have matrix representation: ⎡ ⎤T ⎡ T ⎤† ⎡ T ⎤ GT G KG GT KC GT G G K Q ≡ I − ⎣ C T ⎦ ⎣ C T KG C T KC C T G ⎦ ⎣ C T K ⎦ . δ ∈ IRd was introduced to represent: P0 K (Gβ + Cγ) = K (Gβ + Cγ) + Gδ. To do this. By con- struction Qλ ∈ G0 . to ensure that all iterates and residuals in the conjugate gradient algorithm remain within the subspace G0 .19 shows that P0 K is symmetric and positive definite in G∗ . however. As a result. (4. and hence in G0 ⊂ G∗ . we describe the projection matrix (4.39) where the coefficient vectors β ∈ IRd . In the following. since Z T M T µ = 0 and since µ = 0. Given λ ∈ IRm . M µ T ⎪ ⎪ ∗ ⎩ ≥ σ∗ (AEE ) σ∗ (M ) (µ. we derive an expression for such a (possibly oblique) projection matrix Q. either C = 0 or G = 0.33). In the above. Care must be exercised. and this coefficient matrix will become singular. † ⎪ = A EE M T µ.

the FETI algorithm will typically have convergence rates deteriorating only mildly with increasing number of nodes per subdomain. Form of Q when c(x) ≥ c0 > 0. however.1) with c(x) = 0. only the action QN needs to be computed when the residuals from previous iterates lie in G0 . so that: λ∗ = G(GT G)−1 GT g. If the coefficient c(x) ≥ c0 > 0 in (4. In this case G = 0 and P0 = I. . ˜ . and a suitable precondi- tioner. Such transfer may be included in a suitably constructed preconditioner. will be nonsingular. For this choice of matrix C. the convergence rate may deteriorate only mildly with h. However. employing the projection matrices P0 and Q and a preconditioner. A preconditioner for P0 K can be sought within G0 . The nonhomogeneous term λ∗ can be sought as λ∗ = Gβ ∗ with: GT Gβ ∗ = GT g. While this may be viewed as an advantage. The algorithm below includes the computation of λ∗ and λ. 4. In this case Z. then the local stiffness matrices A(i) . When P0 K is suitably preconditioned. so that λ∗ = C(C T KC)−1 C T P0 e. the operator Q = I − G(GT G)−1 GT reduces to P0 and will be an orthogonal projection in the Euclidean inner product. it will be advantageous to include it by selecting a nontrivial matrix C ≡ M Z˜ where Z˜ is an nE × d matrix whose columns form a basis for Kernel(A˜EE ) where A˜EE = blockdiag(A˜(1) . In this case.1). the projection onto subspace G0 will provide global transfer of information. . In applica- tions. due to the block diagonal terms M (i) Z (i) in matrix G. then the subdomain stiffness matrix A(i) will be singular when Ωi is a floating subdomain. A˜(p) ) denotes the extended stiffness matrix arising from discretization of the elliptic operator in (4. In this case. If c(x) = 0 in (4. Computation of the initial nonhomogeneous term λ∗ reduces to: (C T KC) β ∗ = C T P0 e.1). and hence AEE . and typically C is chosen to be 0 (or equivalently omitted).3 Projected Gradient Algorithm 247 Form of Q when c(x) = 0. . will be nontrivial. whose inverse has the form QN QT (though in practice. it will be sufficient to evaluate only QN ) for a matrix N . so that the action of the inverse of the preconditioner has the form QN QT where N is symmetric (in the Euclidean inner product). We may now summarize the FETI algorithm. . and hence G = M Z. Importantly. operator Q = I − C(C T KC)−1 C T K and will be orthogonal only in the K induced inner product. it results in an algorithm without any built in mechanism for global trans- fer of information.

Compute the residual: r0 ≡ P0 (Kλ∗ − e).3. 5. Endfor 7. 4.1 (FETI Algorithm to Solve (4.26)) Let λ0 de a starting guess (for instance λ0 = 0) 1. 2. For k = 1.248 4 Lagrange Multiplier Based Substructuring: FETI Method Algorithm 4. · · · until convergence do: ⎧ ⎪ ⎪ zk−1 = N rk−1 preconditioning ⎪ ⎪ ⎪ ⎪ yk−1 = Qzk−1 projection ⎪ ⎪ ⎪ ⎪ ⎨ ξ k = rk−1 yk−1 T ⎪ pk = yk−1 + ξk−1 ξk pk−1 (p1 ≡ y0 ) ⎪ ⎪ ⎪ ν k = T ξk ⎪ ⎪ ⎪ pk P0 Kpk ⎪ ⎪ ⎪ ⎪ λ = λk−1 + νk pk ⎪ ⎩ k rk = rk−1 − νk P0 Kpk 6. Compute: . Compute:  e ≡ M A†EE f E g ≡ ZT f E 2. Solve the following system (using a pseudoinverse): ⎧ T ⎨ G K (Gβ ∗ + Cγ ∗ ) + G Gδ ∗ = G P0 (e − Kλ0 ) ⎪ T T C T K (Gβ ∗ + Cγ ∗ ) + C T Gδ ∗ = C T (P0 e − Kλ0 ) ⎪ ⎩ GT (Gα∗ + Cβ ∗ ) = GT g. Define: λ∗ ← λ0 + Gβ ∗ + Cγ ∗ . 3.

for matrix P0 K. Both the preconditioners considered below have a similar structure. a coarse space term will be unnecessary in FETI preconditioners. . α ≡ (GT G)† GT Kλ − M A†EE f E u = A†EE f E − M T λ + Zα.3. Since information will be transferred globally within the FETI algorithm in the projection step involving matrix Q.2 Preconditioners for P0 K We shall describe two preconditioners proposed in [FA15]. 4. We next describe preconditioners of the form Q N in the FETI algorithm. and are motivated as follows.

This for- mal analogy suggests that preconditioners can be sought for K having a similar structure to Neumann-Neumann preconditioners [FA14. (1) given a two subdomain decomposition the constraints will be MB = I and (2)  2 † MB = −I.41) ⎪ ⎪ MB (i)T AIB ABB (i) MB ⎪ ⎪ ⎪ ⎪ p ⎩ (i) † (i)T = i=1 MB S (i) MB . so that K = i=1 S (i) . Other preconditioners based on analogy with two subdomain Schur com- plement preconditioners are also possible. the resulting con- dition number will be independent of h. 0. then the (1)† † formal inverses of S and S (2) will be spectrally equivalent to each other (independent of h). 1} are interpreted as boundary restriction matrices Ri . it can be verified by employing the block structure of A(i) and the algebraic definition of the pseudoinverse of a matrix. 4. The last equation above follows easily when submatrix A(i) is nonsingular. For instance. KL8].3 Projected Gradient Algorithm 249 Since matrices AEE and M have the following block structures: AEE = blockdiag A(1) . i=1 (i) Each matrix M will have the following block structures due to the ordering of interior and boundary nodes within each subdomain: (i) (i) (i) M (i) = [MI MB ] = [0 MB ] (i) where MI = 0 since the continuity constraint involves only interface un- knowns. The additive expression for K in (4. Substituting this into the preceding expression for K yields: ⎧ p † ⎪ T ⎪ K = i=1 M (i) A(i) M (i) ⎪ ⎪ ⎪ ⎪  T     ⎨ (i) † p 0 (i) AII AIB 0 = i=1 (i)T (i)T (4. the heuristic similar- ity with Neumann-Neumann preconditioners suggests a preconditioner whose formal inverse has the structure: ⎧ . matrix K = M A†EE M T will formally satisfy:  p † T K= M (i) A(i) M (i) . · · · . we may heuristically define the action of the 2 inverse of a preconditioner for K by QN = Q i=1 S (i) . " # M = M (1) · · · M (p) . By construction. (i) heuristically. If matrix AEE is nonsingular. A(p) . More generally. provided the boundary constraint matrices MB with entries from {−1.41) resembles a subassembly identity. When A(i) is singular. In this case.

    Q N ≡ Q Σ_{i=1}^p M_B^(i) S^(i) M_B^(i)T
        = Q Σ_{i=1}^p M_B^(i) (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)) M_B^(i)T.        (4.42)

In this case, computing the action of Q N will require the solution of a local Dirichlet problem on each subdomain, since computation of the action of (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)) requires the solution of a Dirichlet problem on each subdomain. The resulting preconditioner is referred to as a Dirichlet preconditioner.
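To make the cost of one term of (4.42) concrete, the sketch below applies M_B^(i) S^(i) M_B^(i)T to a multiplier vector using dense local blocks. The names A_II, A_IB, A_BB and MB are assumptions standing for the subdomain blocks; in practice a sparse factorization of A_II^(i) would replace the dense solve.

    import numpy as np

    def apply_dirichlet_term(A_II, A_IB, A_BB, MB, lam):
        """Apply MB (A_BB - A_IB^T A_II^{-1} A_IB) MB^T to lam.
        Each application solves one local Dirichlet problem."""
        vB = MB.T @ lam                         # restrict to boundary dofs
        w = np.linalg.solve(A_II, A_IB @ vB)    # local Dirichlet solve
        sB = A_BB @ vB - A_IB.T @ w             # local Schur complement action
        return MB @ sB                          # back to multiplier space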

The action Q L of the inverse of an alternative preconditioner, referred to as the lumped preconditioner, is obtained as follows:

    Q L ≡ Q Σ_{i=1}^p M_B^(i) A_BB^(i) M_B^(i)T.

This preconditioner does not require the solution of local Dirichlet problems, and is obtained by approximating the local Schur complement S^(i) ≈ A_BB^(i). The following theoretical result will hold for the preconditioner Q N.

Theorem 4.20. The following bounds hold for the Dirichlet preconditioner. There exists C > 0 independent of h0, h and jumps in the coefficients:

    cond(P0 K, QN) ≡ lambda_max(QN P0 K) / lambda_min(QN P0 K) <= C (1 + log(h0/h))^3.

Proof. See [MA25, MA18, MA19].

4.4 FETI-DP and BDDC Methods

In this section, we describe two popular variants of the FETI method to solve the saddle point problem (4.22) or its associated primal formulation. The FETI-DP (Dual-Primal) method solves a reduced version of (4.22), while BDDC (Balancing Domain Decomposition with Constraints) corresponds to a primal version of FETI-DP [FA11, FA10, DO, DO2, KL8, ST4]. Both methods are CG based, yielding robust convergence, and improve upon the scalability of the FETI algorithm in three dimensions; the resulting preconditioned matrices have the same spectra, except for zeros or ones. Both methods work in a class of local solutions which are discontinuous across the subdomain boundaries, except for a family of chosen continuity constraints. For simplicity, we only consider simple continuity constraints across cross points, edges and faces of a subdomain boundary; more general constraints are considered in [TO10]. To be consistent with preceding sections, our notation differs from that in [DO, DO2, MA18, MA19]. In the following, we describe the reduction of system (4.22) to a smaller saddle point system and introduce notation, prior to formulating the FETI-DP and BDDC methods.

Reduced Saddle Point System. To reduce system (4.22) to a smaller saddle point system by elimination of the interior unknowns u_I^(l) for 1 <= l <= p, we re-order the block vector u_E as (u_I^T, u_B^T)^T in the saddle point system (4.22), where:

    u_I = (u_I^(1)T, ..., u_I^(p)T)^T   and   u_B = (u_B^(1)T, ..., u_B^(p)T)^T.

This will yield the following reordered system:

    [ A_II    A_IB   0     ] [ u_I ]   [ f_I ]
    [ A_IB^T  A_BB   M_B^T ] [ u_B ] = [ f_B ]        (4.43)
    [ 0       M_B    0     ] [ λ   ]   [ 0   ]

where the block submatrices A_II, A_IB and A_BB are defined as follows:

    A_II = blockdiag(A_II^(1), ..., A_II^(p)),
    A_IB = blockdiag(A_IB^(1), ..., A_IB^(p)),
    A_BB = blockdiag(A_BB^(1), ..., A_BB^(p)),

with matrix M_B = [ M_B^(1) ... M_B^(p) ], while the load vectors satisfy:

    f_I = (f_I^(1)T, ..., f_I^(p)T)^T   and   f_B = (f_B^(1)T, ..., f_B^(p)T)^T.

Here, the matrices A_XY^(l) and M_B^(l), and the vectors u_X^(l), f_X^(l), are as in (4.6) and (4.10), for X, Y = I, B. The matrices M_B and M_B^(l) will be of size m x n_B and m x n_B^(l), respectively, and S_EE will be of size n_B = (n_B^(1) + ... + n_B^(p)).

We solve for u_I = A_II^{-1}(f_I - A_IB u_B) using the first block row of (4.43). Substituting this expression into the second block row of (4.43) yields a reduced saddle point system for determining u_B and λ. The reduced saddle point system will be:

    [ S_EE  M_B^T ] [ u_B ]   [ f~_B ]
    [ M_B   0     ] [ λ   ] = [ 0    ]        (4.44)

where f~_B ≡ f_B - A_IB^T A_II^{-1} f_I, and S_EE = (A_BB - A_IB^T A_II^{-1} A_IB) satisfies:

    S_EE = blockdiag(S^(1), ..., S^(p)),   where   S^(i) ≡ (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)).

Thus, the solution to (4.43) can be obtained by solving (4.44) for u_B and λ, and subsequently u_I = A_II^{-1}(f_I - A_IB u_B).

Primal and Dual Spaces. Given Omega_1, ..., Omega_p, let B^(l) = ∂Omega_l ∩ Omega denote the interior segments of the subdomain boundaries, and let B = ∪_{l=1}^p B^(l) be the interface. We shall assume that Omega_1, ..., Omega_p are geometrically conforming, so that B can be further partitioned into globs, such as cross points and edges for Omega ⊂ IR^2, or cross points, edges and faces when Omega ⊂ IR^3. We heuristically define such globs in the following.

When Omega ⊂ IR^2, we heuristically define an edge as any non-trivial segment int(∂Omega_l ∩ ∂Omega_j) which can be mapped homeomorphically onto the open segment (0, 1). We define a cross point as any endpoint in Omega of an edge. We assume that the cross points and edges partition B. We let n_X denote the number of distinct cross points and enumerate them as X_1, ..., X_{n_X}, and we let n_E denote the number of distinct edges and enumerate them as E_1, ..., E_{n_E}.

When Omega ⊂ IR^3, we heuristically define a face as any non-trivial segment int(∂Omega_l ∩ ∂Omega_j) which can be mapped homeomorphically onto the open square (0, 1) x (0, 1). We define an edge as any non-trivial intersection in Omega of two faces int(F_l ∩ F_j) which can be homeomorphically mapped onto the open interval (0, 1), and a cross point as an endpoint within Omega of an edge. We assume the cross points, edges and faces partition B. We let n_F denote the number of distinct faces and enumerate them as F_1, ..., F_{n_F}, with n_E distinct edges E_1, ..., E_{n_E} and n_X distinct cross points X_1, ..., X_{n_X}.

Restriction and Extension Maps. We employ the notation:
• Let U = IR^q be the space of nodal vectors associated with finite element functions on B in traditional substructuring. Here, q equals the number of nodes of the triangulation on B, and U parameterizes the global degrees of freedom on B.
• For 1 <= l <= p and u ∈ U, let R_l u denote the restriction of the vector u of nodal values on B onto the indices of nodes on B^(l). Thus R_l will be an n_B^(l) x q matrix with zero-one entries.
• Let W_l ≡ Range(R_l) = IR^{n_B^(l)} denote the space of local nodal vectors associated with displacements on B^(l), and let W ≡ (W_1 x ... x W_p) be the space of extended local displacements with dim(W) = (n_B^(1) + ... + n_B^(p)).
• Let R_E : U -> W denote the restriction matrix from U into W:

    R_E^T = [ R_1^T ... R_p^T ]   and   R_E = [ R_1^T ... R_p^T ]^T,

where R_E is a matrix of size n_B x q.
• Let M_B : W -> Lambda, where M_B v_B denotes the jump discontinuity in v_B across the subdomains, for v_B ∈ W. Here, M_B is of size m x n_B, and m ≡ dim(Lambda) >= q also denotes the number of Lagrange multiplier variables.
• By construction, Kernel(M_B) = Range(R_E); thus M_B R_E = 0.
• Denote the primal Schur complement matrix S of size q x q as:

    S ≡ R_E^T S_EE R_E = Σ_{l=1}^p R_l^T S^(l) R_l,

which is employed in traditional iterative substructuring.

The disjoint cross points, edges and faces are referred to as globs, and we shall assume that the interface B can be partitioned into distinct globs. In the FETI-DP and BDDC methods, one coarse degree of freedom will be associated with each distinct glob in B. There will be as many coarse degrees of freedom or coarse basis functions as there are distinct globs in B, and one basis function with mean value one on each glob, with zero nodal values outside the glob, will be employed in formulating the primal space, as described below.

Definition 4.21. When Omega ⊂ IR^2, let q0 = (n_E + n_X) denote the number of coarse degrees of freedom. We define Q0 as a q0 x q matrix which maps onto the coarse degrees of freedom, as follows. Each row of Q0 will be associated with a distinct glob of B, in some chosen ordering of the globs.
• If the i'th row of Q0 is associated with a cross point X_l then:
      (Q0)_ij = 1 if node j in B is the cross point X_l, and 0 otherwise.
• If the i'th row of Q0 is associated with the edge E_l then:
      (Q0)_ij = 1/|E_l| if node j in B lies in E_l, and 0 if node j in B does not lie in E_l,
  where |E_l| denotes the number of nodes in E_l.
Thus, if u ∈ U = IR^q is a nodal vector of global degrees of freedom on B, then (Q0 u)_i will be the mean value of u on the glob associated with row i. The above weights are uniform within each glob; more generally, the entries of the local mass matrix on the glob must be divided by its row sum.

Definition 4.22. When Omega ⊂ IR^3, let q0 = (n_F + n_E + n_X) denote the number of coarse degrees of freedom on B. We define Q0 as a q0 x q matrix which maps onto the coarse degrees of freedom, as follows. Each row of Q0 will be associated with a distinct glob of B, in some chosen ordering of the globs.
• If the i'th row of Q0 is associated with a cross point X_l then:
      (Q0)_ij = 1 if node j in B is the cross point X_l, and 0 otherwise.
• If the i'th row of Q0 is associated with the edge E_l then:
      (Q0)_ij = 1/|E_l| if node j in B lies in E_l, and 0 if node j in B does not lie in E_l,
  where |E_l| denotes the number of nodes in E_l.
• If the i'th row of Q0 is associated with the face F_l then:
      (Q0)_ij = 1/|F_l| if node j in B lies in F_l, and 0 if node j in B does not lie in F_l,
  where |F_l| denotes the number of nodes in F_l.
Here too, if u ∈ U = IR^q is a nodal vector of global degrees of freedom on B, then (Q0 u)_i will be the mean value of u on the glob associated with row i. The weights are uniform within each glob; more generally, the entries of the local mass matrix on the glob must be divided by its row sum.
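As a small illustration of Definitions 4.21 and 4.22, the sketch below assembles Q0 as a sparse matrix from lists of glob node indices. The glob data structure is an assumption chosen for illustration, not a structure defined in the text.

    from scipy.sparse import lil_matrix

    def build_Q0(globs, q):
        """Assemble the q0 x q matrix Q0 whose i'th row averages nodal
        values over the i'th glob (a cross point is a one-node glob)."""
        Q0 = lil_matrix((len(globs), q))
        for i, nodes in enumerate(globs):
            Q0[i, nodes] = 1.0 / len(nodes)   # uniform weights on the glob
        return Q0.tocsr()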

Since each coarse degree of freedom is associated with a distinct glob, and since by definition each glob either lies entirely within a subdomain boundary segment B^(i) or does not lie in B^(i), only certain coarse degrees of freedom will be non-zero on B^(i).

Definition 4.23. Let q0^(i) denote the number of globs in B^(i). Given a global ordering of the q0 globs (and associated coarse degrees of freedom) on B and a local ordering of the q0^(i) globs on B^(i), we define a restriction matrix R_i^c of size q0^(i) x q0, with zero or one entries, which picks the coarse degrees of freedom which are non-zero on B^(i):

    (R_i^c)_lj ≡ 1 if glob j in the global ordering is l in the local ordering on B^(i), and 0 otherwise.

Remark 4.24. For instance, if Omega_i ⊂ Omega ⊂ IR^2 is a rectangle, with four cross points and four edges, then there will be eight coarse degrees of freedom associated with ∂Omega_i. If Omega_i ⊂ Omega ⊂ IR^3 is a box, with six faces, twelve edges and eight cross points, then there will be twenty six coarse degrees of freedom on ∂Omega_i.

Definition 4.25. Using the restriction matrices R_i^c and the coarse degrees of freedom matrix Q0, we define a family of constraint matrices C_i of size q0^(i) x n_B^(i) that will be employed to formulate the primal and dual spaces. We define a matrix C_i ≡ R_i^c Q0 R_i^T for 1 <= i <= p. We also define C ≡ blockdiag(C_1, ..., C_p) as the block diagonal matrix of size (q0^(1) + ... + q0^(p)) x n_B (where n_B = (n_B^(1) + ... + n_B^(p))):

    C ≡ [ C_1        ]
        [     ...    ]
        [        C_p ]

Definition 4.26. We define a matrix R^c of size (q0^(1) + ... + q0^(p)) x q0 as:

    R^c ≡ [ R_1^c ]
          [  ...  ]
          [ R_p^c ]

corresponding to a restriction of the global coarse degrees of freedom on B onto the local coarse degrees of freedom on each of the local boundaries B^(i).

Remark 4.27. Since R_i^T and R_i^c are matrices with zero or one entries, with at most one non-zero entry per row or column, if w_i ∈ W_i, then C_i w_i will compute the average value of w_i on each of the q0^(i) distinct globs on B^(i), in the local orderings. Thus, if C_i w_i = 0, then w_i will be zero at all the cross points in B^(i), with mean value zero on the edges and faces (if any) in B^(i).

Recall that W = (W_1 x ... x W_p) denotes the space of nodal vectors on the boundaries, whose associated finite element functions are discontinuous across the subdomains. The FETI-DP and BDDC methods employ several subspaces W_0, W_D, W_P and W_* of W. Below, we define W_* ⊂ W as the space of local nodal vectors whose local coarse degrees of freedom are unique, i.e., continuous across the subdomain boundaries.

Definition 4.28. We define W_* as the following subspace of W:

    W_* ≡ { w_B = (w_B^(1)T, ..., w_B^(p)T)^T : C w_B ∈ Range(R^c) },

i.e., for each w_B ∈ W_* there must exist some u ∈ IR^{q0} such that C w_B = R^c u. The other degrees of freedom in W_* may be discontinuous across the subdomain boundaries.

The space W_* can be further decomposed as a sum of two spaces: W_* = W_D + W_P, where W_D is referred to as the dual space and involves local constraints, while the space W_P, which is referred to as the primal space, involves global constraints.

Definition 4.29. The dual space W_D ≡ Kernel(C) ⊂ W_* will consist of local nodal vectors whose coarse degrees of freedom (mean value on each glob) are zero on each subdomain boundary:

    W_D ≡ Kernel(C) = { w_B = (w_B^(1)T, ..., w_B^(p)T)^T : C_i w_B^(i) = 0 for 1 <= i <= p }.

The primal space W_P will be a subspace of W_* complementary to W_D, defined as the span of q0 local basis functions whose coarse degrees of freedom are continuous across the subdomains. The primal space W_P will be employed as a coarse space.

Definition 4.30. We define the primal space as W_P ≡ Range(Φ), where:

    Φ ≡ [ Φ_1 ]
        [ ... ]      with C_i Φ_i = R_i^c for 1 <= i <= p,
        [ Φ_p ]

where Φ_l is of size n_B^(l) x q0 and Φ is of size n_B x q0 with dim(Range(Φ)) = q0.

The FETI-DP and BDDC methods will employ the following properties.

Remark 4.31. By construction, if v_B ∈ W_*, then there exists u ∈ IR^{q0} such that C_i v_B^(i) = R_i^c u. Then w_i ≡ (v_B^(i) - Φ_i u) will satisfy C_i w_i = 0 for 1 <= i <= p, yielding that (w_1, ..., w_p) ∈ W_D. Thus W_P and W_D are complementary: W_* = W_P + W_D. As a result, each u_B ∈ W_* may be decomposed and sought in the form:

    u_B = u_D + Φ u_c,   where C u_D = 0 and Φ u_c ∈ W_P.

Remark 4.32. Since S_EE = blockdiag(S^(1), ..., S^(p)) and C = blockdiag(C_1, ..., C_p), the solution to:

    [ S_EE  C^T ] [ u_B ]   [ f~_B ]
    [ C     0   ] [ µ   ] = [ 0    ]        (4.45)

reduces, by reordering the system, to the solution of:

    [ S^(i)  C_i^T ] [ u_B^(i) ]   [ f~_B^(i) ]
    [ C_i    0     ] [ µ_i     ] = [ 0        ]    for 1 <= i <= p,        (4.46)

where f~_B = (f~_B^(1)T, ..., f~_B^(p)T)^T and µ = (µ_1^T, ..., µ_p^T)^T. Thus, the minimization of J_B(v_B) = (1/2) v_B^T S_EE v_B - v_B^T f~_B, subject to the constraint that C v_B = 0, can be reduced to p concurrent local problems. If f~_B = S_EE w_B and S_EE is positive definite within W_D, then it is easily verified that u_B = P_{W_D} w_B, where P_{W_D} denotes the S_EE-orthogonal projection onto W_D.

Remark 4.33. Henceforth, we let v_B = (v_B^(1)T, ..., v_B^(p)T)^T, and we assume that matrix S_EE is positive definite within W_*.

FETI-DP Method. The FETI-DP method seeks the solution (u_B^T, λ^T)^T to (4.44) by maximizing a dual function F(λ) associated with (4.44). It is based on the decomposition u_B = u_D + Φ u_c, where u_D ∈ W_D with C u_D = 0 and Φ u_c ∈ W_P. We recall the saddle point problem (4.44) with f~_B ≡ f_B - A_IB^T A_II^{-1} f_I:

    [ S_EE  M_B^T ] [ u_B ]   [ f~_B ]
    [ M_B   0     ] [ λ   ] = [ 0    ]        (4.47)

The Lagrangian function associated with the above saddle point problem is:

    L(u_B, λ) = (1/2) u_B^T S_EE u_B - u_B^T f~_B + λ^T M_B u_B.

Since the constraint M_B u_B = 0 yields u_B ∈ W_0 ⊂ W_*, we may alternatively minimize the functional within W_*, subject to the constraint M_B u_B = 0. The FETI-DP method seeks u_B = u_D + u_P, where C u_D = 0 and u_P = Φ u_c. The constraint C u_D = 0 can be imposed by augmenting the Lagrangian with the term µ^T C u_D for µ ∈ IR^{q0^(1)+...+q0^(p)}. This will alter µ and λ, but not u_B. Seeking the saddle point of the augmented Lagrangian:

    L_aug(u_D, u_c, µ, λ) ≡ L(u_D + Φ u_c, λ) + µ^T C u_D

will yield the following saddle point system:

    [ S_EE       S_EE Φ      C^T  M_B^T     ] [ u_D ]   [ f~_B     ]
    [ Φ^T S_EE   Φ^T S_EE Φ  0    Φ^T M_B^T ] [ u_c ] = [ Φ^T f~_B ]        (4.48)
    [ C          0           0    0         ] [ µ   ]   [ 0        ]
    [ M_B        M_B Φ       0    0         ] [ λ   ]   [ 0        ]

Rearranging the unknowns as (u_D^T, µ^T, u_c^T, λ^T)^T results in the system:

    [ S_EE      C^T  S_EE Φ      M_B^T     ] [ u_D ]   [ f~_B     ]
    [ C         0    0           0         ] [ µ   ] = [ 0        ]        (4.49)
    [ Φ^T S_EE  0    Φ^T S_EE Φ  Φ^T M_B^T ] [ u_c ]   [ Φ^T f~_B ]
    [ M_B       0    M_B Φ       0         ] [ λ   ]   [ 0        ]

The FETI-DP method solves the above system by solving a symmetric positive definite system for determining λ ∈ Lambda = IR^m by a PCG method. In the following, we express system (4.49) more compactly as:

    [ K  L^T ] [ x ]   [ g ]
    [ L  0   ] [ λ ] = [ 0 ]        (4.50)

where the matrices K and L and the vectors x and g are as described next:

    K ≡ [ S_EE      C^T  S_EE Φ     ]        [ M_B^T     ]        [ u_D ]       [ f~_B     ]
        [ C         0    0          ],  L^T ≡ [ 0        ],  x ≡  [ µ   ],  g ≡ [ 0        ].        (4.51)
        [ Φ^T S_EE  0    Φ^T S_EE Φ ]        [ Φ^T M_B^T ]        [ u_c ]       [ Φ^T f~_B ]

The FETI-DP method seeks the solution to (4.50) by eliminating x and by solving the resulting reduced system F λ = d for λ by a PCG method, where:

    F ≡ (L K^{-1} L^T)   and   d ≡ (L K^{-1} g).

Here, F λ = d arises as the condition for maximizing the dual function F(λ) = inf_x L_aug(x, λ). Once λ ∈ IR^m is determined, we obtain x = K^{-1}(g - L^T λ).

Remark 4.34. Matrix K is a saddle point matrix, and is indefinite. However, since C u_D = 0, matrix K can be verified to be positive definite within the subspace C v_D = 0, by Remark 4.33.

Remark 4.35. A system of the form K x = g can be solved by duality, as follows:

    [ S_EE      C^T  S_EE Φ     ] [ u_D ]   [ g_1 ]
    [ C         0    0          ] [ µ   ] = [ 0   ]        (4.52)
    [ Φ^T S_EE  0    Φ^T S_EE Φ ] [ u_c ]   [ g_2 ]

Since S_EE and C are both block diagonal, given u_c we may solve the first two block rows above to obtain:

    [ u_D ]   [ S_EE  C^T ]^{-1} [ g_1 - S_EE Φ u_c ]
    [ µ   ] = [ C     0   ]      [ 0                ]        (4.53)

Substituting this into the third block row yields the reduced system for u_c:

    S_c u_c = g_c,   where

    S_c = ( Φ^T S_EE Φ - [S_EE Φ; 0]^T [S_EE, C^T; C, 0]^{-1} [S_EE Φ; 0] )        (4.54)
    g_c = g_2 - [S_EE Φ; 0]^T [S_EE, C^T; C, 0]^{-1} [g_1; 0].

Once S_c u_c = g_c is solved, we may determine (u_D^T, µ^T)^T by solving (4.53).

Remark 4.36. Matrix S_c of size q0 can be shown to be sparse, and it can be assembled in parallel, see [FA11, FA10, DO, DO2, MA18, MA19] and Remark 4.37. By definition, it will hold that S_c = Φ^T (I - P_{W_D})^T S_EE (I - P_{W_D}) Φ; by Remark 4.33, it will be positive definite.

The FETI-DP preconditioner F_0 for F is chosen so that both the FETI-DP and BDDC preconditioned matrices have the same spectra, where the inverse of the preconditioner is M_D S_EE M_D^T, as described next. Let D = blockdiag(D^(1), ..., D^(p)) : W -> W be a discrete partition of unity: R_E^T D R_E = I,

where each D^(l) : W_l -> W_l is a diagonal matrix with non-negative diagonal entries.

Diagonal dual weight matrices D_*^(l) : Lambda -> Lambda, each of size m, are defined based on the entries of the matrices D^(j), as follows. Recall that each row of M_B is associated with a matching requirement between nodal values on two distinct subdomains. Suppose that a node alpha on B lies on B^(l) ∩ B^(j). Let ind(alpha, l, j) denote the row index in M_B which enforces the matching between the local nodal values at alpha in B^(l) and in B^(j), and let ind(alpha, j) denote the index of the node alpha in the local ordering in B^(j). We define the diagonal dual matrix D_*^(l) for all alpha ∈ B as:

    (D_*^(l))_{ind(alpha,l,j)} = (D^(j))_{ind(alpha,j)}   if alpha ∈ B^(l) ∩ B^(j)
    (D_*^(l))_{ind(alpha,l,j)} = 0                        if alpha ∉ B^(l)

Such weight matrices are employed in the BDDC method to average the solution on different subdomain boundaries. Let M_D be a matrix the same size as M_B, defined by:

    M_D ≡ [ D_*^(1) M^(1) ... D_*^(p) M^(p) ].

Then, it can be shown that M_B M_D^T M_B = M_B and M_D^T M_B + R_E R_E^T D = I, see [RI5, KL10, FR].

Matrix F can be verified to be positive definite as follows. We express F = (L K^{-1}) K (K^{-1} L^T), and for λ ∈ IR^m let x = (w_D^T, µ^T, w_c^T)^T = K^{-1} L^T λ. Then, since C w_D = 0, we will obtain that:

    x^T K x = [ w_D ]^T [ S_EE      S_EE Φ     ] [ w_D ]        (4.55)
              [ w_c ]   [ Φ^T S_EE  Φ^T S_EE Φ ] [ w_c ]

The latter will be positive provided S_EE is positive definite within W_* (which we assume to hold) and provided (w_D^T, w_c^T)^T != 0 for λ != 0.

The inverse F_0^{-1} of the FETI-DP preconditioner for F is:

    F_0^{-1} ≡ M_D S_EE M_D^T   =>   cond(F_0, F) <= c (1 + log^2(h0/h)).        (4.56)

BDDC Method. The BDDC method [DO, DO2, MA18, MA19] is a PCG method to solve the primal problem associated with system (4.44). Since M_B u_B = 0, we may seek u_B = R_E u for some u ∈ U = IR^q. The primal problem associated with (4.44) can easily be verified to be the Schur complement system arising in traditional substructuring:

    S u = f,   where S ≡ (R_E^T S_EE R_E) and f = (R_E^T f~_B).

The BDDC preconditioner S_0 is formulated using the same coarse space and local saddle point problems employed in the FETI-DP method.

Matrix S_0^{-1} S in the BDDC method has essentially the same spectrum as the preconditioned matrix F_0^{-1} F in the FETI-DP method, except for zeros or ones, see [DO, DO2, MA18, MA19]. The BDDC preconditioner S_0 for S corresponds to an additive Schwarz preconditioner with inexact solvers, based on the following subspaces of U:

    U_0 = Range(R_E^T D^T Ψ)
    U_i = { R_i^T D^(i) w_i : C_i w_i = 0, w_i ∈ W_i },   for 1 <= i <= p.

The BDDC preconditioner employs a discrete partition of unity matrix on B, with D = blockdiag(D^(1), ..., D^(p)) satisfying R_E^T D R_E = I, where each D^(l) : W_l -> W_l is a diagonal matrix with non-negative entries (not restricted to zeros or ones). In practice, the diagonal entries of D^(l) are chosen as a weighted average of the diagonal entries of the stiffness matrices A^(j). Let i denote the index of a node on B^(l) and j(i) = ind(B^(j), i) the local index of node i in B^(j). Then:

    (D^(l))_ii = A_ii^(l) / Σ_{j : B^(j) ∩ B^(l) != ∅} A_{j(i) j(i)}^(j).

The BDDC preconditioner also employs a coarse basis Ψ of size n_B x q0, obtained by modifying the matrix Φ of size n_B x q0 which satisfies C Φ = R^c:

    [ S_EE  C^T ] [ Ψ ]   [ 0   ]
    [ C     0   ] [ G ] = [ R^c ]

If we expand Ψ = Φ + Φ^, then since C Φ = R^c, matrix Φ^ will satisfy:

    [ S_EE  C^T ] [ Φ^ ]   [ -S_EE Φ ]
    [ C     0   ] [ G  ] = [ 0       ]

Solving for Φ^, it follows that Φ^ = -P_{W_D} Φ and Ψ = (I - P_{W_D}) Φ, and computing Ψ^T S_EE Ψ after algebraic simplification yields:

    Ψ^T S_EE Ψ = ( Φ^T S_EE Φ - [S_EE Φ; 0]^T [S_EE, C^T; C, 0]^{-1} [S_EE Φ; 0] ) = S_c.

The coarse matrix S_c can thus be assembled using either expression above. The spaces Range(Ψ) and W consist of nodal vectors associated with finite element functions which are discontinuous across the subdomain boundaries; however, the weighted averaging using R_i^T D^(i) or R_E^T D^T yields nodal vectors in U, associated with continuous finite element functions on B. The action S_0^{-1} of the inverse of the BDDC preconditioner for S is defined as:

    S_0^{-1} r ≡ R_E^T D Ψ S_c^{-1} Ψ^T D^T R_E r + [D^T R_E; 0]^T [S_EE, C^T; C, 0]^{-1} [D^T R_E; 0] r.

Thus, using the block structure of S_EE and C, this yields its parallel form:

    S_0^{-1} = R_E^T D Ψ S_c^{-1} Ψ^T D^T R_E + Σ_{i=1}^p [D^(i)T R_i; 0]^T [S^(i), C_i^T; C_i, 0]^{-1} [D^(i)T R_i; 0].

Remark 4.37. The columns of matrix Ψ of size n_B x q0 can be constructed as follows. If e_j denotes the j'th column of the identity matrix I of size q0, then the j'th column ψ_j of Ψ can be computed by solving:

    [ S_EE  C^T ] [ ψ_j ]   [ 0       ]
    [ C     0   ] [ µ_j ] = [ R^c e_j ]

The components of ψ_j will be non-zero only on the boundaries B^(l) which intersect the glob associated with the j'th column of R^c. Thus, only a few local problems need to be solved. The non-zero entries of the sparse matrix S_c of size q0 can be computed as (S_c)_ij = ψ_i^T S_EE ψ_j, based on the support of ψ_i and ψ_j.

The following bounds will hold for the FETI-DP and BDDC methods.

Lemma 4.38. The following convergence bounds will hold:

    lambda_max(S_0^{-1} S) / lambda_min(S_0^{-1} S) = lambda_max(F_0^{-1} F) / lambda_min(F_0^{-1} F) <= kappa

and

    kappa <= sup_{w ∈ W_*} ||M_D^T M_B w||^2_{S_EE} / ||w||^2_{S_EE} = sup_{w ∈ W_*} ||R_E R_E^T D w||^2_{S_EE} / ||w||^2_{S_EE},

where kappa <= c (1 + log^2(h0/h)) and h0 is the diameter of the subdomains.

Proof. See [DO, DO2, MA18, MA19].

Remark 4.39. In applications, each local saddle point problem:

    [ S^(i)  C_i^T ] [ w_i ]   [ f_i ]
    [ C_i    0     ] [ µ_i ] = [ g_i ]        (4.57)

can be solved using the Schur complement method. On each B^(i), the entries of w_i corresponding to the cross points on B^(i) can be eliminated; the entries of the Lagrange multiplier variables enforcing the constraints on the cross points can also be eliminated. Thus, the specified rows of S^(i) and C_i^T, and the associated columns of S^(i) and C_i, must be eliminated.

The resulting submatrix of S^(i) will be non-singular (even if S^(i) were singular). For notational convenience, we shall denote the resulting saddle point system as in the above. To solve for the remaining entries of w_i and µ_i, parameterize w_i in terms of µ_i using the first block row. This formally yields:

    w_i = S^(i)-1 (f_i - C_i^T µ_i).

Substituting this expression into the second block row yields:

    T_i µ_i = C_i S^(i)-1 f_i - g_i,   where T_i ≡ (C_i S^(i)-1 C_i^T).

The Schur complement T_i of size q0^(i) can be assembled explicitly; q0^(i) will be at most eight for rectangular subdomains when Omega ⊂ IR^2, or of size twenty six when Omega ⊂ IR^3. Once µ_i is determined, w_i = S^(i)-1 (f_i - C_i^T µ_i). Note that matrix S^(i) = (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)) need not be assembled. Instead, the solution of the system can be obtained by solving the sparse system:

    [ A_II^(i)   A_IB^(i) ] [ y_i ]   [ 0               ]
    [ A_IB^(i)T  A_BB^(i) ] [ w_i ] = [ f_i - C_i^T µ_i ]

Thus, the solution to (4.57) can be sought by solving two sparse symmetric positive definite systems and one dense symmetric positive definite system of a small size. See [DO, DO2, MA18, MA19] for alternative methods.
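The following dense-algebra sketch mirrors this procedure for one subdomain. It is a sketch under stated assumptions: S_i is taken as the (nonsingular) reduced local Schur complement, and in practice the action of its inverse would be applied through the sparse system above rather than by forming S_i.

    import numpy as np

    def solve_local_saddle(S_i, C_i, f_i, g_i):
        """Solve [S_i, C_i^T; C_i, 0][w; mu] = [f; g] by the Schur
        complement method of Remark 4.39."""
        SinvF = np.linalg.solve(S_i, f_i)         # S_i^{-1} f_i
        SinvCt = np.linalg.solve(S_i, C_i.T)      # S_i^{-1} C_i^T
        T_i = C_i @ SinvCt                        # small dense SPD matrix
        mu = np.linalg.solve(T_i, C_i @ SinvF - g_i)
        w = SinvF - SinvCt @ mu
        return w, mu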

5

Computational Issues and Parallelization

In this chapter, we discuss several computational issues that arise with the implementation of domain decomposition algorithms. The first issue concerns the choice of a decomposition of a domain into non-overlapping or overlapping subdomains. When an algorithm is implemented using multiple processors, to ensure load balancing the number of interior unknowns per subdomain must be approximately the same, while the number of boundary unknowns must be minimized to reduce inter-subdomain communication. We describe graph partitioning algorithms which partition a grid, with subdomains of irregular shapes. The second issue concerns the expected parallel computation time and speed up when implementing a domain decomposition preconditioner on an idealized parallel computer architecture. We outline heuristic estimates for this using idealized models for the computational time and inter-processor data transfer times. Employing a heuristic model of an idealized parallel computer with distributed memory, we describe models for the computational time required for implementing various domain decomposition preconditioners; under such idealized assumptions, it is shown that domain decomposition iterative algorithms have reasonable scalability.

Chap. 5.1 presents background on grid generation and graph theory, and describes how the problem of partitioning a domain or an unstructured grid can be heuristically reduced to a graph partitioning algorithm. We then describe the Kernighan-Lin, recursive spectral bisection and multilevel graph partitioning algorithms for partitioning graphs. Chap. 5.2 discusses background on the speed up and scalability of algorithms on parallel computers. Following that, we briefly discuss the implementation of Schwarz and Schur complement algorithms on unstructured grids. Some heuristic coarse spaces are also outlined for use on unstructured grids, where traditional coarse spaces are not defined.

5.1 Algorithms for Automated Partitioning of Domains

The term unstructured grid refers broadly to triangulations without any identifiable structure, which lack the connectivity of uniform grids and the hierarchical structure of multigrids. Such grids arise in computational fluid dynamics [PE4] and aerodynamics computations [ST6, BA22, MA40, MA41], which require the triangulation and discretization of partial differential equations on regions having complex geometry, see Fig. 5.1.¹ In such applications, the triangulation is generated by using grid generation software [GE6, TH3, HO2, HE9, MA40, MA41, OW]. The resulting grids are typically not quasiuniform, and the density of grid points and the number of elements incident to each node can vary significantly with location. As a result, algorithms are required to automate the partitioning of a domain into subdomains, so that the number of grid points per subdomain is approximately the same, and so that the communication time between processors assigned to different subdomains is minimized. This issue is typically addressed by employing heuristic graph partitioning algorithms.

In this section, we discuss several practical techniques for implementing domain decomposition solvers on unstructured grids. Chap. 5.1.1 describes grid generation algorithms, followed by graph partitioning algorithms in Chap. 5.1.2. Chap. 5.1.3 describes the construction of subdomains; we discuss the selection of subdomains so that load balancing constraints are satisfied. We also discuss the formulation of heuristic coarse spaces for elliptic equations discretized on unstructured grids with subdomains having irregular boundaries, where traditional coarse spaces are not defined; a few such coarse spaces are described in Chap. 5.1.4. Comments on Schwarz, Schur complement and FETI algorithms are presented in Chap. 5.5 to Chap. 5.7.

Fig. 5.1. An unstructured grid [BA23]

¹ The author thanks Dr. Timothy Barth for his kind permission to use Fig. 5.1.

5.1.1 Grid Generation Algorithms

Generating a triangulation Th(Omega) on a domain Omega in two or three dimensions with complex geometry is generally a computationally intensive task.

There is an extensive literature on algorithms and software for automated generation of grids [GE6, TH3, HO2, HE9, MA40, MA41, OW]. Readers are referred to [GE6, TH3] for literature on unstructured meshes, and to [OW] for a survey of software algorithms. Below, we list a few methods.

• Decomposition and mapping method. In this method, the domain is decomposed into subregions, and each subregion is mapped onto one or more standard reference regions. A structured triangulation of each reference domain is then mapped back to triangulate the original subdomains. The resulting triangulation of Omega will be of low cost; however, the subdomain triangulations may not match near their boundaries, so that the triangulations must be appropriately modified.

• Advancing front method. In this method, in the first phase nodes are placed on the boundary ∂Omega of the domain (for instance, by the decomposition and mapping method), yielding an initial front of the triangulation. The algorithm then advances (updates) these fronts by generating new nodes and elements of a desired size within the interior of the domain, adjacent to the current front. The algorithm terminates when the entire domain is triangulated.

• Delaunay triangulation method. This is one of the earliest methods. A Delaunay triangulation is a simplicial triangulation (triangles in IR^2 or tetrahedra in IR^3) such that any circumsphere (i.e., a sphere in IR^3 or a circle in IR^2 passing through the nodes of a tetrahedron or triangle) does not contain other nodes in the interior. In the first phase, nodes are placed on the boundary ∂Omega (for instance, by the decomposition and mapping method) and new nodes are introduced within the interior. In the second phase, a Delaunay triangulation Th(Omega) of Omega is constructed using the given distribution of nodes. Many Delaunay triangulation algorithms are available, some based on the computation of Voronoi cells (polyhedral cells consisting of all points in Euclidean space closest to a node). However, depending on the geometry and specifications for the grid size, the generated grid may not be quasiuniform or structured.

• Grid based method. In this method, a uniform or structured simplicial or box type grid Th(Omega*) with a specified grid size h is overlaid on an extended domain Omega* ⊃ Omega, and the triangulation Th(Omega*) of Omega* is modified to conform to the boundary ∂Omega. The resulting triangulation can be of poor quality for numerical approximation.

Automatic mesh generation software may combine one or more of the above methods, and may include a phase of refinement or smoothing of the resulting grid.

5.1.2 Graph Partitioning Algorithms

The problem of decomposing a domain Omega into subdomains, or partitioning an index set of nodes I = {x_1, ..., x_n} into subindex sets, can be formulated mathematically as a graph partitioning problem. Given a triangulation Th(Omega), a graph [BO2] (or a weighted graph) can be constructed representing

the connectivity of the triangulation (either the connectivity of elements, or of the nodes within the triangulation). A partition of the domain, or of the index set I of nodes in the triangulation, may then be obtained by partitioning this associated graph (or weighted graph) into subgraphs. Load balancing requirements can be incorporated by requiring that the subgraphs be approximately of equal size, while minimization of communication costs can be imposed by requiring that the number of edges cut between subgraphs in the partition is minimized [FA9, MA30]. This problem can be formulated as a combinatorial minimization of an objective functional incorporating the above requirements. Below, we introduce the graph partitioning problem and its associated combinatorial minimization problem, and describe three heuristic algorithms for its solution. The reader is referred to [PO3] for details.

Definition 5.1. A graph G = (V, E) consists of a collection V of n vertices V = {v_1, ..., v_n} and a collection E of m edges E = {e_1, ..., e_m}, where each edge represents adjacencies between pairs of vertices. If edge e_l is incident to vertices v_i and v_j, we denote it as e_l = (v_i, v_j); by default, (v_i, v_j) = (v_j, v_i) ∈ E. The order of the graph, denoted |V|, refers to the number n of vertices, while the size of the graph, denoted |E|, refers to the number m of edges. The number of edges incident to a vertex v_i is referred to as the degree of the vertex, and will be denoted d(v_i).

Given a graph G of order n, the adjacencies in E may be represented using an n x n symmetric matrix M_G, referred to as the adjacency matrix:

    (M_G)_ij = 1, if (v_i, v_j) ∈ E
    (M_G)_ij = 0, if (v_i, v_j) ∉ E.

If the edges in E are enumerated as e_1, ..., e_m, then the vertices incident to each edge may be summarized in an n x m incidence matrix N_G:

    (N_G)_lj = 1, if edge e_j is incident with vertex v_l
    (N_G)_lj = 0, otherwise.

In various applications, it will be useful to assign weights to edges and vertices in a graph. Such graphs are referred to as weighted graphs.

Definition 5.2. A weighted graph is a graph G = (V, E) with weights w_ij assigned to each edge (v_i, v_j) ∈ E. Such weights can be summarized in an n x n symmetric weight matrix W. Weights may also be assigned to individual vertices v_i ∈ V and denoted by w(v_i). By default, weights can be assigned to any graph G = (V, E) by defining w_ij = w_ji = 1 if (M_G)_ij = 1, and w(v_i) = 1.
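To make the association between a triangulation and a graph concrete, the following sketch builds the adjacency matrix M_G for the node-based graph used below for Schwarz algorithms (nodes are adjacent when they share an element). The element-connectivity array format is an assumption chosen for illustration.

    import numpy as np
    from scipy.sparse import coo_matrix

    def node_graph(elements, n):
        """Adjacency matrix MG: mesh nodes are adjacent when they
        belong to a common element of the triangulation."""
        rows, cols = [], []
        for elem in elements:          # elem: node indices of one element
            for i in elem:
                for j in elem:
                    if i != j:
                        rows.append(i)
                        cols.append(j)
        MG = coo_matrix((np.ones(len(rows)), (rows, cols)),
                        shape=(n, n)).tocsr()
        MG.data[:] = 1.0               # collapse duplicates to unit weights
        return MG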


Definition 5.3. A graph G = (V, E) is said to be connected if for any two vertices v_i, v_j ∈ V there exists a "path" v_i = x_1, x_2, ..., x_l = v_j such that all consecutive vertices are adjacent, i.e., (x_r, x_{r+1}) ∈ E for r = 1, ..., l - 1. In matrix terms, a graph G will be connected if and only if its adjacency matrix M_G is irreducible, i.e., any 2 x 2 block partitioning of matrix P M_G P^T must yield a nonzero off diagonal block for any permutation matrix P reordering the rows or columns.

Associated with a graph G = (V, E) with a nonnegative weight matrix W, we define an n x n graph Laplacian matrix L_G as follows:

    (L_G)_ij ≡  Σ_{l != i} w_il,   if j = i
    (L_G)_ij ≡  -w_ij,             if (v_i, v_j) ∈ E        (5.1)
    (L_G)_ij ≡  0,                 otherwise.

If the graph is unweighted, then the default weights w_ij = (M_G)_ij for i != j should be used, and in this case the diagonal entries (L_G)_ii = d(v_i) will correspond to the degrees of the vertices. By definition, L_G will be symmetric and weakly diagonally dominant with zero row sums. Consequently, L_G will be singular, with eigenvector x_1 = (1, ..., 1)^T corresponding to eigenvalue lambda_1 = 0. Due to symmetry and weak diagonal dominance of L_G, its eigenvalues {lambda_i} will be nonnegative. We assume that these eigenvalues are ordered as:

    0 = lambda_1 <= lambda_2 <= ... <= lambda_n.

If a graph G is not connected, then the algebraic multiplicity of the zero eigenvalue of L_G will yield its number of connected components. For a connected graph G, the algebraic multiplicity of the zero eigenvalue of L_G will be one, and lambda_2 > 0.

Definition 5.4. For a connected graph G, the eigenvector x_2 of L_G corresponding to eigenvalue lambda_2 > 0:

    L_G x_2 = lambda_2 x_2,

is referred to as the Fiedler vector of the graph G.

The Fiedler vector of a connected graph can be employed to partition a graph into two. This can be applied recursively, as shall be described later. Given a triangulation Omega_h, two alternative graphs G = (V, E) may be associated with it:
• In applications to Schwarz algorithms, let the vertices v_i in the graph correspond to nodes x_i of Omega_h. In this case, vertices v_i and v_j can be defined to be adjacent if nodes x_i and x_j belong to the same element.
• In applications to Schur complement algorithms, it will be preferable to identify the vertices v_i of the graph with elements kappa_i of the triangulation Omega_h. Vertex v_i can be defined to be adjacent to vertex v_j if kappa_i ∩ kappa_j != ∅.
More details of such associations will be described later.
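Continuing the previous sketch, the graph Laplacian (5.1) and the Fiedler vector of Definition 5.4 can be computed with SciPy. The small shift used to make the singular Laplacian factorizable is an implementation detail, not part of the text.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import eigsh

    def graph_laplacian(W):
        """Weighted graph Laplacian (5.1): row sums on the diagonal,
        negated weights off the diagonal."""
        d = np.asarray(W.sum(axis=1)).ravel()
        return (diags(d) - W).tocsc()

    def fiedler_vector(W):
        """Eigenvector of L_G for the second smallest eigenvalue."""
        L = graph_laplacian(W)
        # shift-invert targets the two eigenvalues nearest zero
        vals, vecs = eigsh(L, k=2, sigma=-1e-8, which='LM')
        return vecs[:, np.argsort(vals)[1]]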

Once a graph G = (V, E) has been associated with a domain or the nodes in Omega_h, a partition of the domain or its nodes can be obtained by partitioning the vertices of graph G = (V, E) into p subsets V_1, ..., V_p of order n_1, ..., n_p, respectively, so that:

    V_1 ∪ ... ∪ V_p = V,
    V_i ∩ V_j = ∅, if i != j.        (5.2)

The load balancing constraint can be heuristically approximated by requiring that the number n_i of nodes within each subset V_i be approximately the same, while the induced subgraph on the vertices V_i (i.e., the adjacencies from E between vertices in V_i) will be required to be connected, as stated formally in the following.

Definition 5.5. Given a graph G = (V, E) and a parameter epsilon > 0 chosen by the user, we define K as an admissible partition of V into p sets V_1, ..., V_p of size n_1, ..., n_p, respectively, if the following hold:

1. If n_i = |V_i| for i = 1, ..., p, then:

       (n/p)(1 - epsilon) <= n_i <= (1 + epsilon)(n/p).

2. The induced subgraphs G_i = (V_i, E_i) are connected, for i = 1, ..., p, where each E_i denotes the adjacencies from E between vertices in V_i.

Remark 5.6. In some applications, it may be convenient to let each vertex in the graph represent more than one nodal unknown in the original triangulation Omega_h. In such cases, a weight w(v_i) can be assigned to each vertex to denote the number of nodes that vertex v_i represents. Then, the number n_i of nodes which subset V_i represents should be computed as:

    n_i = |V_i| = Σ_{v_l ∈ V_i} w(v_l).        (5.3)

This will reduce to the number of vertices in V_i if w(v_l) = 1.

If one processor is assigned to each subdomain defined by V_i, then the volume of communication between the different processors can be heuristically estimated in terms of the total number of edges between the vertices in different sets V_i in the partition. If weighted edges are used, this quantity may be replaced by the sum of the edge weights on edges between different sets V_i. The requirement that the communication between different subdomains be minimized may thus be approximated by minimizing the sum of such edge weights between distinct subsets V_i.

Accordingly, we may define an objective functional delta(.) which represents the sum of edge weights between distinct subsets V_i in the partition.

Definition 5.7. Given a graph G = (V, E) with weight matrix W and two disjoint vertex subsets V_i and V_j of V, we denote by delta(V_i, V_j) the total sum of edge weights between all pairs of vertices in V_i and V_j:

    delta(V_i, V_j) ≡ Σ_{v_r ∈ V_i, v_s ∈ V_j} w_rs.        (5.4)

Given three or more disjoint vertex subsets of V, we define delta(V_1, ..., V_p) as the sum of edge weights between each distinct pair of subsets V_i and V_j:

    delta(V_1, ..., V_p) ≡ Σ_{i=1}^{p-1} Σ_{j=i+1}^{p} delta(V_i, V_j).        (5.5)

The functional delta(V_1, ..., V_p) will represent the volume of communication between subsets in the partition. If W is chosen by default with w_ij = (M_G)_ij, then delta(V_1, ..., V_p) will correspond to the total number of edges between all distinct pairs of subsets V_i and V_j of the partition of V.
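A direct transcription of (5.4)-(5.5) for a sparse weight matrix, assuming the partition is stored as an array mapping each vertex to its subset index:

    def edge_cut(W, part):
        """Objective functional (5.5): total weight of edges joining
        vertices assigned to distinct subsets."""
        A = W.tocoo()
        return sum(w for i, j, w in zip(A.row, A.col, A.data)
                   if i < j and part[i] != part[j])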

The problem of partitioning a graph G so that the load balancing constraint holds, and so that the communication costs between subdomains are minimized, may formally be approximated by the following combinatorial minimization problem. Find a K partition V_1, ..., V_p satisfying:

    delta(V_1, ..., V_p) = min_{(V~_1, ..., V~_p) ∈ K} delta(V~_1, ..., V~_p).        (5.6)

Unfortunately, as with most combinatorial optimization problems, this is an NP hard discrete problem; no algorithm of polynomial complexity is known for determining the exact solution. We therefore restrict consideration to heuristic algorithms which approximate the solution to the above. The following three algorithms will be outlined in the following: the Kernighan-Lin algorithm, the recursive spectral bisection algorithm, and the multilevel graph partitioning algorithm. The latter algorithm generally has the lowest complexity amongst the three.

Kernighan-Lin Algorithm. This algorithm [KE4] corresponds to a discrete descent method for the combinatorial minimization problem (5.6). Start with any initial partition V~_1, ..., V~_p in K. Repeatedly exchange pairs of vertices v_i and v_j for which the resulting partition is still within K, and for which a reduction in the functional delta(.) is obtained. If the vertex weights w(v_i) are unitary, then such an exchange will leave n_1, ..., n_p unchanged; however, if nonunitary vertex weights are employed, this constraint must be checked. To avoid stagnation at a local minimum, the Kernighan-Lin algorithm permits a fixed number q* of exchanges within K which increase the value of delta(.). The algorithm must ideally be implemented for several selections of initial partitions, and the partition corresponding to the lowest value of delta(.) must

be stored. Once a prescribed number of iterations have been completed, this optimal stored partition can be chosen as an approximate solution of (5.6).

To implement the Kernighan-Lin sweep, for any subset V~_i ⊂ V and vertex v_r, define d_{V~_i}(v_r) as the sum of edge weights w_rs between vertex v_r and vertices v_s in V~_i:

    d_{V~_i}(v_r) ≡ Σ_{(v_r, v_s) ∈ E : v_s ∈ V~_i} w_rs.

Define the gain associated with exchanging v_r ∈ V~_i and v_s ∈ V~_j as follows:

    gain(v_r, v_s) = d_{V~_i}(v_r) - d_{V~_j}(v_r) + d_{V~_j}(v_s) - d_{V~_i}(v_s),           if (v_r, v_s) ∉ E
    gain(v_r, v_s) = d_{V~_i}(v_r) - d_{V~_j}(v_r) + d_{V~_j}(v_s) - d_{V~_i}(v_s) - 2 w_rs,  if (v_r, v_s) ∈ E.

If the gain is nonnegative, then the exchange should be accepted. At most q* exchanges resulting in a negative gain should be accepted.

Remark 5.8. The complexity of the Kernighan-Lin algorithm is O(n^2 log(n)) if a fixed number of iterations is implemented. However, the number of exchanges of vertices per iteration can be reduced significantly if only boundary vertices are exchanged, i.e., vertices which are adjacent to vertices in other sets of the partition. An O(|E|) complexity algorithm is known for p = 2, see [PO3].
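A literal transcription of the gain formula above; the adjacency-dictionary data structure is an assumption for illustration, and a production Kernighan-Lin code would maintain incremental gain tables rather than recompute sums.

    def kl_gain(adj, part, vr, vs):
        """Gain of exchanging vr and vs between their subsets.
        adj[v]: dict mapping neighbors of v to edge weights;
        part[v]: subset index of vertex v."""
        def d(v, k):               # total edge weight from v into subset k
            return sum(w for u, w in adj[v].items() if part[u] == k)
        i, j = part[vr], part[vs]
        g = d(vr, i) - d(vr, j) + d(vs, j) - d(vs, i)
        if vs in adj[vr]:          # adjacent pair: subtract 2 * w_rs
            g -= 2.0 * adj[vr][vs]
        return g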

Additionally, q^T 1 = Σ_j q_j = 0. Therefore, minimization of delta(V_1, V_2) over all admissible partitions will be equivalent to the minimization of q^T L_G q for q ∈ Q, where:

    Q ≡ { q~ = (q~_1, ..., q~_n)^T : q~_i = ±1, q~^T 1 = 0 }.

We may thus state the bisection problem as determining q ∈ Q such that:

    q^T L_G q = min_{q~ ∈ Q} q~^T L_G q~.        (5.7)

This is called a quadratic assignment problem [PO3], and it is a discrete (combinatorial) optimization problem which may be heuristically approximated by a quadratic minimization problem over IR^n (with appropriate constraints), as indicated next. Define Q* ≡ {x ∈ IR^n : x^T x = n, x^T 1 = 0} ⊃ Q. We may approximate the discrete minimum of q^T L_G q in Q as follows:

    min_{q ∈ Q} q^T L_G q >= min_{x ∈ Q*} x^T L_G x = x_2^T L_G x_2 = lambda_2 x_2^T x_2 = lambda_2 n,

where x_2 is a Fiedler vector (i.e., an eigenvector of L_G corresponding to eigenvalue lambda_2 > 0) scaled so that its Euclidean norm is sqrt(n). This suggests the following heuristic bisection:

• Compute the Fiedler vector x_2 (having norm sqrt(n)) associated with L_G: L_G x_2 = lambda_2 x_2.
• Since the components (x_2)_i may not be in {+1, -1}, sort its entries in increasing order, and let alpha_{1/2} denote a median value of the entries of x_2.
• If (x_2)_i > alpha_{1/2} define q_i = +1, and if (x_2)_i < alpha_{1/2} define q_i = -1. If (x_2)_i = alpha_{1/2} define q_i = ±1, so that (n/2) components have +1 entries.

The above algorithm is easily generalized when |V| is not even and when the vertex weights are not unitary. Indeed, for any choice of nonnegative integers n_1 and n_2 satisfying n_1 + n_2 = n, we may extend the above partitioning by defining V_1 as the vertices corresponding to the first n_1 components of the Fiedler vector after sorting (taking into account nonunitary weights of vertices). The following theoretical result will hold.

Lemma 5.9. Suppose the following assumptions hold:
1. Let G be a connected graph.
2. Let x_2 denote the Fiedler vector of L_G.
3. For alpha >= 0 and beta <= 0, define:

       I_alpha ≡ {i : (x_2)_i <= alpha},   J_beta ≡ {i : (x_2)_i >= -beta}.

Then the following results will hold:
1. The induced graph associated with V_1 = {v_i : i ∈ I_alpha} is connected.
2. The induced graph associated with V_2 = {v_i : i ∈ J_beta} is connected.
3. For any p ∈ Q: ||x_2 - q||_2 <= ||x_2 - p||_2.

Proof. For results 1 and 2, see [FI]. For result 3, see [PO3, MC2].

The recursive spectral bisection algorithm partitions a graph G = (V, E) by repeatedly applying the spectral bisection algorithm to each of the subgraphs obtained from the previous applications of the spectral bisection algorithm. We summarize the algorithm below, and employ the notation G_i^(k) = (V_i^(k), E_i^(k)) to denote the i'th subgraph at stage k.

Algorithm 5.1.1 (Recursive Spectral Bisection Algorithm)
Let p ≈ 2^J denote the number of sets in the partition
Define G_1^(1) = (V_1^(1), E_1^(1)) ≡ G = (V, E)
1. For k = 1, ..., J - 1 do:
2.     Spectrally bisect each subgraph G_i^(k) at level k into two:
           G_i^(k) -> { G_{I1(i)}^(k+1), G_{I2(i)}^(k+1) }   for 1 <= i <= 2^{k-1}
3.     Reindex G_i^(k+1) so that indices 1 <= i <= 2^k
4. Endfor

Here, I1(i) and I2(i) denote temporary indices for the partitioned graphs, before reindexing. Weight matrices of subgraphs are defined as submatrices of the parent weight matrix, corresponding to the indices in the subgraphs. In practice, the Fiedler vector x_2 (or an approximation of it) may be computed approximately by the Lanczos algorithm [GO4]. As mentioned earlier, the quality of spectral partitions is very good, though they are more expensive to compute.
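A sketch of one spectral bisection step, reusing fiedler_vector() from the earlier sketch; the tie-handling at the median is simplified by splitting the sorted entries in half.

    import numpy as np

    def spectral_bisect(W):
        """Bisect a connected graph by the median of its Fiedler vector.
        Returns q with entries +1/-1, defining V1 = {i : q[i] = +1}."""
        x2 = fiedler_vector(W)
        order = np.argsort(x2)         # sort entries of the Fiedler vector
        n = len(x2)
        q = np.empty(n, dtype=int)
        q[order[: n // 2]] = -1        # lower half of entries -> V2
        q[order[n // 2:]] = +1         # upper half of entries -> V1
        return q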

Multilevel Graph Partitioning Algorithm. The multilevel graph partitioning algorithm [SI2, BA20, HE7, HA2, KA3, VA3, KU] is motivated by graph compaction algorithms and multigrid methodology [BR22, CI6, CI7, CI8]. Given a graph G = (V, E) with weight matrix W, this graph partitioning algorithm constructs a hierarchy of smaller order or "coarser" graphs G^(l) = (V^(l), E^(l)) with weight matrices W^(l), by repeated merging (agglomeration) of pairs of vertices within each parent graph. Each graph in the hierarchy is constructed to have approximately half the number of vertices as its parent graph. Once a weighted coarse graph of sufficiently small order has been constructed, a standard graph partitioning algorithm (such as recursive spectral bisection) is applied to partition the coarsest weighted graph by minimizing a suitably defined objective functional equivalent to (5.6). The partitioned subgraphs of the coarse graph are then "projected" onto the next finer level in the hierarchy by unmerging (deagglomeration) of the merged vertices. These projected partitions are improved at the finer level by applying a Kernighan-Lin type algorithm. This procedure is recursively applied till a partitioning of the original graph is obtained. Since the bulk of the computations are implemented on the coarsest graph, the computational cost is significantly reduced. We describe additional details.

In contrast with traditional multilevel notation, index l = 0 will denote the original and largest order graph in the hierarchy, while index l = J will denote the coarsest and smallest order graph in the hierarchy. Each graph in the multilevel hierarchy will be indexed as l = 0, 1, ..., J. For 0 <= l <= J, the graphs in the hierarchy will be denoted as G^(l) = (V^(l), E^(l)), with weight matrices W^(l) of size n_l. The initial graph will be the original weighted graph G^(0) ≡ G, with (V^(0), E^(0)) ≡ (V, E), W^(0) ≡ W and n_0 ≡ n. If the original graph G = (V, E) is not weighted, then the default weight matrix W is employed, with unitary weights w(v_i) = 1 assigned to the original vertices v_i in V.

Given a graph G^(l-1) = (V^(l-1), E^(l-1)), this algorithm defines a coarser (smaller order) graph G^(l) = (V^(l), E^(l)) by merging (agglomerating) pairs of vertices within V^(l-1), by a procedure referred to as maximal matching.

Definition 5.10. Given a graph G^(l) = (V^(l), E^(l)), a matching is any subset of edges from E^(l) such that no more than one edge is incident to each vertex. A maximal matching is a matching in which no additional edge can be added without violating the matching condition.

A maximal matching can be constructed in graph G^(l-1) as follows. Select one vertex randomly, say v_r^(l-1), from the graph, and determine an unmatched vertex adjacent to it (if it exists) with maximal edge weight; i.e., match v_r^(l-1) with v_s^(l-1) if w_rs^(l-1) is largest amongst all the unmatched vertices v_s^(l-1). If no adjacent unmatched vertex is found for v_r^(l-1), then it is left as a singleton and matched with itself. This procedure is repeated till there are no remaining unmatched vertices, as sketched after this paragraph.

We shall denote by I1(i, l) and I2(i, l) the indices of the two parent vertices at level (l - 1) which are matched and merged to yield vertex v_i^(l) at level l. If a vertex is matched with itself (i.e., is a singleton), then I1(i, l) = I2(i, l). Since a vertex v_i^(l) at level l is the agglomeration of vertices v_{I1(i,l)}^(l-1) and v_{I2(i,l)}^(l-1) from V^(l-1), we express this as:

    v_i^(l) = { v_{I1(i,l)}^(l-1) ∪ v_{I2(i,l)}^(l-1) }.

Consequently, vertices in V^(l) represent a subset of vertices from the original graph V^(0) = V.
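A minimal sketch of the randomized heavy-edge maximal matching just described, again assuming an adjacency-dictionary representation:

    import random

    def maximal_matching(adj):
        """Randomized heavy-edge maximal matching.
        Returns a list of (v, partner) pairs, one per coarse vertex;
        a singleton vertex is matched with itself."""
        vertices = list(adj)
        random.shuffle(vertices)       # visit vertices in random order
        matched, pairs = set(), []
        for v in vertices:
            if v in matched:
                continue
            # unmatched neighbor joined to v by the heaviest edge
            cand = [(w, u) for u, w in adj[v].items() if u not in matched]
            if cand:
                _, u = max(cand)
                pairs.append((v, u))
                matched.update((v, u))
            else:
                pairs.append((v, v))   # singleton: matched with itself
                matched.add(v)
        return pairs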

The new vertices v_i^(l) in V^(l) are assigned weights as follows:

    w^(l)(v_i^(l)) = w^(l-1)(v_{I1(i,l)}^(l-1)) + w^(l-1)(v_{I2(i,l)}^(l-1)),

if v_i^(l) is not a singleton; otherwise w^(l)(v_i^(l)) = w^(l-1)(v_{I1(i,l)}^(l-1)). The weight w^(l)(v_i^(l)) will denote the number of vertices of the original graph V^(0) in v_i^(l). Vertices v_i^(l) and v_j^(l) in V^(l) will be defined to be adjacent in E^(l) if any of the parent vertices of v_i^(l) are adjacent to any parent vertices of v_j^(l) at level (l - 1). A weight w_ij^(l) will be assigned to adjacent vertices v_i^(l) and v_j^(l) by summing the weights on all edges between the parent nodes of v_i^(l) and v_j^(l) at level (l - 1). More specifically, if v_i^(l) = v_{I1(i,l)}^(l-1) ∪ v_{I2(i,l)}^(l-1) and v_j^(l) = v_{I1(j,l)}^(l-1) ∪ v_{I2(j,l)}^(l-1), we define:

    w_ij^(l) = Σ_{r ∈ {I1(i,l), I2(i,l)}} Σ_{s ∈ {I1(j,l), I2(j,l)}} w_rs^(l-1).

Weights and edges are recursively defined by applying the preceding expressions.

The first phase of the multilevel graph partitioning algorithm recursively applies maximal matching to compute coarser graphs, till a coarse graph G^(J) of sufficiently small order is constructed. Before we describe the second phase in the multilevel graph partitioning algorithm, we discuss how a partition V_1^(l), ..., V_p^(l) of vertices in V^(l) can be "projected" to yield a partition of V^(l-1). Formally, we define a projection P_l^{l-1} as:

    P_l^{l-1} V_i^(l) ≡ ∪_{v_j^(l) ∈ V_i^(l)} ( v_{I1(j,l)}^(l-1) ∪ v_{I2(j,l)}^(l-1) ),        (5.8)

which will deagglomerate all the vertices v_i^(l) ∈ V_k^(l). More generally, given indices 0 <= r < l, we define a projection P_l^r recursively:

    P_l^r V_i^(l) ≡ P_{r+1}^r ... P_l^{l-1} V_i^(l).        (5.9)

Thus, a partition V_1^(l), ..., V_p^(l) of V^(l) will yield a partition of V^(r) by use of the projections P_l^r V_1^(l), ..., P_l^r V_p^(l). We next describe how to define an induced objective function delta^(l)(.) which is equivalent to delta(.) for a partition V_1^(l), ..., V_p^(l) at level l:

    delta^(l)(V_i^(l), V_j^(l)) ≡ Σ_{v_r^(l) ∈ V_i^(l), v_s^(l) ∈ V_j^(l)} w_rs^(l)
                                                                                             (5.10)
    delta^(l)(V_1^(l), ..., V_p^(l)) ≡ Σ_{i=1}^{p-1} Σ_{j=i+1}^{p} delta^(l)(V_i^(l), V_j^(l)).

By construction, the preceding objective functionals will satisfy:

    delta^(l)(V_1^(l), ..., V_p^(l)) = delta^(r)(P_l^r V_1^(l), ..., P_l^r V_p^(l)),        (5.11)

for 0 <= r < l, where delta^(0)(.) = delta(.). Hence, if a sequence of partitions is constructed on graph G^(l) such that the value of the objective

functional delta^(l)(.) monotonically decreases, then the value of delta^(0)(.) will also decrease monotonically for the projected partitions at level l = 0.

In the second phase, the coarsest graph G^(J) is partitioned using an effective graph partitioning algorithm, such as Kernighan-Lin or recursive spectral bisection, to minimize delta^(J)(.). The resulting partition is then projected to the next finer level using P_J^{J-1}, and refined by an application of several iterations of the Kernighan-Lin algorithm using delta^(J-1)(.). This procedure is recursively applied till a partition is obtained on the finest graph. We now summarize the multilevel graph partitioning algorithm, for an input graph G^(0) = (V^(0), E^(0)) and J denoting the number of desired levels in the hierarchy.

Algorithm 5.1.2 (Multilevel Graph Partitioning Algorithm)
Given G^(0) ≡ G and J:
1. For l = 1, ..., J do:
2.     Construct a coarser graph using maximal matching:
           V^(l) <- V^(l-1),   E^(l) <- E^(l-1)
3.     Define vertex and edge weights:
           w^(l)(v_i^(l)) = w^(l-1)(v_{I1(i,l)}^(l-1)) + w^(l-1)(v_{I2(i,l)}^(l-1))
           w_ij^(l) = Σ_{r ∈ {I1(i,l), I2(i,l)}} Σ_{s ∈ {I1(j,l), I2(j,l)}} w_rs^(l-1)
4. Endfor
5. Partition: V^(J) -> (V_1^(J), ..., V_p^(J))
6. For l = J, ..., 1 do:
7.     Project: P_l^{l-1} V_i^(l) -> V_i^(l-1) for i = 1, ..., p
8.     Improve the partition employing Kernighan-Lin and delta^(l)(.)
9. Endfor
Output: V_1^(0), ..., V_p^(0)

Various software implementations of multilevel partitioning algorithms are available, see CHACO [HE8], METIS [KA3] and [KU]. Numerical studies indicate that the quality of multilevel partitions is comparable with that obtained by recursive spectral bisection, as measured by delta(.), see [SI2, BA20, HE7, KA3, PO2]. For additional discussion, readers are referred to [PO3].
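One coarsening step (steps 2-3 of Algorithm 5.1.2) can be sketched as follows, reusing maximal_matching() from the earlier sketch; vertex weights would be accumulated analogously to the edge weights shown here.

    import numpy as np
    from scipy.sparse import coo_matrix

    def coarsen(W, pairs):
        """Collapse matched pairs into coarse vertices and accumulate the
        edge weights w_ij^(l) as in step 3 of Algorithm 5.1.2."""
        n = W.shape[0]
        coarse_id = np.empty(n, dtype=int)
        for c, (v, u) in enumerate(pairs):
            coarse_id[v] = coarse_id[u] = c
        A = W.tocoo()
        keep = coarse_id[A.row] != coarse_id[A.col]  # drop collapsed edges
        Wc = coo_matrix((A.data[keep],
                         (coarse_id[A.row[keep]], coarse_id[A.col[keep]])),
                        shape=(len(pairs), len(pairs))).tocsr()  # sums duplicates
        return Wc, coarse_id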

To partition the index set I = {x1, ..., xn} of vertices in Ωh, define a graph G = (V, E) with vertices vi ≡ xi for i = 1, ..., n, where vi is adjacent to vj in E if vertex xi and xj belong to the same element κ ∈ Th(Ω). Assign unitary weights w(vi) = 1 to the vertices and unitary weights wij ≡ 1 to the edges (vi, vj) ∈ E. Apply any of the partitioning algorithms to minimize δ(·) within K iterations (for a suitable K > 0) and partition V into V1, ..., Vp. This yields a partition I1, ..., Ip of the index set I. To obtain overlap amongst the index sets, for any β > 0 extend each index set Ii as:

   Ii* ≡ { l : dist(xl, xj) ≤ β h0, for j ∈ Ii },   (5.12)

where dist(xl, xj) denotes the Euclidean distance between xl and xj. If nj denotes the number of vertices in Ij and nj* ≥ nj the number of vertices in Ij*, and if Th(Ω) is not quasiuniform, then ni* may vary significantly, violating load balancing requirements.

Remark 5.11. To partition Ω into nonoverlapping subdomains, let κ1, ..., κq denote an ordering of the elements in the triangulation of Ωh. Define a graph G = (V, E) with vertices vi ≡ κi for i = 1, ..., q, where vi is adjacent to vj in E if κi ∩ κj ≠ ∅. We assign unitary vertex weights w(vi) = 1 and unitary edge weights wij = 1 for (vi, vj) ∈ E. We may apply any of the partitioning algorithms to minimize δ(·) within K iterations (for a suitable K > 0) and partition V into V1, ..., Vp. By construction, this will yield a partition of Ω into connected subdomains:

   Ωi ≡ ( ∪_{vl ∈ Vi} κl )°,  for 1 ≤ i ≤ p.   (5.13)

Overlap may be included amongst the subdomains by extending each subdomain Ωi to Ωi* by including all elements adjacent within a distance β h0 > 0, where dist(κr, κj) denotes the distance between the centroids of elements κr and κj. The size of Ωi* and the associated number of nodes may vary significantly if Th(Ω) is not quasiuniform.
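Both graph constructions and the overlap extension admit short sketches. In the sketch below, one application of the adjacency map per layer is used as a stand-in for the Euclidean criterion dist(xl, xj) ≤ β h0 in (5.12); this graph-distance proxy and all names are assumptions made for brevity.

```python
def node_graph(elements, n):
    """Adjacency lists: vertices i, j are adjacent iff they share an
    element; `elements` is a list of tuples of vertex indices of each κ."""
    adj = [set() for _ in range(n)]
    for elem in elements:
        for i in elem:
            for j in elem:
                if i != j:
                    adj[i].add(j)
    return adj

def extend_index_set(I_i, adj, layers=1):
    """Extend subdomain index set I_i by `layers` rings of neighbors,
    a proxy for the distance-beta*h0 overlap criterion of (5.12)."""
    I_star = set(I_i)
    for _ in range(layers):
        I_star |= {j for i in I_star for j in adj[i]}
    return I_star
```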

5.1.4 Coarse Spaces on Unstructured Grids

We shall let Vh(Ω) denote the finite element space defined on the unstructured grid Th(Ω). Traditional coarse spaces defined on a coarse grid Th0(Ω) will not be applicable on unstructured grids, since Th(Ω) is not obtained by the refinement of Th0(Ω). Instead, alternative coarse spaces may be employed to provide global transfer of information on such grids [WI6, CH3, CA4, CH17, SA11, SA12, SA13]. We shall outline the following coarse spaces:

• Coarse space V0,I(Ω) obtained by interpolation of an external space.
• Piecewise constant discrete harmonic finite element space V0,P(Ω).

Coarse Space Based on Interpolation. The finite element coarse space V0,I(Ω) ⊂ Vh(Ω) is defined by interpolating or projecting an external finite dimensional space Vh0(Ω*) of functions with desirable approximation properties onto the finite element space Vh(Ω). If {φ1^(0)(x), ..., φ_n0^(0)(x)} denote n0 basis functions in Vh0(Ω*) ⊂ H^1(Ω*) having desirable properties, the coarse space V0,I(Ω) is defined as the subspace of Vh(Ω) ∩ H0^1(Ω) spanned by interpolants (or projections) of these basis functions onto the finite element space:

   V0(Ω) ≡ span{ Ih φ1^(0)(·), ..., Ih φ_n0^(0)(·) } ⊂ Vh(Ω),   (5.14)

where Ih denotes a finite element interpolation or projection map onto Vh(Ω).

A matrix representation of V0,I(Ω) can be obtained using the standard interpolation map Ih as follows. Let I = {x1, ..., xn} denote an ordering of the interior nodes of Th(Ω). Then, an n × n0 extension matrix R0^T is defined entrywise by:

   (R0^T)_ij = φ_j^(0)(x_i),  for 1 ≤ i ≤ n, 1 ≤ j ≤ n0.   (5.15)

The functions {φ_i^(0)(·)}_{i=1}^{n0} should ideally be chosen so that the above matrix is of full rank. The restriction matrix R0 will be the transpose of the extension matrix, and A0 ≡ R0 A R0^T. We indicate two examples below.

Example 5.12. Let Ω* ⊃ Ω be a polygonal or polyhedral domain covering Ω and triangulated by a quasiuniform grid Th0(Ω*), and let {φ1^(0)(x), ..., φ_n0^(0)(x)} denote a finite element nodal basis defined on the triangulation Th0(Ω*). Such basis functions will be in H^1(Ω*). To ensure that each coarse node in Th0(Ω*) corresponds to a true (nonredundant) degree of freedom, it will be assumed that the support of each nodal basis function defined on Th0(Ω*) intersects interior nodes of Th(Ω). A coarse space can be constructed as in (5.14), where the matrices R0, R0^T and A0 can be constructed as in (5.15), and R0 will be sparse. Such a basis was tested in [CA4, CH17, CH3] and shown to yield a quasioptimal convergence rate under appropriate assumptions. It is more suited for Dirichlet boundary value problems.

Example 5.13. An alternative coarse space can be constructed by choosing a space Vn0(Ω*) of polynomials on Ω* ⊃ Ω and interpolating it onto the finite element space Vh(Ω). In two dimensions, the monomials:

   Vd(Ω*) ≡ span{ 1, x1, x2, x1^2, x1 x2, x2^2, ..., x1^d, x2^d }

of degree d or less on a rectangular domain Ω* ⊃ Ω may be used. Alternatively, a tensor product of one dimensional polynomials may be employed. If {φ1^(0)(x), ..., φ_n0^(0)(x)} denotes a monomial or Tchebycheff basis for polynomials of degree d or less, a coarse space can be constructed as in (5.14), with matrices as in (5.15). However, these matrices will not be sparse, and a Tchebycheff basis would be preferable. Heuristic studies indicate reasonable convergence for Neumann boundary value problems [CA18], in which case nodes on Ω ∪ BN must also be included in (5.15).
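For Example 5.13, assembling R0^T in (5.15) and A0 = R0 A R0^T reduces to evaluating the coarse basis at the fine nodes. A minimal sketch, assuming NumPy/SciPy, a two dimensional domain, fine node coordinates `xy` and a sparse stiffness matrix `A` (illustrative names):

```python
import numpy as np
import scipy.sparse as sp

def coarse_space_monomials(xy, A, degree):
    """Extension matrix R0^T whose j-th column is the j-th monomial
    x1^a * x2^b (a + b <= degree) evaluated at the fine nodes."""
    x, y = xy[:, 0], xy[:, 1]
    cols = [x**a * y**b
            for a in range(degree + 1)
            for b in range(degree + 1 - a)]
    R0T = np.column_stack(cols)      # n x n0, dense (not sparse)
    A0 = R0T.T @ (A @ R0T)           # coarse matrix A0 = R0 A R0^T
    return R0T, A0
```

As noted above, a Tchebycheff basis would be better conditioned than the raw monomials used here; the monomial version is kept only because it is the shortest to state.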

Coarse Space of Piecewise Discrete Harmonic Functions. We next describe a coarse space V0,P(Ω) ⊂ Vh(Ω) of piecewise discrete harmonic finite element functions. Let Ω1, ..., Ωp denote a nonoverlapping subdomain decomposition of Ω, constructed by graph partitioning of the triangulation Ωh of Ω. Denote by B = ∪_{i=1}^p B^(i) the common interface, where B^(l) = ∂Ωl \ BD. We shall assume that the indices in I are grouped and ordered as I ∪ B corresponding to the nodes in the subdomains Ω1, ..., Ωp and on the interface B, with nI and nB denoting the number of nodes in I and B, respectively, resulting in the following block structure for A:

   A ≡ [ A_II   A_IB ]
       [ A_IB^T A_BB ],

where A_II = blockdiag(A_II^(1), ..., A_II^(p)). Let yi for i = 1, ..., nB denote the ordering of nodes on B, and employ the block partitioning w = (w_I^T, w_B^T)^T as in Schur complement methods.

The coarse finite element space V0,P(Ω) will consist of finite element functions which are discrete harmonic on each subdomain Ωl with specially chosen boundary values on each B^(l) = ∂Ωl \ BD. A matrix basis for V0 can be constructed as follows [MA14, CO8, SA7]. For each node yi ∈ B define NG(yi) as the number of subdomain boundaries B^(k) with yi ∈ B^(k). The columns of R0^T will be defined as piecewise discrete A-harmonic vectors corresponding to the following p specifically chosen interface data vectors w_B^(k) for k = 1, ..., p:

   ( w_B^(k) )_i = 1 / NG(yi),  if yi ∈ B^(k),
                 = 0,           otherwise,

for i = 1, ..., nB. Denote the discrete harmonic extension matrix as E ≡ -A_II^{-1} A_IB and define matrix R0^T as:

   R0^T ≡ [ E w_B^(1) · · · E w_B^(p) ]
          [   w_B^(1) · · ·   w_B^(p) ].

The coarse finite element space V0,P(Ω) ⊂ Vh(Ω) will consist of finite element functions whose nodal vectors are in Range(R0^T). The restriction matrix R0 will be the transpose of R0^T and A0 ≡ R0 A R0^T.

Remark 5.14. The finite element functions in V0,P(Ω) correspond to discrete harmonic extensions into the subdomains, of finite element functions in the piecewise constant coarse space V0,P(B) employed in the balancing domain decomposition preconditioner [MA17]. Approximation properties of such spaces are described in [CO8, SA7, MA17].
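A minimal sketch of the construction of R0^T for V0,P(Ω), assuming SciPy, the blocks A_II, A_IB above, and a map `B_of` from each interface node yi to the (0-based) subdomain boundaries containing it; the helper names are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import factorized

def harmonic_coarse_space(AII, AIB, B_of, p):
    """Column k of the result is (E w_B^(k); w_B^(k)), with
    (w_B^(k))_i = 1/NG(y_i) if y_i lies on B^(k), else 0."""
    nB = AIB.shape[1]
    WB = np.zeros((nB, p))
    for i in range(nB):
        for k in B_of[i]:                   # boundaries containing y_i
            WB[i, k] = 1.0 / len(B_of[i])   # the weight 1/NG(y_i)
    solve_II = factorized(AII.tocsc())      # one reusable factorization
    EWB = -np.column_stack([solve_II(AIB @ WB[:, k]) for k in range(p)])
    return np.vstack([EWB, WB])             # R0^T, size (nI + nB) x p
```

Since A_II is block diagonal, the solves above decouple across subdomains; a parallel implementation would factor each A_II^(i) on the processor owning Ωi.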

5.1.5 Schwarz Algorithms

We consider next the matrix implementation of Schwarz iterative algorithms on an unstructured grid Th(Ω). We shall assume that the index set I has been partitioned into subindex sets I1, ..., Ip using a graph partitioning algorithm which minimizes δ(·), and that each index set Il has been extended to Il* as described earlier in this section. Let the subindex sets I1*, ..., Ip* have n1*, ..., np* nodes in each set. Define an index function with index(i, Il*) denoting the global index in I of the local index 1 ≤ i ≤ nl* in Il*. Then, the entries of an nl* × n local restriction matrix Rl can be defined by:

   (Rl)_ij = 1,  if index(i, Il*) = j,
   (Rl)_ij = 0,  if index(i, Il*) ≠ j.

The extension matrices Rl^T will be transposes of the restriction matrices, with Al ≡ Rl A Rl^T. Once a coarse space has been chosen with restriction matrix R0, the system A u = f may be solved using matrix multiplicative, additive or hybrid Schwarz algorithms based on the restriction matrices R0, R1, ..., Rp. These algorithms can be formulated as before, based on suitable restriction and extension matrices. If the unstructured grid is quasiuniform, optimal convergence should be obtained, see [CH3, CA4, CH17, CA18] and [CO8, SA7].
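Since each Rl is a Boolean selection matrix, its assembly is immediate; a sketch assuming SciPy (names illustrative):

```python
import numpy as np
import scipy.sparse as sp

def restriction(I_star, n):
    """Sparse 0/1 matrix Rl of size n_l* x n, with (Rl)_{ij} = 1
    iff global index j is the i-th entry of the extended set I_star."""
    I_star = np.asarray(sorted(I_star))
    m = len(I_star)
    return sp.csr_matrix((np.ones(m), (np.arange(m), I_star)), shape=(m, n))

# Subdomain matrices Al = Rl A Rl^T are principal submatrices of A:
# R = [restriction(I, n) for I in extended_index_sets]
# A_loc = [Rl @ A @ Rl.T for Rl in R]
```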

5.1.6 Schur Complement Algorithms

When the grid is unstructured, the nonoverlapping subdomains Ω1, ..., Ωp determined by a graph partitioning algorithm may have complex geometry, and traditional globs such as edges, faces and wirebaskets may be difficult to identify. However, the subdomain boundary segments B^(i) will be well defined, as will the local Schur complement S^(i):

   S^(i) = A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i),

so that Neumann-Neumann and balancing domain decomposition preconditioners can be applied based on S^(i). Depending on whether c(x) = 0 or c(x) ≥ c0 > 0 in the elliptic equation, the subdomain stiffness matrix A^(i) may be singular when Ωi is a floating subdomain. When c(x) ≥ c0 > 0, the subdomain stiffness matrices A^(i) will not be singular. If c(x) = 0 and Ωi is a floating subdomain, the matrix A^(i) will be singular; the singular vector will be zi = (1, ..., 1)^T, and the balancing domain decomposition or traditional Neumann-Neumann algorithm can be employed with the coarse space V0,P(Ω) described earlier for unstructured grids, see [SA7, MA15]. However, if no mechanism is employed for global transfer of information, then the convergence rate of the resulting Neumann-Neumann algorithm will deteriorate as h0^{-2}(1 + log(h0/h))^2 if the grid Th(Ω) is quasiuniform and the subdomains are shape regular of diameter h0. Studies of the effects of partitioning algorithms, amount of overlap and other factors in unstructured grid applications are presented in [CI8].

5.1.7 FETI Algorithms

As with Neumann-Neumann and balancing domain decomposition algorithms, FETI algorithms also require minimal geometric information about the subdomains on unstructured grids. Appropriate versions of the FETI algorithm can be employed on unstructured grids [FA15].

5.2 Parallelizability of Domain Decomposition Solvers

In this section, we heuristically model the potential parallel efficiency of domain decomposition solvers [GR10, GR12, FA9, SK, SM4, CH15, GR16]. We do this by employing theoretical models, under highly idealized assumptions, for the execution times of representative domain decomposition solvers implemented on a parallel computer having p processors with distributed memory. We consider representative Schwarz or Schur complement preconditioners, with and without coarse space correction, and CG acceleration. Our discussion will be organized as follows. In Chap. 5.2.1 we present background and notation on identities used for the parallel computation of matrix-vector products and inner products. In Chap. 5.2.2, we describe background on parallel computers and measures for assessing the speed up, efficiency and scalability of parallel algorithms. Chap. 5.2.3 describes a domain decomposition strategy for allocating memory and computations to individual processors, and derives heuristic estimates for the parallel execution times of representative solvers. In Chap. 5.2.4 we employ these bounds to obtain models for the parallel efficiency of various domain decomposition iterative solvers.

5.2.1 Background

Consider the following self adjoint and coercive elliptic equation:

   -∇ · (a(x)∇u) + c(x) u = f(x),  in Ω ⊂ IR^d,
   u = 0,                          on ∂Ω,   (5.16)

with smooth coefficients a(x) ≥ a0 > 0 and c(x) ≥ 0. Its discretization by a finite element method based on a quasiuniform triangulation τh(Ω) of Ω, with grid size h, will yield the linear system:

   A u = f,   (5.17)

where A = A^T > 0 is of size n.

We consider the solution of (5.17) by a preconditioned CG algorithm using an additive Schwarz or Neumann-Neumann preconditioner.

Notation. We will employ the following notation. We let Ω1, ..., Ωns denote a nonoverlapping decomposition of Ω ⊂ IR^d into ns subdomains, each of diameter h0 and volume (area) |Ωi| = O(h0^d). To obtain an overlapping decomposition, we extend each subdomain Ωi by including all points of Ω within a distance of β h0 from Ωi, resulting in subdomain Ωi*. By construction, the volume (area) of the extended subdomains will satisfy |Ωi*| = O((1 + β*)|Ωi|) for β* ≡ (1 + β)^d - 1. Due to quasiuniformity of the underlying triangulation, if n denotes the number of interior nodes in Ω, then each nonoverlapping subdomain Ωi will contain O(n/ns) unknowns while overlapping subdomains Ωi* will contain O((1 + β*) n/ns) unknowns.

The pointwise nodal restriction map onto nodes in Ω̄i will be denoted R^(i) and the local stiffness matrix on Ω̄i will be denoted A^(i). Accordingly, the subassembly identity can be expressed in the form:

   A = Σ_{i=1}^{ns} R^(i)T A^(i) R^(i).

Local load vectors will be denoted f^(i) for 1 ≤ i ≤ ns, so that the global load vector has the form f ≡ Σ_{i=1}^{ns} R^(i)T f^(i). We shall assume there exist diagonal matrices I^(i) which form a decomposition of the identity:

   I = Σ_{i=1}^{ns} R^(i)T I^(i) R^(i),

where matrix I^(i) has the same size as A^(i) with nonnegative diagonal entries. Such matrices can be constructed by defining (I^(i))_kk = 1 if xk ∈ Ωi and (I^(i))_kk = 1/N(xk) if xk ∈ B^(i), where N(xk) denotes the number of subdomain boundaries to which node xk belongs.

If a Schur complement preconditioner is employed, then Ri will denote the pointwise restriction map from nodes on the interface B onto the boundary segment B^(i) of Ωi. The local Schur complements will be denoted S^(i), so that the subassembly identity has the form:

   S = Σ_{i=1}^{ns} Ri^T S^(i) Ri.

A decomposition of the identity on B of the form:

   I = Σ_{i=1}^{ns} Ri^T I^(i) Ri,

will also be assumed, where I^(i) (with some abuse of notation) denotes a diagonal matrix of the same size as S^(i) with nonnegative diagonal entries.

If a coarse space is employed, then the row space of R0 will span the coarse space, and the coarse space matrix will be denoted A0 = R0 A R0^T. If a Schur complement preconditioner with a coarse space is employed, it will be spanned by the rows of R0, with S0 = R0 S R0^T denoting the coarse Schur complement matrix. Given overlapping subdomains Ωi*, we let Ri denote the pointwise restriction map onto nodes in Ωi*, so that Ai = Ri A Ri^T will be a principal submatrix of A corresponding to nodes in Ωi*.

5.2.2 Parallel Computation

We consider a parallel computer with an MIMD (multiple instruction, multiple data) architecture with distributed memory [HO, QU8, AL3, LE16, GR]. We will assume there are p identical processors, each with local memory and capable of executing programs independently. For simplicity, it will be assumed that data can be communicated directly between any pair of processors (though, in most domain decomposition applications it will be sufficient to pass data between neighboring processors, either between adjacent subdomains or with a coarse space (if present), as specified by some adjacency matrix). If several processors simultaneously send data to each other, then a suitable protocol such as the message passing interface [GR15] may be employed.

We shall let Tcomm(n) ≡ τ0 + n τc denote the average time for transferring n units of data between two processors. Here τ0 denotes the start up time, which we shall assume is zero for simplicity, while τf denotes the time for a floating point operation. On a typical MIMD parallel computer, the speed of communication τc between processors will be significantly slower than the speed τf of floating point operations, i.e., τf ≪ τc. This unfortunate fact places constraints on the types of parallel algorithms suitable for implementation on such hardware. In such cases, interprocessor communication must be kept to a minimum to obtain high speed up of algorithms. By design, large portions of domain decomposition algorithms involve computations which can be implemented independently without communication. The remaining portions typically require communication. Algorithms having relatively large sections of independent computations with relatively small sections requiring communication are said to have coarse granularity, and are generally suited for implementation on MIMD architectures, provided each processor is assigned to implement the computations on one or more subdomains.

The performance of an algorithm on a parallel computer is typically assessed by a quantity referred to as the speed up, which measures the rate of reduction in its execution time as the number of processors is increased.

Formally, if T(p, n) denotes the execution time for implementing a parallel algorithm having problem size n using p processors, then its relative speed up is defined as the ratio of its execution time T(1, n) on a serial computer to its execution time T(p, n) on a parallel computer with p processors.

Definition 5.15. The relative speed up of an algorithm implemented using p processors is defined as:

   S(p, n) ≡ T(1, n) / T(p, n).

The speed up ratio has a theoretical maximum value of p for a perfectly parallelizable algorithm, with 1 ≤ S(p, n) ≤ p. When the best serial algorithm or execution time is not known, the relative speed up may be used as a measure of its parallel performance. When the speed up is measured relative to the best serial execution time, the resulting speed up is referred to as total speed up. This is defined below.

Definition 5.16. The total speed up of an algorithm is defined as:

   S(p, n) ≡ Tbest(1, n) / T(p, n),

where Tbest(1, n) denotes the best serial execution time.

Remark 5.17. Even if the relative speed up of a parallel algorithm attains its maximum value, there may be other parallel implementations with shorter execution times. This is because the relative speed up ratio is not measured with reference to the best serial execution time. Formally, the lowest attainable complexity for the solution of a sparse linear system of size n arising from discretizations of elliptic equations will be denoted φ(n), with Tbest(1, n) = φ(n) τf, where φ(n) = c0 n^α + o(n^α) for 1 < α ≤ 3, depending on the elliptic equation, geometry and discretization. In special cases, linear (or almost linear) order complexity may be attained for multigrid and fast Poisson solvers, and we will assume Tbest(1, n) = C n τf in such cases.

Remark 5.18. The execution time T(p, n) of domain decomposition algorithms may depend on other factors, such as the number ns of subdomains, the amount β of overlap (if overlapping subdomains are employed), the size n0 of the coarse space, the complexity φ(·) of the local solver, and the stopping criterion ε, amongst other factors. If this dependence of the execution time on such additional factors needs to be emphasized, we shall denote the execution time as T(p, n, ns, β, n0, ε, φ), the relative speed up as S(p, n, ns, β, n0, ε, φ) and the total speed up likewise.

In the following, we define the parallel efficiency of an algorithm as the percentage of the speed up relative to the maximum speed up of p.

Definition 5.19. The relative parallel efficiency of an algorithm implemented using p processors is defined as:

   E(p, n) ≡ [ T(1, n) / (p T(p, n)) ] × 100%.

The total parallel efficiency of an algorithm is defined as:

   E(p, n) ≡ [ Tbest(1, n) / (p T(p, n)) ] × 100%.

In practice, there may be constraints on the maximal speed up attainable in an algorithm, regardless of the computer hardware, due to portions of the algorithm in which computations can only be executed sequentially. Such an upper bound on the speed up is given by Amdahl's law, which may be derived as follows.

Amdahl's Law. Let 0 < α < 1 denote the fraction of computations within an algorithm which are serial in nature. Then, assuming perfect parallelizability of the remaining portion of the algorithm, and ignoring overhead and communication costs, the following estimates can be obtained for the optimal execution times:

   T(1, n) = α T(1, n) + (1 - α) T(1, n),
   T(p, n) = α T(1, n) + (1 - α) T(1, n)/p.

This yields the following upper bound for the speed up:

   S(p, n) = T(1, n)/T(p, n) = 1 / (α + (1 - α)/p) ≤ 1/α.

Thus, the parallel speed up of an algorithm cannot exceed the inverse of the fraction α of serial computations within the algorithm. In applications, Amdahl's law yields a pessimistic bound in practice, due to the implicit assumption that the fraction α of serial computations remains fixed independent of n. The fraction α of serial computations within an algorithm can be difficult to estimate and may vary with the problem size n. Empirical evidence indicates that α(n) diminishes with increasing problem size n for most algorithms. A less pessimistic upper bound for the maximum speed up was derived by Gustafson-Barsis, as indicated below.

The parallel execution time given p processors is decomposed as:

   T(p, n) = A(n) + B(n),

where A(n) denotes the execution time for the serial portion of the algorithm, while B(n) denotes the parallel execution time for the parallelizable portion of the algorithm. This yields the following estimate for the serial execution time of the algorithm: T(1, n) = A(n) + p B(n), from which we estimate the speed up as:

   S(p, n) = T(1, n)/T(p, n) = (A(n) + p B(n)) / (A(n) + B(n))
           = A(n)/(A(n) + B(n)) + [ B(n)/(A(n) + B(n)) ] p.

Unlike the fixed bound given by Amdahl's law, the Gustafson-Barsis bound for the speed up increases linearly with the number of processors. In applications, it is often of interest to know whether parallel algorithms can be found which maintain their efficiency as the size n of the problem is scaled up. The scalability of a parallel algorithm, defined below, is a measure of how efficiently an algorithm makes use of additional processors.
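These definitions and bounds are easy to tabulate numerically; in the small sketch below the serial fraction 0.05 is a made-up illustrative value, not one derived from the analysis:

```python
def efficiency(T1, Tp, p):
    """Relative parallel efficiency E(p, n) in percent."""
    return 100.0 * T1 / (p * Tp)

def amdahl(p, alpha):
    """Amdahl bound: S(p) = 1 / (alpha + (1 - alpha)/p) <= 1/alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / p)

def gustafson_barsis(p, alpha):
    """Scaled speed up: S(p) = alpha + (1 - alpha) * p."""
    return alpha + (1.0 - alpha) * p

for p in (4, 16, 64):
    print(p, amdahl(p, 0.05), gustafson_barsis(p, 0.05))
```

Running this shows Amdahl's bound saturating near 1/α = 20 while the Gustafson-Barsis estimate keeps growing linearly in p, which is the contrast drawn above.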

Definition 5.20. An algorithm is said to be scalable if it is possible to keep its efficiency constant by increasing the problem size as the number of processors increases. More specifically, an algorithm is scalable if given m p processors where m > 1, the problem size can be increased to n(m) > n such that:

   E(m p, n(m)) = E(p, n).

An algorithm is said to be perfectly scalable if its efficiency remains constant when the problem size n and the number of processors p are increased by the same factor m:

   E(m p, m n) = E(p, n).

An algorithm is said to be highly scalable if its parallel efficiency depends only weakly on the number of processors as the problem size n and the number p of processors are increased by the same factor.

Remark 5.21. Using the definition of scalability, it is easily seen that the following will hold for an algorithm satisfying E(m p, n(m)) = E(p, n):

   T(m p, n(m)) = ( T(1, n(m)) / (m T(1, n)) ) T(p, n).

Here, the expression T(1, n(m))/(m T(1, n)) is the factor by which the computation time is increased or decreased, in relation to T(p, n), as the number of processors is increased to m p and the problem size is increased to n(m).

5.2.3 Parallelization of PCG Algorithms

Each iteration in a PCG algorithm can be decomposed into two portions, a portion not involving the preconditioner (matrix-vector products, update of residuals, iterates and inner products), and a portion computing the action of the inverse of the preconditioner. When implementing a PCG algorithm on a parallel computer with distributed memory, it will be desirable to allocate memory to individual processors in a way compatible with both sections of the algorithm. Furthermore, if coarse space correction is employed within the preconditioner, care must be exercised in the parallel implementation of the coarse problem. Typically, three alternative approaches may be employed for solving a coarse space problem in parallel in domain decomposition preconditioners:

• Parallelize the solution of the coarse problem (using all the processors) and store relevant data on each processor.
• Gather all the relevant coarse data on a specific processor, solve the coarse problem only on this processor, and broadcast the result to all other processors.
• Gather the coarse data on each processor, and solve the coarse problem redundantly in parallel on each processor, thereby minimizing communication of additional data.


Generally, the latter two approaches are preferable on typical parallel archi-
tectures [GR10], though we shall consider only the second approach.
Motivated by the preceding, we shall heuristically consider the following
strategy for allocating memory and computations to individual processors.
• Each of the p processors is assigned to handle all the computations corre-
sponding to one or more subdomains or a coarse problem. Thus, if a coarse
space is not employed, each processor will be assigned to handle (ns /p)
subdomains, and (ns /p) + 1 subproblems if a coarse space is employed.
• To ensure approximate load balancing, we shall require the number of
unknowns O(n/ns ) per nonoverlapping subdomain (or O ((1 + β∗ )n/ns )
per overlapping subdomain) to be approximately equal. If a coarse space
is employed, we shall additionally require the number n0 of coarse space
unknowns not to exceed the number of unknowns per subdomain, yielding
the constraint n0 ≤ C(n/ns ).
• To reduce communication between the processors, we shall assume that
the subdomain data are distributed amongst the different processors as
follows. The processor which handles subdomain Ωi should ideally store
the current approximation of the local solution u(i) on Ω i , the local stiff-
ness matrix A(i) , local load vector f (i) and matrix I (i) . If overlapping
subdomains Ωi∗ are used, then the local solution ui on Ωi∗ , submatrix
Ai = Ri ARiT , local load Ri f , local residual Ri r and the components Ri RjT
for adjacent subdomains should also be stored locally. If a coarse space
T
is employed, then the nonzero rows of R0 R(i) and R0 RiT should also be
stored locally.
• The processor which handles the coarse space should also store matrix
A0 = R0 AR0T and the nonzero entries of Rj R0T for 1 ≤ j ≤ ns .
We shall let K denote the maximum number of adjacent subdomains.
When deriving theoretical estimates of execution times, we shall as-
sume that an efficient sparse matrix solver having complexity φ(m) = c0 mα +
o(mα ) for some 1 < α ≤ 3 is employed to solve all the subproblems of size m
occurring within a domain decomposition preconditioner. Analysis in [CH15]
suggests that if a serial computer is employed, then the optimal diameter h0
of a traditional coarse grid must satisfy:

   h0 = O( h^{α/(2α-d)} ),  for Ω ⊂ IR^d.

If a parallel computer is employed with p processors, then load balancing
requires the number n0 of coarse space unknowns to satisfy n0 ≤ c(n/ns ).
Since theoretical analysis indicates a coarse space must satisfy an approxi-
mation property of order h0 for optimal or almost optimal convergence, this
heuristically suggests n0 ≈ ns ≈ n1/2 for traditional coarse spaces.
In the following, we outline parallel algorithms for evaluating matrix
multiplication and inner products, and the action of additive Schwarz and


Neumann-Neumann preconditioners. We derive heuristic estimates for the
parallel execution times of the resulting algorithms.
Parallelization of Matrix Vector Products. By assumption, we let a vec-
tor w be distributed amongst different processors with component R(i) w (and
Ri w, if overlapping subdomains are employed) stored on the processor han-
dling Ωi . As a result, a matrix-vector product A w can be computed using
the subassembly identity:

   A w = Σ_{i=1}^{ns} R^(i)T A^(i) R^(i) w,

and the result can be stored locally using the following steps.
1. In parallel, multiply each of the local vectors R(i) w (assumed to be stored
locally) using the local stiffness matrix A(i) .
2. The processor handling Ωi should send the data R^(j) R^(i)T A^(i) R^(i) w to
the processor handling Ωj .
3. The processor handling Ωj should sum the contributions it receives:

   R^(j) A w = Σ_{i=1}^{ns} R^(j) R^(i)T A^(i) R^(i) w,

from all (at most K) neighbors, and store the result locally.
If ti denotes the parallel execution time for the i’th step above, it will satisfy:

   t1 ≤ c1 (ns/p) (n/ns) τf,
   t2 ≤ c2 (ns/p) K (n/ns)^{(d-1)/d} τc + τ0,
   t3 ≤ c3 (ns/p) K (n/ns)^{(d-1)/d} τf.
Apart from τ0 , the other terms are inversely proportional to p.
Matrix-vector products involving the Schur complement matrix S can be
computed similarly, based on an analogous subassembly identity:

   S wB = Σ_{i=1}^{ns} Ri^T S^(i) Ri wB.
Since S^(i) = A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i), such computations require the solution
of local linear systems, with the solver of complexity φ(·). Thus, the parallel
execution time for matrix multiplication by S will be bounded by a sum
of t1 = c1 (ns/p) φ(n/ns) τf, t2 = c2 (ns/p) K (n/ns)^{(d-1)/d} τc + τ0 and also
t3 = c3 K (ns/p) (n/ns)^{(d-1)/d} τf. Again, apart from the start up time τ0, the
other terms are inversely proportional to p.
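For illustration, a schematic mpi4py rendering of the three matrix-vector product steps, assuming one subdomain per processor; the stored objects (`A_loc`, the neighbor list, the pairwise scatter matrices `R_pair` with a shared-node ordering agreed upon by both sides) are assumptions of this sketch, and the pickle-based point-to-point calls favor brevity over performance:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

def matvec(w_loc, A_loc, neighbors, R_pair):
    """Distributed A w: w_loc = R^(i) w is this processor's piece;
    R_pair[j] selects the locally computed entries needed by neighbor j."""
    y_loc = A_loc @ w_loc                          # step 1: local product
    reqs = [comm.isend(R_pair[j] @ y_loc, dest=j)  # step 2: exchange
            for j in neighbors]
    for j in neighbors:                            # step 3: accumulate
        y_loc = y_loc + R_pair[j].T @ comm.recv(source=j)
    for r in reqs:
        r.wait()
    return y_loc                                   # equals R^(i) A w
```

With at most K neighbors per subdomain, each processor sends and receives K small interface messages, which is the source of the K (n/ns)^{(d-1)/d} terms above.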
Parallelization of Inner Products. Inner products can be computed in
parallel based on the distributed data stored on each processor. By assump-
tion, given vectors w and v, their components R(i) w and R(i) v will be stored


on the processor handling Ωi . Since matrix I (i) will also be stored locally, the
inner product wT v can be computed using the identity:


   w^T v = Σ_{i=1}^{ns} w^T R^(i)T I^(i) R^(i) v.

This computation may be distributed as follows.
1. In parallel, the processor handling Ωi should compute the local inner
products w^T R^(i)T I^(i) R^(i) v.
2. Each processor should sum the (ns /p) local inner products it handles and
communicate the computed result to all the other processors.
3. Each processor should sum all the local inner products it receives and
store the resulting answer locally.
If ti denotes the execution time for the i’th step above, it will satisfy:

   t1 ≤ c1 (ns/p) (n/ns) τf,
   t2 ≤ c2 (ns/p) τf + c3 p τc + τ0,
   t3 ≤ c4 p τf.

Except for c3 p τc + τ0 and c4 p τf , the other terms vary inversely with p.
Analogous estimates will hold for inner products in Schur complement
algorithms, based on interface unknowns. The total execution time in this
case will be bounded by the sum of t1 = c1 (ns/p) (n/ns)^{(d-1)/d} τf along with
t2 = c2 (ns/p) τf + c3 p τc + τ0 and t3 = c4 p τf. Except for c3 p τc + τ0 and
c4 p τf, the other terms are inversely proportional to p.
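The inner product kernel reduces to a single all-reduce; a sketch with mpi4py, where `I_loc` holds the diagonal entries of I^(i) on this processor (names illustrative):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

def inner_product(w_loc, v_loc, I_loc):
    """Global w^T v from distributed pieces: weight the local contribution
    by the partition-of-unity diagonal, then sum across all processors."""
    local = float(w_loc @ (I_loc * v_loc))   # w^T R^(i)T I^(i) R^(i) v
    return comm.allreduce(local, op=MPI.SUM)
```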
Parallelization of an Additive Schwarz Preconditioner. If there is no
coarse space, the inverse of such a preconditioner will have the form:


   M^{-1} = Σ_{i=1}^{ns} Ri^T Ai^{-1} Ri.

Computation of the action of M −1 on a residual vector r can be implemented
in parallel as follows.
1. In parallel, solve Ai wi = Ri r using the locally stored residual vector Ri r
and the locally stored submatrix Ai .
2. In parallel, the processor handling Ωi∗ should send Rj RiT wi to each of the
processors handling Ωj∗ for Ωj∗ ∩ Ωi∗ ≠ ∅.
3. In parallel, each processor should sum the contributions of solutions from
   adjacent subdomains and store Rj M^{-1} r = Σ_{i=1}^{ns} Rj Ri^T wi locally.
The computational time for each step can be estimated.


If ti denotes the execution time for the i’th step, it will satisfy:

   t1 ≤ c1 (ns/p) φ((1 + β∗)(n/ns)) τf,
   t2 ≤ c2 K β∗ (n/p) τc + τ0,
   t3 ≤ c3 K β∗ (n/p) τf.

Apart from τ0 , the terms are inversely proportional to p.
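A serial sketch of this preconditioner application, assuming SciPy; on a parallel machine the loop body is what the processor handling Ωi∗ executes, with the local factorizations computed once and reused at every iteration:

```python
import numpy as np
from scipy.sparse.linalg import factorized

def make_schwarz(A, R):
    """Returns r -> M^{-1} r = sum_i R_i^T A_i^{-1} R_i r, where the
    subdomain matrices A_i = R_i A R_i^T are factored once up front."""
    solves = [factorized((Ri @ A @ Ri.T).tocsc()) for Ri in R]
    def apply_Minv(r):
        z = np.zeros_like(r)
        for Ri, solve in zip(R, solves):
            z += Ri.T @ solve(Ri @ r)   # scatter the local solve back
        return z
    return apply_Minv
```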
If a coarse space is included, the preconditioner will have the form:

   M^{-1} = Σ_{i=1}^{ns} Ri^T Ai^{-1} Ri + R0^T A0^{-1} R0.

Care must be exercised when parallelizing the coarse grid correction term
R0^T A0^{-1} R0, since the computation of R0 r requires global communication be-
tween processors. We shall assume that the coarse space computations are
performed on a processor assigned to the coarse space, however, they may
alternatively be performed redundantly on each of the other processors in
parallel. We shall not consider the parallelization of coarse space computa-
tions. By assumption, the nonzero rows of R0 R^(i)T, matrix I^(i) and vector
R(i) r are stored locally on the processor handling Ωi∗ . Thus, the vector R0 r
may be computed based on the following expression:

   R0 = R0 Σ_{i=1}^{ns} R^(i)T I^(i) R^(i)
      = Σ_{i=1}^{ns} ( R0 R^(i)T ) I^(i) R^(i).

Below, we summarize an algorithm for the parallel computation of M −1 r.
1. The processor handling Ωi∗ should compute the nontrivial rows of the term
R0 R^(i)T I^(i) R^(i) r using the locally stored vector R^(i) r and matrix I^(i). Send
these nontrivial rows to the processor handling coarse space correction.
The processor handling the coarse space should sum the components:

   R0 r ≡ Σ_{i=1}^{ns} R0 R^(i)T I^(i) R^(i) r.

2. In parallel, solve Ai wi = Ri r for 0 ≤ i ≤ ns .
3. If Ωi∗ ∩ Ωj∗ ≠ ∅ then the processor handling Ωi∗ should send Rj RiT wi to the
processor handling Ωj∗ . The processor handling the coarse space should
send relevant components of R0T w0 to the processor handling Ωi∗ .
4. In parallel, the processor handling Ωi∗ should sum the components:

   Ri M^{-1} r ≡ Σ_{j=0}^{ns} Ri Rj^T wj.

The computational time for each step above can be estimated.


If ti denotes the execution time for the i’th step above, it will satisfy:


   t1 ≤ c1 K (1 + β∗)(n/p) τf + c2 K n0 τc + τ0 + c3 K n0 τf,
   t2 ≤ c4 ((ns + 1)/p) φ((1 + β∗)(n/ns)) τf,
   t3 ≤ c5 K (1 + β∗)(n/p) τc + τ0,
   t4 ≤ c6 ((ns + 1)/p) (K + 1) (1 + β∗) (n/ns) τf,

provided that n0 ≤ (1 + β∗ )(n/ns ). Additionally, if ns scales proportionally
to p, then apart from τ0 , the other terms are inversely proportional to p.
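The coarse term adds one global solve to the earlier one-level sketch; a minimal extension, again assuming SciPy, a dense n × n0 array `R0T`, and the hypothetical `make_schwarz` helper from the sketch above:

```python
import numpy as np

def make_schwarz_coarse(A, R, R0T):
    """Adds the coarse correction R0^T A0^{-1} R0 to the one-level
    additive Schwarz preconditioner sketched earlier."""
    apply_one_level = make_schwarz(A, R)   # local subdomain solves
    A0 = R0T.T @ (A @ R0T)                 # A0 = R0 A R0^T, small and dense
    def apply_Minv(r):
        return apply_one_level(r) + R0T @ np.linalg.solve(A0, R0T.T @ r)
    return apply_Minv
```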
Parallelization of the Neumann-Neumann Preconditioner. We next
consider a Neumann-Neumann Schur complement preconditioner, in which
the action of the inverse of the preconditioner has the form:

   M^{-1} = Σ_{i=1}^{ns} Ri^T S^(i)† Ri + R0^T S0^{-1} R0,

where S0 = R0 SRT0 = A0 . Care must be exercised when parallelizing the
computation of RT0 S0−1 R0 rB , since it requires global communication. It will be
assumed that the nonzero rows of R0 RTi are stored on the processor handling
Ωi . The action of R0 on rB can be computed using the identity:

   R0 = R0 Σ_{i=1}^{ns} Ri^T I^(i) Ri
      = Σ_{i=1}^{ns} ( R0 Ri^T ) I^(i) Ri.

Below, we list the implementation of the Neumann-Neumann preconditioner.
1. In parallel, each processor handling Ωi∗ should compute the nontrivial rows
of R0 RTi I (i) Ri rB using (the locally stored) Ri rB and matrix I (i) . Send
these nontrivial rows to the processor handling coarse space correction
and then sum the components to obtain:

   R0 rB ≡ Σ_{i=1}^{ns} R0 Ri^T I^(i) Ri rB.

2. In parallel, solve S (i) wi = Ri rB for 0 ≤ i ≤ ns where S (0) ≡ S0 .
3. In parallel, if Ωi∗ ∩ Ωj∗ ≠ ∅, the processor handling Ωi∗ should send Rj RTi wi
to the processor handling Ωj∗ . The processor handling the coarse space
should send Ri RT0 w0 to the processor handling Ωi∗ for 1 ≤ i ≤ ns .
4. In parallel, the processor handling Ωi∗ should sum the components:


   Ri M^{-1} rB ≡ Σ_{j=0}^{ns} Ri Rj^T wj.

The computation times for the above steps can be estimated.


If ti denotes the execution time for the i’th step above, it will satisfy:


   t1 ≤ c1 ((ns + 1)/p) K (n/ns)^{(d-1)/d} τf + c2 K n0 τc + τ0 + c3 K n0 τf,
   t2 ≤ c4 ((ns + 1)/p) φ(n/ns) τf,
   t3 ≤ c5 K (ns/p) (n/ns)^{(d-1)/d} τc + τ0,
   t4 ≤ c6 ((ns + 1)/p) (K + 1) (n/ns)^{(d-1)/d} τf,

provided that n0 = O(n/ns ). If ns is proportional to p, then apart from τ0 ,
the other terms vary inversely with p.
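A serial stand-in for these four steps, assuming dense local Schur complements and using pseudoinverses for the possibly singular S^(i) of floating subdomains; the partition-of-unity weighting by I^(i) is omitted here for brevity (an assumption of this sketch, not of the algorithm above):

```python
import numpy as np

def make_neumann_neumann(S_loc, R, R0T, S0):
    """M^{-1} r_B = sum_i R_i^T pinv(S^(i)) R_i r_B + R0^T S0^{-1} R0 r_B."""
    S_pinv = [np.linalg.pinv(Si) for Si in S_loc]   # dense, small blocks
    def apply_Minv(rB):
        z = R0T @ np.linalg.solve(S0, R0T.T @ rB)   # coarse/balancing term
        for Ri, Spi in zip(R, S_pinv):
            z = z + Ri.T @ (Spi @ (Ri @ rB))        # subdomain solves
        return z
    return apply_Minv
```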

5.2.4 Estimation of the Total Execution Times

Using the preceding estimates, we may estimate the execution time T(p, n, ε) of CG algorithms for different choices of preconditioners. Here T(p, n, ε) is the total execution time for implementing a PCG algorithm to solve a problem of size n, on a p processor parallel computer, where the initial residual is reduced by a factor ε. The total execution time will be the product of the number N(n, ε) of iterations required to reduce the residual by the factor ε, and the parallel execution time T∗(p, n) per iteration:

   T(p, n, ε) = N(n, ε) T∗(p, n).   (5.18)

We shall suppress the dependence on ns, β, n0 and φ(·) for convenience.

The execution time T∗(p, n) per iteration can be further decomposed as:

   T∗(p, n) = G∗(p, n) + H∗(p, n),   (5.19)

where G∗(p, n) denotes the execution time per iteration for the remaining computations (matrix-vector products, inner products, update of residuals, vector addition), while H∗(p, n) denotes the execution time per iteration of the preconditioning step. Estimates for H∗(p, n) and G∗(p, n) can be obtained by summing up the relevant execution time estimates ti for appropriately chosen routines from the preceding pages. Employing the total execution times, we heuristically estimate the parallel efficiency of the additive Schwarz and Neumann-Neumann PCG algorithms, making several simplifying assumptions. We shall express the efficiency in terms of n, p, d, α and γc = (τc/τf) ≫ 1.

• We assume that the best serial execution time satisfies: Tbest(1, n) = φ(n) τf ≤ c0 n^α τf.
• We assume that: τ0 = 0, p^2 ≤ n, p ≤ ns, n0 ≤ (1 + β∗)(n/ns).
• We omit lower order terms in expressions.

Additive Schwarz Preconditioner Without Coarse Space. Estimates of G∗(p, n) to solve A u = f using a CG algorithm can be obtained by summing the appropriately chosen quantities ti from the preceding section for the matrix-vector product and inner product routines. Estimates of H∗(p, n) can be obtained similarly by summing the ti from the preceding section for the additive Schwarz preconditioner without a coarse space. Omitting all lower order terms and the start up time τ0, this yields:

   H∗(p, n) ≤ c0 e1 (ns/p) (1 + β∗)^α (n/ns)^α τf + e2 K β∗ (n/p) τc + e3 K β∗ (n/p) τf,
   G∗(p, n) ≤ d1 (n/p) τf + d2 K (n/p) τc + d3 (ns/p) K (n/ns)^{(d-1)/d} τf.   (5.20)

Bounds from Chap. 2 for the condition number of the additive Schwarz PCG algorithm without coarse space correction yield:

   cond(M, A) ≤ C(β) h0^{-2}.

Standard estimates for error reduction in PCG algorithms [GO4] yield:

   N(n, ε, β) ≤ C(ε, β) h0^{-1},

for some C(ε, β) independent of n, h0, ns and p. Summing H∗(p, n) and G∗(p, n), retaining only the highest order terms and substituting h0^{-1} = O(ns^{1/d}) (which holds since ns = O(|Ω| h0^{-d})), yields:

   T(p, n, ε, ns) ≤ c0 ns^{1/d} ( C1 γc (n/p) + C2 (ns/p) (n/ns)^α + C3 (n/p)^{(d-1)/d} ) τf,

where γc ≡ (τc/τf) ≫ 1. Here Ci may depend on all parameters excluding n, p and γc. Substituting Tbest(1, n) = c0 n^α τf along with the preceding bound for T(p, n, ε, ns) yields the following heuristic bound for the total efficiency when p = ns ≤ n^{1/2}, τ0 = 0 and 1 < α ≤ 3:

   E(p, n) ≥ n^α / [ p^{(d+1)/d} ( C1 γc (n/p) + C2 (n/p)^α + C3 (n/p)^{(d-1)/d} ) ].

By considering only the leading order terms as p increases, for a fixed n, the value of p which minimizes the denominator will optimize the efficiency. Heuristically, as p is varied, it may be noted that the value of n can be increased to maintain a constant efficiency. Thus, the above algorithm is scalable.

β) ≤ C(..

where C(. β)..

ns ) yields a bound for E(p. Neumann-Neumann Preconditioner for the Schur Complement. n) and H∗ (p. By considering only the leading order terms. n. n. ns . p (C1 γc (n/p) + C2 (n/p)α ) The above bound is an improvement over the efficiency of the additive Schwarz algorithm without coarse space correction.22) ⎪ ⎩ (d−1)/d + e2 K (ns /p) (n/ns ) τc . the efficiency can be maintained. Bounds from Chap. p and ns . Heuristically. n) ≤ e1 K (ns /p) φ(n/ns ) τf (5.. n) can be estimated for the Schur complement algo- rithm with Neumann-Neumann preconditioner by summing relevant estimates ti for routines described in the preceding section: ⎧ ⎪ H (p. n) ≥ . n) ≤ d1 (ns /p) φ(n/ns ) τf + d2 K (n/p) τc ⎨ ∗ G∗ (p. 3 yield the following condition number estimate: 2 cond(M. n) and retaining only the highest order terms in φ(·) yields the following bound: T (p. and Ci may depend on all parameters excluding n. . β) is independent of n. ns ) ≤ c0 (C1 γc (n/p) + C2 (ns /p) (n/ns )α ) τf where γc ≡ (τc /τf )  1. the value of p which minimizes the denominator optimizes the efficiency. it is seen that as p is increased. Here. Summing H∗ (p. A) ≤ C (1 + log(h0 /h)) . n) when p = ns ≤ n1/2 and τ0 = 0:   nα E(p. this algorithm is scalable. Substituting Tbest (1. n) = c0 nα τf and the preceding bounds for T (p. The terms G∗ (p. n) and G∗ (p. lower order terms and the start up time τ0 have been omitted. Thus.

h0 .294 5 Computational Issues and Parallelization for the Neumann-Neumann algorithm with coarse space correction. Bounds for the error reduction of PCG algorithms [GO4] yields: N (n. .

Bounds for the error reduction of PCG algorithms [GO4] yield:

   N(n, ε, h0, β) ≤ C(ε) (1 + log(h0/h)),

where C(ε) is independent of n, h0, ns and p. Since by assumption h0^{-1} = O(ns^{1/d}), it follows that log(h0/h) = O(d^{-1} log(n/ns)). Summing the terms H∗(p, n) and G∗(p, n), substituting log(h0/h) = O(d^{-1} log(n/ns)) and retaining only the highest order terms in φ(·), yields the following bound for T(p, n, ε, ns):

   T(p, n, ε, ns) ≤ c0 log(n/ns) ( C1 (ns/p) (n/ns)^α + C2 (n/p)^{(d-1)/d} γc ) τf
                    + c0 log(n/ns) ( C3 (n/p) γc ) τf,

where γc = (τc/τf) ≫ 1. Substituting the estimate Tbest(1, n) = c0 n^α τf and using the preceding bound for T(p, n, ε, ns) yields the following lower bound for the total efficiency when p = ns ≤ n^{1/2} and τ0 = 0:

   E(p, n) ≥ n^α / [ p log(n/ns) ( C1 (n/p)^α + C2 (n/p)^{(d-1)/d} γc + C3 (n/p) γc ) ].

By considering only leading order terms, it is seen that as p is increased, a value of n can be determined so that the efficiency is maintained. An intermediate value of p will optimize the efficiency. Thus, this algorithm is scalable, though not perfectly scalable.

Remark 5.22. The preceding discussion shows that the representative domain decomposition solvers are scalable. Readers are referred to [GR10, GR12, FA9, SK, SM4, CH15, GR16] for additional discussion on the parallel implementation of domain decomposition algorithms.

6 Least Squares-Control Theory: Iterative Algorithms

In this chapter, we describe iterative algorithms formulated based on the least squares-control theory framework [LI2, AT, GL, GU2]. Given a decomposition of Ω into two or more subdomains, a least squares-control formulation of (6.1) employs unknown functions on each subdomain. These unknowns solve the partial differential equation on each subdomain, with unknown boundary data that serve as control data. The control data must be chosen so that the subdomain solutions match with neighbors to yield a global solution to (6.1). This problem can be formulated mathematically as a constrained minimization problem, which seeks to minimize the difference between the local unknowns on the regions of overlap, subject to the constraint that the local unknowns solve the elliptic equation on each subdomain. Although saddle point methodology may also be employed to solve this least squares-control problem, we reduce it to an unconstrained minimization problem, and solve it using a CG algorithm. Since the iterative algorithms based on the Schur complement, Schwarz and Lagrange multiplier formulations are more extensively studied, our discussion is heuristic and described for its intrinsic interest.

The methodology applies to non-self adjoint elliptic equations; however, for simplicity we shall describe a matrix formulation for the following self adjoint elliptic equation:

   L u ≡ -∇ · (a(x)∇u) + c(x) u = f,  in Ω,
   u = 0,                             on ∂Ω,   (6.1)

where c(x) ≥ 0. We denote a finite element discretization of (6.1) as:

   A u = f,   (6.2)

where A = A^T > 0 is the stiffness matrix of size n and f ∈ IR^n. Our focus will be on the iterative solution of system (6.2).

In Chap. 6.1, we consider a decomposition of Ω into two overlapping subdomains, while Chap. 6.2 considers two non-overlapping subdomains. Some extensions to multiple subdomains are discussed in Chap. 6.3. One of the algorithms elaborates an algorithm from Chap. 1.

Fig. 6.1. Two overlapping subdomains (regular and immersed decompositions, showing Ω1∗, Ω2∗, the overlap Ω12∗ and the internal boundaries B^(1), B^(2)).

6.1 Two Overlapping Subdomains

In this section, we describe a least squares-control formulation of (6.1) based on a decomposition of Ω into two overlapping subdomains [AT]. Accordingly, we consider two subdomains Ω1∗ and Ω2∗ which form an overlapping decomposition of Ω with sufficient overlap, as in Fig. 6.1. We define Ω12∗ ≡ Ω1∗ ∩ Ω2∗ as the region of overlap between the two subdomains, B^(i) = ∂Ωi∗ ∩ Ω as the internal boundary of each subdomain, and B[i] = ∂Ωi∗ ∩ ∂Ω as its external boundary.

The least squares-control formulation of (6.1) seeks local functions u1(·) and u2(·) defined on the subdomains Ω1∗ and Ω2∗, respectively, which minimize the functional within V∗:

   J(u1, u2) = min_{(v1, v2) ∈ V∗} J(v1, v2),   (6.3)

where we shall employ the following functional in the overlapping case:

   J(v1, v2) ≡ (1/2) ‖ v1 - v2 ‖^2_{α,Ω12∗},   (6.4)

for 0 ≤ α ≤ 1, where ‖·‖_{α,Ω12∗} denotes the fractional Sobolev norm H^α(Ω12∗) on Ω12∗, and where V∗ consists of v1(·) and v2(·) solving:

   L vi = f,  in Ωi∗,
   vi = gi,   on B^(i),   for i = 1, 2.   (6.5)
   vi = 0,    on B[i],

Here gi denotes the unknown local Dirichlet data on B^(i), which can be regarded as control data that needs to be determined in order to minimize the square norm error term (6.4).

Remark 6.1. By construction, if the global solution u to (6.1) exists, then its restriction ui(·) ≡ u(·) on Ωi∗ for i = 1, 2 will minimize ‖u1 - u2‖^2_{α,Ω12∗} with minimum value zero. In this case, if the solution (u1, u2) to (6.3), (6.4) and (6.5) satisfies u1(·) = u2(·) on Ω12∗, then it can easily be verified that ui(·) will match the true solution u(·) on Ωi∗. We shall formulate a matrix version of the above least squares-control formulation using the following notation.

We shall order all the nodes in Ω and partition them based on the subregions Ω1∗ \ Ω̄2∗, B^(2), Ω12∗, B^(1) and Ω2∗ \ Ω̄1∗, and define the associated sets of indices as:

   I11    = indices of nodes in Ω1∗ \ Ω̄2∗,
   B^(2)  = indices of nodes in B^(2),
   I12    = indices of nodes in Ω12∗,   (6.6)
   B^(1)  = indices of nodes in B^(1),
   I22    = indices of nodes in Ω2∗ \ Ω̄1∗.

Let n11, n12 and n22 denote the number of indices in I11, I12 and I22, and nB^(1), nB^(2) the number of indices in B^(1), B^(2). The indices of nodes in Ω1∗ will be I^(1) = I11 ∪ B^(2) ∪ I12, and I^(2) = I12 ∪ B^(1) ∪ I22 in Ω2∗. Let nI^(1) = (n11 + nB^(2) + n12) be the number of nodes in Ω1∗ and nI^(2) = (n12 + nB^(1) + n22) the number of nodes in Ω2∗. Define ni = (nI^(i) + nB^(i)) and nE = (n1 + n2).

If vi denotes a finite element function defined on subdomain Ω̄i∗, we let v^(i) = (vI^(i)T, vB^(i)T)^T ∈ IR^{ni} denote the vector of its interior and boundary nodal values. A global extended vector consisting of the local subdomain nodal vectors will be denoted vE = (v^(1)T, v^(2)T)^T ∈ IR^{nE}. We let AII^(i) denote a submatrix of A of size nI^(i) corresponding to the indices in I^(i), representing coupling between interior nodes in Ωi∗. Similarly, we let AIB^(i) denote an nI^(i) × nB^(i) submatrix of A representing coupling between interior nodes in Ωi∗ with boundary nodes on B^(i). Given the original load vector f ∈ IR^n, we define local interior load vectors fI^(i) ∈ IR^{nI^(i)} as the restriction of f onto the interior nodes in each subdomain, and fB ∈ IR^{nB} as the restriction of f onto the nodes on B.

Given the ordering B^(2) ∪ I12 ∪ B^(1) of nodes in Ω̄12∗, we let R12 denote an n12 × n1 restriction matrix which maps a nodal vector on Ω̄1∗ into its subvector of nodal values on Ω̄12∗. Similarly, we define a restriction matrix R21 as an n12 × n2 matrix mapping a vector of nodal values on Ω̄2∗ into its subvector of nodal values on Ω̄12∗. For 0 ≤ α ≤ 1 we let Aα denote a symmetric positive definite matrix of size n12 representing the finite element discretization of the H^α(Ω12∗) Sobolev inner product on Ω̄12∗. Accordingly, if (v1, v2) are finite element functions defined on (Ω1∗, Ω2∗) with associated nodal vectors v^(1), v^(2), we define J(v^(1), v^(2)) as:

   J(v^(1), v^(2)) ≡ (1/2) ‖ v1 - v2 ‖^2_{α,Ω12∗} = (1/2) ‖ R12 v^(1) - R21 v^(2) ‖^2_{Aα}
                   = (1/2) ( R12 v^(1) - R21 v^(2) )^T Aα ( R12 v^(1) - R21 v^(2) ).   (6.7)

The constraints (6.4) and (6.5) can be discretized to yield the following linear system:

   AII^(i) vI^(i) + AIB^(i) vB^(i) = fI^(i),
   vB^(i) = g^(i),   for 1 ≤ i ≤ 2,   (6.8)

where fI^(i) denotes the local internal load vector and g^(i) denotes the unknown discrete Dirichlet boundary data on B^(i). Since the second block row above corresponds to a renaming of the Dirichlet boundary data, we eliminate g^(i) and shall henceforth employ vB^(i).

The objective functional J(v^(1), v^(2)) and the linear constraints may be expressed compactly using matrix notation. We let K denote a singular matrix of size nE having the following block structure:

   K = [  R12^T Aα R12   -R12^T Aα R21 ]
       [ -R21^T Aα R12    R21^T Aα R21 ],   (6.9)

corresponding to the partitioning vE = (v^(1)T, v^(2)T)^T. Then the functional J(vE) for vE = (v^(1)T, v^(2)T)^T may be equivalently expressed as:

   J(vE) = (1/2) vE^T K vE.   (6.10)

The constraints (6.8) may be expressed compactly as:

   N vE = fE,  where N = [ N^(1)  0 ; 0  N^(2) ],  N^(i) ≡ [ AII^(i)  AIB^(i) ],  fE = ( fI^(1) ; fI^(2) ),   (6.11)

where the local nodal vectors v^(i) satisfy v^(i) = (vI^(i)T, vB^(i)T)^T. Here N is an (nI^(1) + nI^(2)) × nE rectangular matrix, of full rank, and N^(i) is an nI^(i) × ni rectangular matrix. The discrete least squares-control formulation seeks to minimize J(vE) subject to the constraint (6.11). By construction, it will yield local nodal vectors which match on the region of overlap, as described in the following result.

Remark 6.2. By construction, if J(u^(1), u^(2)) = 0, then the restrictions R12 u^(1) = R21 u^(2) of the local nodal vectors will match on Ω̄12∗, and hence their associated finite element functions u1 and u2 will also match on the region Ω̄12∗ of overlap.

Lemma 6.3. Suppose the following assumptions hold:
1. Let u denote the solution of (6.2).
2. Let wE = (w^(1)T, w^(2)T)^T denote an extended nodal vector satisfying:

   J(wE) = min_{vE ∈ V∗} J(vE),  with J(wE) = 0,   (6.12)

   where V∗ = { vE : N vE = fE }.   (6.13)

3. For i = 1, 2 let Ri denote a restriction matrix mapping a nodal vector of the form v onto a vector of nodal values on Ω̄i∗.

Then, the following results will hold for 0 ≤ α ≤ 1:

   w^(i) = Ri u,  for i = 1, 2.

Proof. Follows by construction.

The constrained minimization problem (6.12) can be reformulated as a saddle point linear system. Indeed, define a Lagrangian function L(vE, λ):

   L(vE, λ) ≡ J(vE) + λ^T ( N vE - fE ),   (6.14)

where λ ∈ IR^{nI^(1)+nI^(2)} denotes a vector of Lagrange multiplier variables. Then, the saddle point linear system associated with (6.12) is easily derived by requiring the first variation of L(vE, λ) to be zero, as described in Chap. 10:

   [ K   N^T ] [ vE ]   [ 0  ]
   [ N   0   ] [ λ  ] = [ fE ].   (6.15)

Here matrix K is a singular matrix of size nE having low rank, while matrix N is an (nI^(1) + nI^(2)) × nE matrix of full rank. Traditional iterative algorithms based either on augmented Lagrangian formulations [GL7] or the projected gradient method (as in Chap. 10, see [FA14]) may be employed to solve (6.15). However, we shall describe an alternative approach.

Remark 6.4. We briefly outline why system (6.15) will be nonsingular even though matrix K is singular. General results in Chap. 10 show that a saddle point system is nonsingular when the following conditions hold:

• Matrix K should be symmetric and coercive within the subspace V0:

   V0 = { vE : N vE = 0 }.

• Matrix N should have full rank. This is equivalent to the inf sup condition, which can easily be verified for (6.11) since the N^(i) are of full rank.

Suppose the coercivity of K within V0 is violated. Then, due to the finite dimensionality of V0, there must exist a non-trivial vE ∈ V0 satisfying vE^T K vE = ‖ R12 v^(1) - R21 v^(2) ‖^2_{Aα} = 0. By construction, N vE = 0 yields N^(i) v^(i) = 0 for i = 1, 2, and so the restriction R12 v^(1) - R21 v^(2) will be discrete harmonic on Ω12∗. Since ‖ R12 v^(1) - R21 v^(2) ‖^2_{Aα} = 0, it will hold that R12 v^(1) = R21 v^(2), and consequently, a global nodal vector v can be defined matching v^(i) on both subdomains; by construction v will satisfy A v = 0, yielding that v = 0 and v^(i) = 0 for i = 1, 2. We arrive at a contradiction.

The minimum of J(vE) in V∗ can alternatively be sought by parameterizing V∗ and minimizing the resulting unconstrained functional. We describe this approach next. The general solution to the full rank system N vE = fE can be parameterized in terms of the boundary data vB^(i) by solving N^(i) v^(i) = fI^(i):

   v^(i) = [ -AII^(i)-1 AIB^(i) ] vB^(i) + [ AII^(i)-1 fI^(i) ],  for i = 1, 2.   (6.16)
           [        I          ]           [        0         ]

To simplify the expressions, denote the restrictions of such vectors to Ω̄12∗ as:

   R12 v^(1) = H1 vB^(1) + e1,
   R21 v^(2) = H2 vB^(2) + e2,   (6.17)

where

   H1 ≡ R12 [ -AII^(1)-1 AIB^(1) ; I ],  e1 ≡ R12 [ AII^(1)-1 fI^(1) ; 0 ],
   H2 ≡ R21 [ -AII^(2)-1 AIB^(2) ; I ],  e2 ≡ R21 [ AII^(2)-1 fI^(2) ; 0 ].   (6.18)

Here, the subdomain Dirichlet data vB^(1) and vB^(2) represent control variables. Substituting this parameterization into the functional J(v^(1), v^(2)) yields the following reduced functional JB(vB^(1), vB^(2)) = J(v^(1), v^(2)):

   JB(vB^(1), vB^(2)) ≡ (1/2) ‖ ( H1 vB^(1) + e1 ) - ( H2 vB^(2) + e2 ) ‖^2_{Aα}.

The new unconstrained minimization problem associated with (6.12) and (6.13) seeks boundary data wB^(1) and wB^(2) which minimize:

   JB(wB^(1), wB^(2)) = min_{(vB^(1), vB^(2))} JB(vB^(1), vB^(2)).   (6.19)

Applying stationarity conditions to:

   JB(wB^(1), wB^(2)) = (1/2) ‖ H1 wB^(1) - H2 wB^(2) + e1 - e2 ‖^2_{Aα},

yields the linear system:

   (1/2) ∂JB/∂wB^(1) =  H1^T Aα ( H1 wB^(1) - H2 wB^(2) + e1 - e2 ) = 0,
   (1/2) ∂JB/∂wB^(2) = -H2^T Aα ( H1 wB^(1) - H2 wB^(2) + e1 - e2 ) = 0.

Rewriting the above yields a block linear system:

   [  H1^T Aα H1   -H1^T Aα H2 ] [ wB^(1) ]   [ H1^T Aα (e2 - e1) ]
   [ -H2^T Aα H1    H2^T Aα H2 ] [ wB^(2) ] = [ H2^T Aα (e1 - e2) ].   (6.20)

Henceforth, we assume that the solution to (6.19) is obtained by solving system (6.20).
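Since the coefficient matrix in (6.20) is symmetric positive semidefinite (and positive definite under the conditions discussed in Remark 6.6 below), the system can be solved by CG without assembling the blocks; a dense sketch assuming SciPy and that H1, H2, e1, e2 from (6.18) are available as arrays (names illustrative):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_reduced(H1, H2, e1, e2, Aalpha):
    """CG on (6.20): the operator applies [H1, -H2]^T Aalpha [H1, -H2]
    blockwise to w = (w_B^(1), w_B^(2)) without forming the blocks."""
    m1, m2 = H1.shape[1], H2.shape[1]
    def matvec(w):
        t = Aalpha @ (H1 @ w[:m1] - H2 @ w[m1:])
        return np.concatenate([H1.T @ t, -(H2.T @ t)])
    rhs = np.concatenate([H1.T @ (Aalpha @ (e2 - e1)),
                          H2.T @ (Aalpha @ (e1 - e2))])
    A_op = LinearOperator((m1 + m2, m1 + m2), matvec=matvec)
    w, info = cg(A_op, rhs)        # info == 0 signals convergence
    return w[:m1], w[m1:]
```

In practice each application of H1 or H2 costs one subdomain interior solve with AII^(i), as is visible from (6.18), so the columns would be applied via a stored factorization rather than assembled.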

We thus have the following equivalence between (6.12) and (6.19).

Lemma 6.5. Suppose the following assumptions hold:
1. Let (u^(1), u^(2)) denote the constrained minimum of J(·, ·):

   J(u^(1), u^(2)) = min_{(v^(1), v^(2)) ∈ V∗} J(v^(1), v^(2)).

2. Let (wB^(1), wB^(2)) denote the unconstrained minimum of JB(·, ·):

   JB(wB^(1), wB^(2)) = min_{(vB^(1), vB^(2))} JB(vB^(1), vB^(2)).

Then, the following results will hold:

   u^(i) = [ AII^(i)-1 ( fI^(i) - AIB^(i) wB^(i) ) ],  for i = 1, 2.
           [               wB^(i)                ]

Proof. Follows by direct substitution and algebraic simplification.

Remark 6.6. The coefficient matrix in (6.20) is symmetric by construction, and since it generates the quadratic form associated with a square norm, it will be positive semidefinite. Importantly, it will also be positive definite, and an iterative method such as the CG algorithm may be applied to solve (6.20). To verify that the coefficient matrix in (6.20) is positive definite, without loss of generality let ei = 0 for i = 1, 2. To show definiteness