
De Gruyter Series in Nonlinear Analysis

and Applications 13
Editor in Chief
Jürgen Appell, Würzburg, Germany
Editors
Catherine Bandle, Basel, Switzerland
Alain Bensoussan, Richardson, Texas, USA
Avner Friedman, Columbus, Ohio, USA
Karl-Heinz Hoffmann, Munich, Germany
Mikio Kato, Kitakyushu, Japan
Umberto Mosco, Worcester, Massachusetts, USA
Louis Nirenberg, New York, USA
Katrin Wendland, Augsburg, Germany
Alfonso Vignoli, Rome, Italy
Peter Kosmol
Dieter Müller-Wichards
Optimization in
Function Spaces
With Stability Considerations in Orlicz Spaces
De Gruyter
Mathematics Subject Classification 2010: Primary: 46B20, 46E30, 52A41; Secondary: 41-02,
41A17, 41A25, 46B10, 46T99, 47H05, 49-02, 49K05, 49K15, 49K40, 49M15, 90C34.
ISBN 978-3-11-025020-6
e-ISBN 978-3-11-025021-3
ISSN 0941-813X
Library of Congress Cataloging-in-Publication Data
Kosmol, Peter.
Optimization in function spaces with stability considerations on
Orlicz spaces / by Peter Kosmol, Dieter Müller-Wichards.
p. cm. – (De Gruyter series in nonlinear analysis and applications ; 13)
Includes bibliographical references and index.
ISBN 978-3-11-025020-6 (alk. paper)
1. Stability – Mathematical models. 2. Mathematical optimization.
3. Orlicz spaces. I. Müller-Wichards, D. (Dieter), 1949– II. Title.
QA871.K8235 2011
515.392–dc22
2010036123
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.
© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/New York
Typesetting: Da-TeX Gerd Blumenstein, Leipzig, www.da-tex.de
Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com
Preface
In this text we present a treatise on optimization in function spaces. A concise but
thorough account of convex analysis serves as a basis for the treatment of Orlicz
spaces and Variational Calculus as a contribution to non-linear and linear functional
analysis.
Examples include our discussion of stability questions for families of optimization problems, and of the equivalence of strong solvability, local uniform convexity, and Fréchet differentiability of the convex conjugate, both of which provide new insights and open up new ways of dealing with applications.
A further contribution is a novel approach to the fundamental theorems of Variational Calculus, where pointwise (finite-dimensional) minimization of suitably convexified Lagrangians w.r.t. the state variables $x$ and $x'$ is performed simultaneously. Convexification in this context is achieved via quadratic supplements of the Lagrangian. These supplements are constant on the restriction set and thus lead to equivalent problems.
It is our aim to present an essentially self-contained book on the theory of convex functions and convex optimization in Banach spaces. We assume that the reader is familiar with the concepts of mathematical analysis and linear algebra. Some awareness of the principles of measure theory will turn out to be helpful.
The methods mentioned above are applied to Orlicz spaces in a twofold sense: not only do we consider optimization and in particular norm approximation problems in Orlicz spaces, but we also use these methods to develop a complete theory of Orlicz spaces for $\sigma$-finite measures. We also employ the stability principles developed in this text in order to establish optimality for solutions of the Euler–Lagrange equations of certain non-convex variational problems.
Overview
In Chapter 1 we present the classical approximation theory for $L^1$ and $C(T)$, embedded into the category of Orlicz spaces. In particular we present the theorems of Jackson (zeros of the error function) and Bernstein (error estimates) and their extension to Orlicz spaces. For differentiable (1-D) Young functions the non-linear systems of equations that arise from linear modular and norm approximation problems are discussed. For their solution the rapidly convergent numerical methods of Chapter 4 can be used.
Chapter 2 is devoted to Polya-type algorithms in Orlicz spaces to determine a best Chebyshev (or $L^1$) approximation. The idea is to replace a numerically ill-conditioned problem, affected by non-uniqueness and non-differentiability, with a sequence of well-behaved problems. The closedness of the algorithm is already guaranteed by the pointwise convergence of the sequence of Young functions. This is due to the stability principles presented in Chapter 5. Convergence estimates for the corresponding sequence of Luxemburg norms can be gleaned from the Young functions in the discrete and continuous case.
These estimates in turn can be utilized to regularize the Polya algorithm in order to achieve actual convergence to a two-stage solution, a subject that is discussed within a general framework in more detail in Chapter 5. For sequences of Young functions with the separation property the convergence of discrete approximations to the strict approximation of Rice is shown.
The last application in this chapter is of a somewhat different nature: it is well known that many applications can be restated as linear programming problems. Semi-infinite optimization problems are a generalization with potentially infinitely many restrictions (for which a fairly large literature exists). We solve the problem by making use of the stability results developed in this book: we approximate the restriction set by a sequence of smooth restriction sets, making use of the Lagrange mechanism and the existence of Lagrange multipliers, which in turn is guaranteed by corresponding regularity conditions.
In Chapter 3 we develop the theory of convex functions and convex sets in normed spaces. We stress the central role of the (one-sided) directional derivative and, by using its properties, derive necessary and sufficient conditions for minimal solutions of convex optimization problems.
For later use we consider in particular convex functions on finite-dimensional spaces. It turns out that real-valued convex functions are already continuous, and differentiable convex functions are already continuously differentiable. The latter fact is used later to show that the Gâteaux derivative of a continuous convex function is already demi-continuous (a result that is used in Chapter 8 to prove that a reflexive and differentiable Orlicz space is already Fréchet differentiable).
We discuss the relationship between the Gâteaux and Fréchet derivatives of a convex function in normed spaces and give the proof of a not very well-known theorem of Phelps on the equivalence of the continuity of the Gâteaux derivative and the Fréchet differentiability of a convex function.
Based on the Hahn–Banach extension theorem we prove several variants of separation theorems (Mazur, Eidelheit, strict). Using these separation theorems we show the existence of the subdifferential of a continuous convex function, and derive a number of properties of a convex function together with its convex conjugate (in particular Young's equality and the theorem of Fenchel–Moreau).
The separation theorems are then used to prove the Fenchel duality theorem and, in an unconventional way by use of the latter theorem, the existence of minimal solutions of a convex function on a bounded convex subset of a reflexive Banach space (theorem of Mazur–Schauder).
The last section of this chapter is devoted to Lagrange multipliers, where we show their existence, again using the separation theorem, and prove the Lagrange duality theorem, which is used in Chapter 7 to prove the (general) Amemiya formula for the Orlicz norm.
In Chapter 5 we turn our attention to stability principles for sequences (and families) of functions. We show the closedness of algorithms that are lower semi-continuously convergent. Using this result we derive stability theorems for monotone convergence. Among the numerous applications of this result is the computation of the right-sided derivative of the maximum norm on $C(T)$, which in turn is used to provide a simple proof of the famous Kolmogoroff criterion for a best Chebyshev approximation (see Chapter 1).
Our main objective in this chapter is to consider pointwise convergent sequences of convex functions. Based on an extension of the uniform boundedness principle of Banach, formulated originally for linear functionals, to families of convex functions, it turns out that pointwise convergent sequences of convex functions are already continuously convergent. We are thus enabled to consider sequences of convex optimization problems $(f_n, M_n)$, where the functions $f_n$ converge pointwise to $f$ and the sets $M_n$ converge in the sense of Kuratowski to $M$, while closedness of the algorithm is preserved. In the finite-dimensional case we can even guarantee the existence of points of accumulation, provided that the set of minimal solutions of $(f, M)$ is non-empty and bounded.
In the next section of this chapter we consider two-stage solutions, where under certain conditions we can guarantee the actual convergence of the sequence of minimal solutions and characterize it. Among the methods being considered is differentiation w.r.t. the family parameter: for example, it turns out that the best $L^p$-approximations converge for $p \to 1$ to the best $L^1$-approximation of maximal entropy. Outer regularizations in the spirit of Tikhonov are also possible, provided that convergence estimates are available.
In the final section of this chapter we show that a number of stability results for sequences of convex functions carry over to sequences of monotone operators. If $P$ is a non-expansive Fejér contraction then $I - P$ is monotone. Such operators occur in the context of smoothing of linear programming problems, where the solution of the latter (non-smooth) problem appears as a second-stage solution of the solutions of the smooth operator sequence.
In Chapter 6 we introduce the Orlicz space $L^\Phi$ for general measures and arbitrary (not necessarily finite) Young functions $\Phi$. We discuss the properties and relations of Young functions and their conjugates and investigate the structure of Orlicz spaces equipped with the Luxemburg norm. An important subspace is provided by the closure of the space of step functions $M^\Phi$. It turns out that $M^\Phi = L^\Phi$ if and only if $\Phi$ satisfies an appropriate $\Delta_2$-condition. Moreover, if $\Phi$ does not satisfy a $\Delta_2$-condition, then $L^\Phi$ contains a closed subspace isomorphic to $\ell^\infty$; in the case of a finite, not purely atomic measure this isomorphy is even isometric. These statements are due to the theorem of Lindenstrauss–Tzafriri for Orlicz sequence spaces and a theorem of Turett for non-atomic measures. These results become important in our discussion of reflexivity in Chapters 7 and 8.
In Chapter 7 we introduce the Orlicz norm, which turns out to be equivalent to the Luxemburg norm. Using Jensen's integral inequality we show that norm convergence already implies convergence in measure. We show that the convex conjugate of the modular $f_\Phi$ is $f_\Psi$, provided that $\Psi$ is the conjugate of the Young function $\Phi$. An important consequence is that the modular is always lower semi-continuous. These facts turn out to be another ingredient in proving the general Amemiya formula for the Orlicz norm.
The main concern of this chapter is to characterize the dual space of an Orlicz space: it turns out that for finite and $\sigma$-finite measures we obtain $(M^\Phi)^* = L^\Psi$, provided that $\Psi$ is the conjugate of $\Phi$. Reflexivity of an Orlicz space is then characterized by an appropriate (depending on the measure) $\Delta_2$-condition for $\Phi$ and $\Psi$.
Based on the theorem of Lusin and a theorem of Krasnoselskii, establishing that the continuous functions are dense in $M^\Phi$, we show that, if $\Phi$ and $\mu$ are finite, $T$ is a compact subset of $\mathbb{R}^m$, and $\mu$ is the Lebesgue measure, then $M^\Phi$ is separable. Separability becomes important in the context of greedy algorithms and the Ritz method (Chapter 8).
We conclude the chapter by stating and proving the general Amemiya formula for the Orlicz norm.
Based on the results of Chapters 6 and 7 we now (in Chapter 8) turn our attention to the geometry of Orlicz spaces. Based on more general considerations in normed spaces we obtain for not purely atomic measures that $M^\Phi$ is smooth if and only if $\Phi$ is differentiable. For purely atomic measures the characterization is somewhat more complicated.
Our main objective is to characterize strong solvability of optimization problems, where convergence of the values to the optimum already implies norm convergence of the approximations to the minimal solution. It turns out that strong solvability can be geometrically characterized by the local uniform convexity of the corresponding convex functional (provided that the term local uniform convexity is appropriately defined, which we do). Moreover, we establish that in reflexive Banach spaces strong solvability is characterized by the Fréchet differentiability of the convex conjugate, provided that both are bounded functionals. These results are based in part on a paper of Asplund and Rockafellar on the duality of A-differentiability and B-convexity of conjugate pairs of convex functions, where B is the polar of A.
Before we apply these results to Orlicz spaces, we turn our attention to E-spaces, introduced by Fan and Glicksberg, in which every weakly closed subset is approximatively compact. A Banach space is an E-space if and only if it is reflexive, strictly convex, and satisfies the Kadec–Klee property. In order to establish reflexivity the theorem of James is applied. Another characterization is given by Anderson: $X$ is an E-space if and only if $X^*$ is Fréchet differentiable. The link between Fréchet differentiability and the Kadec–Klee property is then provided through the lemma of Shmulian [42].
With these tools at hand we can show that for finite, not purely atomic measures Fréchet differentiability of an Orlicz space already implies its reflexivity. The main theorem gives, in 17 equivalent statements, a characterization of strong solvability, local uniform convexity, and Fréchet differentiability of the dual, provided that $L^\Phi$ is reflexive. It is remarkable that all these properties can also be equivalently expressed by the differentiability of $\Psi$ or the strict convexity of $\Phi$. In particular it turns out that in Orlicz spaces the necessary conditions of strict convexity and reflexivity for an E-space are already sufficient.
We conclude the geometrical part of this chapter by a discussion of the duality of uniform convexity and uniform differentiability of Orlicz spaces, based on a corresponding theorem by Lindenstrauss. We restate a characterization by Milne of uniformly convex Orlicz spaces equipped with the Orlicz norm and present an example of a reflexive Orlicz space, also due to Milne, that is not uniformly convex but (according to our results) the square of its norm is locally uniformly convex. Following A. Kaminska we show that uniform convexity of the Luxemburg norm is equivalent to uniform convexity of the defining Young function.
In the last section we discuss a number of underlying principles for certain classes of applications:
•  Tikhonov regularization: this method was introduced for the treatment of ill-posed problems (of which there are a great many). The convergence of the method was proved by Levitin and Polyak for uniformly convex regularizing functionals. We show here that locally uniformly convex regularizations are sufficient for that purpose. As we have given a complete description of local uniform convexity in Orlicz spaces, we can state such regularizing functionals explicitly.
•  Ritz method: the Ritz method plays an important role in many applications (e.g. FEM methods). It is well known that the Ritz procedure generates a minimizing sequence. Actual convergence of the solutions on each subspace is only achieved if the original problem is strongly solvable.
•  Greedy algorithms have drawn growing attention and experienced a rapid development in recent years (see e.g. Temlyakov). The aim is to arrive at a compressed representation of a function in terms of its dominating frequencies. The convergence proof makes use of the Kadec–Klee property of an E-space.
Viewing these three applications at a glance, it turns out that there is an inherent relationship: local uniform convexity, strong solvability, and the Kadec–Klee property are three facets of the same property, which we have completely described in Orlicz spaces.
In the last chapter we describe an approach to variational problems, where the solutions appear as pointwise (finite-dimensional) minima for fixed $t$ of the supplemented Lagrangian. The minimization is performed simultaneously w.r.t. both the state variable $x$ and $x'$, different from Pontryagin's maximum principle, where optimization is done only w.r.t. the $x'$ variable. We use the idea of the Equivalent Problems of Carathéodory, employing suitable (and simple) supplements to the original minimization problem. Whereas Carathéodory considers equivalent problems by use of solutions of the Hamilton–Jacobi partial differential equations, we shall demonstrate that quadratic supplements can be constructed such that the supplemented Lagrangian is convex in the vicinity of the solution. In this way, the fundamental theorems of the Calculus of Variations are obtained. In particular, we avoid any usage of field theory.
We apply the stability principles of Chapter 5 to problems of variational calculus and control theory. As an example, we treat the isoperimetric problem (by us referred to as the Dido problem) as a problem of variational calculus. It turns out that the identification of the circle as a minimal solution can only be accomplished by employing stability principles for appropriately chosen sequences of variational problems.
The principle of pointwise minimization is then applied to the detection of a smooth, monotone trend in time series data in a parameter-free manner. In this context we also employ a Tikhonov-type regularization.
The last part of this chapter is devoted to certain problems in optimal control, where, to begin with, we put our focus on stability questions. In the final section we treat a minimal time problem, which turns out to be equivalent to a linear approximation in the mean, thus closing the circle of our journey through function spaces.
We take this opportunity to express our thanks to the staff of De Gruyter, in particular to Friederike Dittberner, for their friendly and professional cooperation throughout the publication process.
Kiel, November 2010 Peter Kosmol
Dieter Müller-Wichards
Contents
Preface v
1 Approximation in Orlicz Spaces 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 A Brief Framework for Approximation in Orlicz Spaces . . . . . . . . 7
1.3 Approximation in C(T) . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Chebyshev Approximations in C(T) . . . . . . . . . . . . . . 10
1.3.2 Approximation in Haar Subspaces . . . . . . . . . . . . . . . 12
1.4 Discrete $L^\Phi$-approximations . . . . . . . . . . . . . . . . . . . . . . 20
1.4.1 Discrete Chebyshev Approximation . . . . . . . . . . . . . . 20
1.4.2 Linear $L^\Phi$-approximation for Finite Young Functions . . . . . 24
1.5 Determination of the Linear $L^\Phi$-approximation . . . . . . . . . . . . 28
1.5.1 System of Equations for the Coefficients . . . . . . . . . . . . 28
1.5.2 The Method of Karlovitz . . . . . . . . . . . . . . . . . . . . 29
2 Polya Algorithms in Orlicz Spaces 34
2.1 The Classical Polya Algorithm . . . . . . . . . . . . . . . . . . . . . 34
2.2 Generalized Polya Algorithm . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Polya Algorithm for the Discrete Chebyshev Approximation . . . . . 35
2.3.1 The Strict Approximation as the Limit of Polya Algorithms . 37
2.3.2 About the Choice of the Young Functions . . . . . . . . . . . 39
2.3.3 Numerical Execution of the Polya Algorithm . . . . . . . . . 40
2.4 Stability of Polya Algorithms in Orlicz Spaces . . . . . . . . . . . . . 41
2.5 Convergence Estimates and Robustness . . . . . . . . . . . . . . . . 45
2.5.1 Two-Stage Optimization . . . . . . . . . . . . . . . . . . . . 47
2.5.2 Convergence Estimates . . . . . . . . . . . . . . . . . . . . . 49
2.6 A Polya–Remez Algorithm in C(T) . . . . . . . . . . . . . . . . . . 58
2.7 Semi-infinite Optimization Problems . . . . . . . . . . . . . . . . . . 64
2.7.1 Successive Approximation of the Restriction Set . . . . . . . 64
3 Convex Sets and Convex Functions 72
3.1 Geometry of Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.3 Difference Quotient and Directional Derivative . . . . . . . . . . . . 80
3.3.1 Geometry of the Right-sided Directional Derivative . . . . . . 81
3.4 Necessary and Sufficient Optimality Conditions . . . . . . . . . . . . 85
3.4.1 Necessary Optimality Conditions . . . . . . . . . . . . . . . 85
3.4.2 Sufficient Condition: Characterization Theorem of Convex
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5 Continuity of Convex Functions . . . . . . . . . . . . . . . . . . . . 86
3.6 Fréchet Differentiability . . . . . . . . . . . . . . . . . . . . . . . . 87
3.7 Convex Functions in $\mathbb{R}^n$ . . . . . . . . . . . . . . . . . . . . . . . . 88
3.8 Continuity of the Derivative . . . . . . . . . . . . . . . . . . . . . . 91
3.9 Separation Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.10 Subgradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.11 Conjugate Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.12 Theorem of Fenchel . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.13 Existence of Minimal Solutions for Convex Optimization . . . . . . . 109
3.14 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4 Numerical Treatment of Non-linear Equations and Optimization
Problems 115
4.1 Newton Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2 Secant Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.3 Global Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.3.1 Damped Newton Method . . . . . . . . . . . . . . . . . . . 122
4.3.2 Globalization of Secant Methods for Equations . . . . . . . . 124
4.3.3 Secant Method for Minimization . . . . . . . . . . . . . . . . 126
4.4 A Matrix-free Newton Method . . . . . . . . . . . . . . . . . . . . . 128
5 Stability and Two-stage Optimization Problems 129
5.1 Lower Semi-continuous Convergence and Stability . . . . . . . . . . 129
5.1.1 Lower Semi-equicontinuity and Lower Semi-continuous
Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.1.2 Lower Semi-continuous Convergence and Convergence
of Epigraphs . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2 Stability for Monotone Convergence . . . . . . . . . . . . . . . . . . 132
5.3 Continuous Convergence and Stability for Convex Functions . . . . . 134
5.3.1 Stability Theorems . . . . . . . . . . . . . . . . . . . . . . . 141
5.4 Convex Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.5 Quantitative Stability Considerations in $\mathbb{R}^n$ . . . . . . . . . . . . . . 153
5.6 Two-stage Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.6.1 Second Stages and Stability for Epsilon-solutions . . . . . . . 164
5.7 Stability for Families of Non-linear Equations . . . . . . . . . . . . . 166
5.7.1 Stability for Monotone Operators . . . . . . . . . . . . . . . 167
5.7.2 Stability for Wider Classes of Operators . . . . . . . . . . . . 171
5.7.3 Two-stage Solutions . . . . . . . . . . . . . . . . . . . . . . 172
6 Orlicz Spaces 175
6.1 Young Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.2 Modular and Luxemburg Norm . . . . . . . . . . . . . . . . . . . . . 185
6.2.1 Examples of Orlicz Spaces . . . . . . . . . . . . . . . . . . . 188
6.2.2 Structure of Orlicz Spaces . . . . . . . . . . . . . . . . . . . 193
6.2.3 The $\Delta_2$-condition . . . . . . . . . . . . . . . . . . . . . . . . 197
6.3 Properties of the Modular . . . . . . . . . . . . . . . . . . . . . . . . 208
6.3.1 Convergence in Modular . . . . . . . . . . . . . . . . . . . . 208
6.3.2 Level Sets and Balls . . . . . . . . . . . . . . . . . . . . . . 211
6.3.3 Boundedness of the Modular . . . . . . . . . . . . . . . . . . 212
7 Orlicz Norm and Duality 214
7.1 The Orlicz Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
7.2 Hölder's Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.3 Lower Semi-continuity and Duality of the Modular . . . . . . . . . . 216
7.4 Jensen's Integral Inequality and the Convergence in Measure . . . . . 220
7.5 Equivalence of the Norms . . . . . . . . . . . . . . . . . . . . . . . . 224
7.6 Duality Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.7 Reflexivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.8 Separability and Bases of Orlicz Spaces . . . . . . . . . . . . . . . . 234
7.8.1 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
7.8.2 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.9 Amemiya formula and Orlicz Norm . . . . . . . . . . . . . . . . . . 236
8 Differentiability and Convexity in Orlicz Spaces 241
8.1 Flat Convexity and Weak Differentiability . . . . . . . . . . . . . . . 241
8.2 Flat Convexity and Gâteaux Differentiability of Orlicz Spaces . . . . 244
8.3 A-differentiability and B-convexity . . . . . . . . . . . . . . . . . . 247
8.4 Local Uniform Convexity, Strong Solvability and Fréchet
Differentiability of the Conjugate . . . . . . . . . . . . . . . . . . . . 253
8.4.1 E-spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.5 Fréchet Differentiability and Local Uniform Convexity in Orlicz
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
8.5.1 Fréchet Differentiability of Modular and Luxemburg Norm . . 267
8.5.2 Fréchet Differentiability and Local Uniform Convexity . . . . 276
8.5.3 Fréchet Differentiability of the Orlicz Norm and Local
Uniform Convexity of the Luxemburg Norm . . . . . . . . . 278
8.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
8.6 Uniform Convexity and Uniform Differentiability . . . . . . . . . . . 281
8.6.1 Uniform Convexity of the Orlicz Norm . . . . . . . . . . . . 283
8.6.2 Uniform Convexity of the Luxemburg Norm . . . . . . . . . 288
8.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
8.7.1 Regularization of Tikhonov Type . . . . . . . . . . . . . . . 294
8.7.2 Ritz's Method . . . . . . . . . . . . . . . . . . . . . . . . . . 298
8.7.3 A Greedy Algorithm in Orlicz Space . . . . . . . . . . . . . . 299
9 Variational Calculus 309
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
9.1.1 Equivalent Variational Problems . . . . . . . . . . . . . . . . 311
9.1.2 Principle of Pointwise Minimization . . . . . . . . . . . . . . 312
9.1.3 Linear Supplement . . . . . . . . . . . . . . . . . . . . . . . 313
9.2 Smoothness of Solutions . . . . . . . . . . . . . . . . . . . . . . . . 316
9.3 Weak Local Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
9.3.1 Carathéodory Minimale . . . . . . . . . . . . . . . . . . . . 325
9.4 Strong Convexity and Strong Local Minima . . . . . . . . . . . . . . 325
9.4.1 Strong Local Minima . . . . . . . . . . . . . . . . . . . . . . 330
9.5 Necessary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 333
9.5.1 The Jacobi Equation as a Necessary Condition . . . . . . . . 333
9.6 $C^1$-variational Problems . . . . . . . . . . . . . . . . . . . . . . . . 335
9.7 Optimal Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
9.8 Stability Considerations for Variational Problems . . . . . . . . . . . 337
9.8.1 Parametric Treatment of the Dido problem . . . . . . . . . . 339
9.8.2 Dido problem . . . . . . . . . . . . . . . . . . . . . . . . . . 341
9.8.3 Global Optimal Paths . . . . . . . . . . . . . . . . . . . . . . 345
9.8.4 General Stability Theorems . . . . . . . . . . . . . . . . . . 346
9.8.5 Dido problem with Two-dimensional Quadratic Supplement . 349
9.8.6 Stability in OrliczSobolev Spaces . . . . . . . . . . . . . . . 352
9.9 Parameter-free Approximation of Time Series Data by Monotone
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9.9.1 Projection onto the Positive Cone in Sobolev Space . . . . . . 354
9.9.2 Regularization of Tikhonov-type . . . . . . . . . . . . . . . . 357
9.9.3 A Robust Variant . . . . . . . . . . . . . . . . . . . . . . . . 362
9.10 Optimal Control Problems . . . . . . . . . . . . . . . . . . . . . . . 363
9.10.1 Minimal Time Problem as a Linear $L^1$-approximation
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Bibliography 371
List of Symbols 379
Index 381
Chapter 1
Approximation in Orlicz Spaces
1.1 Introduction
In the present textbook we want to treat the methods of convex optimization under both practical and theoretical aspects. As a standard example we will consider approximation problems in Orlicz spaces, whose structure and geometry are to a large extent determined by one-dimensional convex functions (so-called Young functions).
Convexity of a one-dimensional function $f$ can be visualized in the following way: if one connects two points of the graph by a chord, then the chord will stay above the graph of the function $f$.
This behavior can be described by Jensen's inequality
$$f(\lambda s + (1 - \lambda) t) \le \lambda f(s) + (1 - \lambda) f(t)$$
for all $\lambda \in [0, 1]$ and all $s, t$ in the interval on which $f$ is defined.
The function $f$ is called strictly convex, if Jensen's inequality always holds in the strict sense for $\lambda \in (0, 1)$ and $s \neq t$. If $f$ is differentiable, then an equivalent description of convexity, which is also easily visualized, is available: the tangent at an arbitrary point of the graph always stays below the graph of $f$.
This behavior can be expressed by the following inequality:
$$f(t_0) + f'(t_0)(t - t_0) \le f(t) \tag{1.1}$$
for all $t, t_0$ in the interval on which $f$ is defined. Later on we will encounter this inequality in a more general form and environment as the subgradient inequality.
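A quick check of (1.1) in the model case $f(t) = t^2$ (a standard computation, added here only as an illustration): the tangent inequality reads
$$t_0^2 + 2 t_0 (t - t_0) \le t^2,$$
which simplifies to $0 \le (t - t_0)^2$ and therefore holds for all $t, t_0 \in \mathbb{R}$.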
A Young function is now defined as a non-negative, symmetric, and convex function $\Phi$ on $\mathbb{R}$ with $\Phi(0) = 0$.
The problem of the approximation in the mean is well suited to serve as an illustration of the type of questions and methods which are among the primary objectives of this text.
The approximation in the mean appears as a natural problem in a number of applications. Among them is robust statistics (simplest example: the median), where outliers in the data have much smaller influence on the estimate than e.g. in the case of the approximation in the square mean (simplest example: the arithmetic mean). A further application can be found in the field of time-optimal controls (see Theorem 9.10.5).
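The simplest examples just mentioned can be made explicit (a standard computation, added here for illustration): approximating data $x_1 \le \dots \le x_m$ by a constant $c$ in the mean amounts to minimizing
$$c \mapsto \sum_{i=1}^{m} |x_i - c|,$$
whose right- and left-sided derivatives $\#\{i \mid x_i \le c\} - \#\{i \mid x_i > c\}$ and $\#\{i \mid x_i < c\} - \#\{i \mid x_i \ge c\}$ change sign exactly at a median of the data, while minimizing $\sum_{i=1}^{m} (x_i - c)^2$ instead yields the arithmetic mean $c = \frac{1}{m}\sum_{i=1}^{m} x_i$, which every single outlier shifts proportionally.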
In its simplest setting the problem can be described in the following way: for a given point $x$ of $\mathbb{R}^m$ we look for a point $y$ in a fixed subset $M$ of $\mathbb{R}^m$ which among all points of $M$ has least distance to $x$, where the distance is understood in the following sense:
$$\|x - y\|_1 = \sum_{i=1}^{m} |x_i - y_i|. \tag{1.2}$$
In a situation like this we will normally require that the set $M$ is convex, i.e. the chord connecting two arbitrary points of $M$ has to be contained in $M$. We want to visualize this problem geometrically. For this purpose we first consider the unit ball of the $\|\cdot\|_1$-norm in $\mathbb{R}^2$, and compare it to the unit balls of the Euclidean and the maximum norm.
[Figure: the unit balls of the $\|\cdot\|_1$-norm, the Euclidean norm, and the maximum norm in $\mathbb{R}^2$.]
We will treat the connection of these norms to the corresponding Young functions in subsequent sections of the current and subsequent chapters, primarily under the aspect of computing linear approximations, and in Chapters 6, 7 and 8 from a structural point of view.
The shape of the unit balls influences the solvability of the approximation problem: the fact that a ball has vertices leads to non-differentiable problems. Intervals contained in the sphere can result in non-uniqueness of the best approximations.
The best approximation of a point $x$ w.r.t. a convex set $M$ can be illustrated in the following way: the ball with center $x$ is expanded until it touches the set $M$. The common points of the corresponding sphere and the set $M$ (the tangent points) are then the best approximations to be determined. It is apparent that different types of distances may lead to different best approximations.
Example 1.1.1. Let $x = (2, 2)$ and $M = \{ y \in \mathbb{R}^2 \mid y_1 + y_2 = 1 \}$ be the straight line through the points $P = (1, 0)$ and $Q = (0, 1)$. The set of best approximations in the mean (i.e. w.r.t. the $\|\cdot\|_1$-norm) is in this case the whole interval $[P, Q]$, whereas the point $(0.5, 0.5)$ is the only best approximation w.r.t. the Euclidean norm and the maximum norm.
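The assertions of Example 1.1.1 can be checked numerically; the following small Python sketch (ours, purely illustrative and not part of the book) parametrizes $M$ as $(t, 1-t)$ and evaluates both distances from $x = (2, 2)$:

    # Points of M have the form (t, 1 - t); from x = (2, 2) we obtain
    #   l1 distance:                |2 - t| + |2 - (1 - t)|
    #   squared Euclidean distance: (2 - t)^2 + (2 - (1 - t))^2
    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        d1 = abs(2 - t) + abs(2 - (1 - t))
        d2 = (2 - t) ** 2 + (2 - (1 - t)) ** 2
        print(t, d1, d2)
    # d1 equals 3.0 on the whole segment [P, Q], reflecting the non-uniqueness;
    # d2 attains its minimum 4.5 only at t = 0.5, i.e. at the point (0.5, 0.5).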
The computation of a best approximation in the mean using differential calculus creates a certain difficulty, since the modulus function is not differentiable at zero. Instead we will approximate the modulus function by differentiable Young functions. By a suitable selection of these one-dimensional functions the geometrical properties of the corresponding unit balls can be influenced in a favorable way. This refers not only to differentiability and uniqueness, but also to the numerical behavior.
In order to solve the original problem of computing the best approximation in the mean, a sequence of approximating problems is considered, whose solutions converge to a solution of the original problem. Each of these approximate problems is determined by a (one-dimensional) Young function. The richness of the available choices provides the opportunity to achieve a stable behavior of the sequence of minimal solutions of the approximating problems in the above sense.
In the sequel we will sketch the framework for this type of objective. The problem of the best approximation in the mean described above will, in spite of its simple structure, give us the opportunity to illustrate the principal phenomena.
For this purpose we adopt a different notation. Let $\Phi_0 : \mathbb{R} \to \mathbb{R}$ be defined as $\Phi_0(s) := |s|$; then we obtain for the problem of approximation in the mean:
For a given $x \in \mathbb{R}^m$ determine a point $v \in M$, such that for all $z \in M$
$$\sum_{i=1}^{m} \Phi_0(x_i - v_i) \le \sum_{i=1}^{m} \Phi_0(x_i - z_i). \tag{1.3}$$
The function $\Phi_0$ is neither strictly convex nor differentiable at zero, resulting in a negative influence on the approximation properties. If we replace the one-dimensional function $\Phi_0$ by approximating functions $\Phi_k$, where $k$ plays the role of a parameter, then we obtain for each $k$ the following approximate problem:
For given $x \in \mathbb{R}^m$ determine a point $v^{(k)} \in M$ such that for all $z \in M$
$$\sum_{i=1}^{m} \Phi_k\big(x_i - v^{(k)}_i\big) \le \sum_{i=1}^{m} \Phi_k(x_i - z_i) \tag{1.4}$$
holds. We consider a few examples of such approximating functions:
(a) $\Phi_k(s) = |s|^{1+k}$
(b) $\Phi_k(s) = \sqrt{s^2 + k^2} - k$
(c) $\Phi_k(s) = |s| - k \log\big(1 + \frac{1}{k}|s|\big)$
(d) $\Phi_k(s) = |s| + k s^2$
for $k \in (0, \infty)$.
The first three examples represent strictly convex and everywhere differentiable functions; the fourth is strictly convex, but not differentiable at zero. The type of approximation of $\Phi_0$ can be described in all cases by
$$\lim_{k \to 0} \Phi_k(s) = \Phi_0(s)$$
for all $s \in \mathbb{R}$. The function in the third example seems too complicated at first glance, but turns out to be particularly well suited for numerical purposes. This is due to the simple form of its derivative
$$\Phi_k'(s) = \frac{s}{k + |s|}.$$
All examples have in common that for each $k \in (0, \infty)$ the solution $v^{(k)}$ is unique. The mapping $k \mapsto v^{(k)}$ determines a curve in $\mathbb{R}^m$: the path to the solution of the original problem. If one considers this curve as a river and the set of solutions of the original problem as a lake, then a few questions naturally arise:
(a) Does the river flow into the lake? (Question of stability)
(b) How heavily does the river meander? (Question about numerical behavior)
(c) Does the mouth of the river form a delta? (Question about convergence to a particular solution)
We will now raise a number of questions, which we will later on treat in a more general framework. Does the pointwise convergence of the $\Phi_k$ to $\Phi_0$ carry over to the convergence of the solutions $v^{(k)}$? In order to establish a corresponding connection we introduce the following functions for $0 \le k < \infty$:
$$f_{\Phi_k} : \mathbb{R}^m \to \mathbb{R}, \qquad x \mapsto \sum_{i=1}^{m} \Phi_k(x_i).$$
Apparently, the pointwise convergence of the one-dimensional functions $\Phi_k$ carries over to the pointwise convergence of the functions $f_{\Phi_k}$ on $\mathbb{R}^m$. The stability theorems in Chapter 5 will then yield convergence of the sequence $(v^{(k)})$ in the following sense: each sequence $(v^{(k_n)})_{n \in \mathbb{N}}$ has a convergent subsequence, whose limit is a solution of the original problem (1.3). This behavior we denote as algorithmic closedness. If the original problem has a unique solution $v$, then actual convergence is achieved:
$$\lim_{k \to 0} v^{(k)} = v.$$
An implementation of the above process can be sketched in the following way: the original problem of the best approximation in the mean is replaced by a parameter-dependent family of approximate problems, which are numerically easier to handle. These approximate problems, described by the functions $\Phi_k$, will now be chosen in such a way that rapidly convergent iteration methods for the determination of $v^{(k)}$ for fixed $k$ can be employed, normally requiring differentiability of the function $f_{\Phi_k}$. Once a minimal solution for fixed $k$ is determined, it is used as a starting point of the iteration for the subsequent approximate problem, because it turns out that better initial guesses are required the closer the approximating functions come to their limit. The determination of the subsequent parameter $k'$ depends on the structure of the problem to be solved.
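A minimal Python sketch of this continuation strategy (our own illustration, not the book's implementation; the data, the one-dimensional subspace spanned by $a$, and the parameter schedule are assumptions chosen for the example): we use $\Phi_k$ from example (c), whose derivatives $\Phi_k'(s) = \frac{s}{k+|s|}$ and $\Phi_k''(s) = \frac{k}{(k+|s|)^2}$ make Newton's method applicable, and each solution serves as the starting point for the next, smaller $k$.

    # Best approximation of x by multiples of a w.r.t. f_{Phi_k}, where
    # Phi_k(s) = |s| - k*log(1 + |s|/k) (example (c)); Newton's method in the
    # coefficient c, warm-started along a decreasing sequence of parameters k.
    def phi_prime(s, k):
        return s / (k + abs(s))          # Phi_k'(s)

    def phi_second(s, k):
        return k / (k + abs(s)) ** 2     # Phi_k''(s)

    def best_coefficient(x, a, k, c0, tol=1e-12):
        """Newton iteration for g(c) = sum_i Phi_k(x_i - c*a_i) -> min."""
        c = c0
        for _ in range(100):
            g1 = -sum(ai * phi_prime(xi - c * ai, k) for xi, ai in zip(x, a))
            g2 = sum(ai * ai * phi_second(xi - c * ai, k) for xi, ai in zip(x, a))
            step = -g1 / g2              # Newton step for g'(c) = 0
            c += step
            if abs(step) < tol:
                return c
        return c

    x = [0.0, 1.0, 1.2, 1.3, 10.0]       # data containing an outlier
    a = [1.0] * len(x)                   # approximation by constants
    c = 0.0
    for k in [1.0, 0.1, 0.01, 0.001]:    # the path k -> v^(k)
        c = best_coefficient(x, a, k, c) # warm start with the previous solution
        print(k, c)

As $k$ decreases, the computed coefficient tends to the median $1.2$ of the data, i.e. to the best approximation in the mean; the outlier has little influence.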
A process of this type we will denote as a Polya algorithm. The original application of Polya was the computation of a best Chebyshev approximation by a process similar to the one sketched above, using the functions $\Phi_p : s \mapsto |s|^p$ ($L^p$-approximation) for $p \to \infty$ (compare example (a) with $p = 1 + k$). This application we will treat in detail in Chapter 2 in a somewhat more general framework. In this context we can raise the following question: under what conditions does the sequence $(\Phi_k)_k$ enforce convergence of the whole sequence $(v^{(k)})$, even if the best $L^1$-approximations are not uniquely determined? The corresponding question for the continuous Chebyshev approximation is denoted as the Polya problem. Beyond that the question arises how this limit can be described. This question will be treated in Chapter 2 and, in a more general setting, in Chapter 5 in the section on two-stage optimization.
Through an appropriate selection of Young functions the Orlicz spaces, which we will introduce in the next section, offer a common platform for the treatment of modular and norm approximation problems. In this context we will adopt the modern perception of functions as points of a vector space and obtain in this manner a framework for geometrical methods and visualizations. A formulation in terms of general measures will enable us to treat discrete and continuous approximation problems in a unified manner. In particular we obtain a common characterization of optimal solutions. For approximations of continuous functions on compact subsets of $\mathbb{R}^r$, in the simplest case on intervals, we merely need the Riemann integral, whereas discrete problems can be described in terms of point measures.
In this chapter we will also treat the classical theorems of Chebyshev approximation by Kolmogoroff, Chebyshev and de la Vallée-Poussin. In the framework of Haar subspaces we discuss the extensions of the likewise classical theorems of Bernstein and Jackson to linear approximations in Orlicz spaces. For differentiable Young functions we investigate the non-linear systems of equations resulting from the characterization theorem for linear approximations and supply a solution method based on iterative computation of weighted $L^2$-approximations. In a general setting we will discuss rapidly convergent methods for non-linear systems of equations in Chapter 4.
The geometry of convex sets and the properties of convex functions, in particular with respect to optimization problems, are considered in Chapter 3.
Based on the considerations about the structure of Orlicz spaces and their dual spaces in Chapters 6 and 7, where again the properties of the corresponding Young functions play a decisive role, we turn our attention to the geometry of Orlicz spaces in Chapter 8. Our main interest in this context is the description of strong solvability of optimization problems, where from convergence of the values to the optimum the convergence of the minimizing sequence to the minimal solution already follows. This behavior can be geometrically characterized by local uniform convexity of the corresponding functionals, defined in Chapter 8.
Fréchet differentiability turns out to be a property which is dual to local uniform convexity.
Local uniform convexity and Fréchet differentiability also play a central role in the applications discussed in the last sections of this chapter. This holds for regularizations of Tikhonov type, where ill-posed problems are replaced by a sequence of regular optimization problems, for the Ritz method, where sequences of minimal solutions on increasing sequences of finite-dimensional subspaces are considered, but also for Greedy algorithms, i.e. non-linear approximations, which aim at a compressed representation of a given function in terms of dominating basis elements.
1.2 A Brief Framework for Approximation in Orlicz Spaces
In this section we briefly state a framework for our discussion of linear approximation in Orlicz spaces. A detailed description of the structure of Orlicz spaces, their dual spaces, and their differentiability and convexity properties will be given in Chapters 6, 7 and 8.
Let $(T, \Sigma, \mu)$ be an arbitrary measure space. We introduce the following relation on the vector space of measurable real-valued functions defined on $T$: two functions are called equivalent if they differ only on a set of $\mu$-measure zero. Let $E$ be the vector space of equivalence classes (quotient space).
Let further $\Phi : \mathbb{R} \to [0, \infty]$ be a Young function, i.e. an even, convex function with $\Phi(0) = 0$, continuous at zero and continuous from the left on $\mathbb{R}_+$ (see Section 6.1). It is easily seen that the set
$$L^\Phi(\mu) := \Big\{ x \in E \;\Big|\; \text{there is a } \lambda > 0 \text{ with } \int_T \Phi(\lambda\, x(t))\, d\mu < \infty \Big\}$$
is a subspace of $E$ (see Section 6.2). By use of the Minkowski functional (see Theorem 3.2.7) the corresponding Luxemburg norm is defined on $L^\Phi(\mu)$:
$$\|x\|_{(\Phi)} := \inf\Big\{ c > 0 \;\Big|\; \int_T \Phi\Big(\frac{x(t)}{c}\Big)\, d\mu \le 1 \Big\}.$$
In particular we obtain, by choosing $\Phi_\infty : \mathbb{R} \to [0, \infty]$ as
$$\Phi_\infty(t) := \begin{cases} 0 & \text{for } |t| \le 1 \\ \infty & \text{otherwise,} \end{cases}$$
the norm
$$\|x\|_\infty = \inf\Big\{ c > 0 \;\Big|\; \int_T \Phi_\infty\Big(\frac{x(t)}{c}\Big)\, d\mu \le 1 \Big\},$$
and the corresponding space $L^\infty(\mu)$.
For the Young functions $\Phi_p : \mathbb{R} \to \mathbb{R}$ with $\Phi_p(t) := |t|^p$ one obtains the $L^p(\mu)$ spaces ($p \ge 1$).
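Both special cases can be verified directly from the definition (a routine computation, spelled out here for orientation): for $\Phi_p$ we have
$$\int_T \Phi_p\Big(\frac{x(t)}{c}\Big)\, d\mu = \frac{1}{c^p} \int_T |x|^p\, d\mu \le 1 \iff c \ge \Big( \int_T |x|^p\, d\mu \Big)^{1/p},$$
so $\|x\|_{(\Phi_p)}$ is the usual $L^p$ norm, while $\int_T \Phi_\infty(x(t)/c)\, d\mu \le 1$ holds precisely when $|x(t)| \le c$ $\mu$-almost everywhere, so $\|x\|_{(\Phi_\infty)}$ is the essential supremum norm.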
Furthermore we denote by $M^\Phi(\mu)$ the closed subspace spanned by the step functions of $L^\Phi(\mu)$.
The modular $f_\Phi : L^\Phi(\mu) \to \overline{\mathbb{R}}$ is defined by
$$f_\Phi(x) = \int_T \Phi(x(t))\, d\mu.$$
Remark 1.2.1. The level sets of $f_\Phi$ are bounded: let $f_\Phi(x) \le M$ (w.l.o.g. $M \ge 1$); then by use of the convexity of $f_\Phi$,
$$f_\Phi\Big(\frac{x}{M}\Big) = f_\Phi\Big(\frac{1}{M}\, x + \Big(1 - \frac{1}{M}\Big)\, 0\Big) \le \frac{1}{M}\, f_\Phi(x) \le 1,$$
hence by definition $\|x\|_{(\Phi)} \le M$.
In this chapter we will consider approximation problems of the following kind: let $x \in L^\Phi(\mu)$. We look for a point in a fixed subset $K$ of $L^\Phi(\mu)$ for which the modular $f_\Phi$ or the Luxemburg norm $\|\cdot\|_{(\Phi)}$ assumes its least value among all points $u$ in $K$.
If $K$ is convex we obtain the following characterization theorem for the modular approximation:
Theorem 1.2.2. Let $(T, \Sigma, \mu)$ be a measure space, $\Phi$ a finite Young function and $K$ a convex subset of $L^\Phi$. An element $u_0 \in K$ is a minimal solution of the functional $f_\Phi$, if and only if for all $u \in K$ and $h := u - u_0$ we have
$$\int_{\{h > 0\}} h\, \Phi'_+(u_0)\, d\mu + \int_{\{h < 0\}} h\, \Phi'_-(u_0)\, d\mu \ge 0.$$
Proof. Due to the monotonicity of the difference quotient $t \mapsto \frac{\Phi(s_0 + t s) - \Phi(s_0)}{t}$ for all $s_0, s \in \mathbb{R}$ (see Theorem 3.3.1) and the theorem on monotone convergence it follows that
$$(f_\Phi)'_+(x_0, h) = \int_{\{h > 0\}} h\, \Phi'_+(x_0)\, d\mu + \int_{\{h < 0\}} h\, \Phi'_-(x_0)\, d\mu.$$
From the Characterization Theorem 3.4.3 we obtain the desired result.
For linear approximations and differentiable Young functions we obtain the following characterization:
Corollary 1.2.3. Let $(T, \Sigma, \mu)$ be a measure space, $\Phi$ a differentiable Young function, and $V$ a closed subspace of $M^\Phi$. Let $x \in M^\Phi(\mu) \setminus V$; then the element $v_0 \in V$ is a minimal solution of the functional $f_\Phi(x - \cdot)$ on $V$, if for all $v \in V$ we have
$$\int_T v\, \Phi'(x - v_0)\, d\mu = 0.$$
An element $v_0 \in V$ is a best approximation of $x$ w.r.t. $V$ in the Luxemburg norm, if for all $v \in V$ we have
$$\int_T v\, \Phi'\Big( \frac{x - v_0}{\|x - v_0\|_{(\Phi)}} \Big)\, d\mu = 0.$$
Proof. Set $K := x - V$ and $u_0 := x - v_0$; then the assertion follows for the modular approximation. In order to obtain the corresponding statement for the Luxemburg norm, let $v_0$ be a best approximation w.r.t. the Luxemburg norm and set $\Phi_1(s) := \Phi\big( \frac{s}{\|x - v_0\|_{(\Phi)}} \big)$; then (see Lemma 6.2.15 together with Remark 6.2.23)
$$\int_T \Phi_1(x - v)\, d\mu = \int_T \Phi\Big( \frac{x - v}{\|x - v_0\|_{(\Phi)}} \Big)\, d\mu \ge \int_T \Phi\Big( \frac{x - v}{\|x - v\|_{(\Phi)}} \Big)\, d\mu = 1 = \int_T \Phi_1(x - v_0)\, d\mu,$$
i.e. $v_0$ is a best modular approximation for $f_{\Phi_1}$ of $x$ w.r.t. $V$.
Conversely, let
$$\int_T v\, \Phi'\Big( \frac{x - v_0}{\|x - v_0\|_{(\Phi)}} \Big)\, d\mu = 0$$
hold; then $v_0$ is a minimal solution of $f_{\Phi_1}(x - \cdot)$ on $V$, hence
$$\int_T \Phi\Big( \frac{x - v}{\|x - v_0\|_{(\Phi)}} \Big)\, d\mu = \int_T \Phi_1(x - v)\, d\mu \ge \int_T \Phi_1(x - v_0)\, d\mu = 1 = \int_T \Phi\Big( \frac{x - v}{\|x - v\|_{(\Phi)}} \Big)\, d\mu,$$
and thus $\|x - v\|_{(\Phi)} \ge \|x - v_0\|_{(\Phi)}$.
Linear approximations are used e.g. in parametric regression analysis, in particular in robust statistics (i.e. if $\Phi'$ is bounded). An example is the function
$$\Phi(s) = |s| - \ln(1 + |s|)$$
with the derivative $\Phi'(s) = \frac{s}{1 + |s|}$. To the corresponding non-linear system of equations (see Equation (1.15)) the rapidly convergent algorithms of Chapter 4 can be applied.
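To make the structure of this system concrete, here is a small Python sketch (our own illustration under simplifying assumptions: a discrete measure with unit point masses, the basis $\{1, t\}$ and synthetic data; it is neither the book's code nor the Chapter 4 methods themselves) that solves the equations $\int_T v_j\, \Phi'(x - v_0)\, d\mu = 0$, $j = 1, \dots, n$, of Corollary 1.2.3 by Newton's method:

    # Robust linear modular approximation: solve
    #   F_j(a) = sum_t v_j(t) * Phi'(x(t) - sum_i a_i v_i(t)) = 0, j = 1,...,n,
    # for Phi(s) = |s| - ln(1+|s|), Phi'(s) = s/(1+|s|), Phi''(s) = 1/(1+|s|)^2.
    import numpy as np

    def phi_prime(s):
        return s / (1.0 + np.abs(s))

    def phi_second(s):
        return 1.0 / (1.0 + np.abs(s)) ** 2

    t = np.linspace(0.0, 1.0, 21)
    x = 0.5 + 2.0 * t
    x[3] += 5.0                           # a gross outlier
    V = np.vstack([np.ones_like(t), t])   # rows: basis functions v_1 = 1, v_2 = t

    a = np.zeros(2)                       # coefficients of v_0 = a_1*v_1 + a_2*v_2
    for _ in range(50):
        r = x - a @ V                     # residual x - v_0 at the points t
        F = V @ phi_prime(r)              # the n equations
        J = -(V * phi_second(r)) @ V.T    # Jacobian dF/da
        step = np.linalg.solve(J, -F)     # Newton step
        a += step
        if np.max(np.abs(step)) < 1e-12:
            break
    print(a)  # approximately (0.5, 2.0): the outlier has little influence

Since $\Phi'$ is bounded by $1$, each data point, however far off, contributes at most its weight to every equation; this is the robustness referred to above.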
Further applications can be found not only in connection with Polya algorithms for the computation of best $L^1$- and Chebyshev approximations (see Chapter 2), but also in Greedy algorithms.
1.3 Approximation in C(T)
Let $T$ be a compact metric space. We consider the space $C(T)$ of continuous functions on $T$. Based on Borel's $\sigma$-algebra (generated by the open sets of $T$) the elements of $C(T)$ are measurable functions. If we define a finite measure $\mu$ on this $\sigma$-algebra, then $C(T)$ can be considered as a subspace of $L^\Phi(\mu)$.
1.3.1 Chebyshev Approximations in C(T)
In order to prove the criterion of Kolmogoroff for a best Chebyshev approximation we use the right-sided directional derivative of the maximum norm (which is obtained in Theorem 5.2.4 as an example for the Stability Theorem of Monotone Convergence 5.2.3).
Theorem 1.3.1 (Kolmogoroff criterion). Let $V$ be a convex subset of $C(T)$ and let $x \in C(T) \setminus V$, $v_0 \in V$ and $z := x - v_0$. Let further
$$E(z) := \{ t \in T \mid |z(t)| = \|z\|_\infty \}$$
be the set of extreme points of the difference function $z$.
The element $v_0$ is a best Chebyshev approximation of $x$ w.r.t. $V$, if and only if no $v \in V$ exists such that
$$z(t)\,\big(v(t) - v_0(t)\big) > 0 \quad \text{for all } t \in E(z).$$
Proof. Let $f(x) := \|x\|_\infty$ and $K := x - V$. According to Theorem 5.2.4 and the Characterization Theorem 3.4.3, $v_0$ is a best Chebyshev approximation of $x$ w.r.t. $V$ if and only if for all $v \in V$ we have
$$f'_+(x - v_0, v_0 - v) = \max_{t \in E(z)} \big( v_0(t) - v(t) \big)\, \operatorname{sign}\big( x(t) - v_0(t) \big) \ge 0,$$
from which the assertion follows.
Remark 1.3.2. If $V$ is a subspace one can replace the condition in the previous theorem by: there is no $v \in V$ with $z(t)\, v(t) > 0$ for all $t \in E(z)$, or more explicitly,
$$v(t) > 0 \quad \text{for } t \in E_+(z) := \{ t \in T \mid z(t) = \|z\|_\infty \},$$
$$v(t) < 0 \quad \text{for } t \in E_-(z) := \{ t \in T \mid z(t) = -\|z\|_\infty \}.$$
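A simple worked illustration of the criterion (ours, not from the book): approximate $x(t) = t$ on $T = [0, 1]$ by the constants $V = \operatorname{span}\{1\}$. For $v_0 = \frac{1}{2}$ we have $z(t) = t - \frac{1}{2}$, $\|z\|_\infty = \frac{1}{2}$ and $E(z) = \{0, 1\}$ with $z(0) = -\frac{1}{2}$, $z(1) = \frac{1}{2}$, i.e. $E_-(z) = \{0\}$ and $E_+(z) = \{1\}$. A $v \in V$ with $z(t)\, v(t) > 0$ on $E(z)$ would have to satisfy $v(0) < 0 < v(1)$, which no constant can do; hence $v_0 = \frac{1}{2}$ is a best Chebyshev approximation.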
As a characterization of a best linear Chebyshev approximation w.r.t. a finite-dimensional subspace of $C(T)$ we obtain
Theorem 1.3.3. Let $V = \operatorname{span}\{v_1, \dots, v_n\}$ be a subspace of $C(T)$ and let $x \in C(T) \setminus V$. The element $v_0 \in V$ is a best Chebyshev approximation of $x$ w.r.t. $V$, if and only if there are $k$ points $t_1, \dots, t_k \in E(x - v_0)$ with $1 \le k \le n + 1$ and $k$ positive numbers $\lambda_1, \dots, \lambda_k$, such that for all $v \in V$ we have
$$0 = \sum_{j=1}^{k} \lambda_j \big( x(t_j) - v_0(t_j) \big)\, v(t_j).$$
Proof. Let the above condition be satisfied and let $\sum_{j=1}^{k} \lambda_j = 1$. Set $z := x - v_0$. From $|z(t_j)| = \|z\|_\infty$ for $j = 1, \dots, k$ it follows that
$$\|z\|_\infty^2 = \sum_{j=1}^{k} \lambda_j\, z^2(t_j) = \sum_{j=1}^{k} \lambda_j\, z(t_j)\big( z(t_j) - v(t_j) \big) \le \|z\|_\infty \max_{j} |z(t_j) - v(t_j)| \le \|z\|_\infty\, \|z - v\|_\infty,$$
and hence $\|z\|_\infty \le \|z - v\|_\infty = \|x - v_0 - v\|_\infty$. As $V$ is a subspace, it follows that $\|x - v_0\|_\infty \le \|x - v\|_\infty$ for all $v \in V$.
Let now $v_0$ be a best Chebyshev approximation of $x$ w.r.t. $V$. We consider the mapping $A : T \to \mathbb{R}^n$, which is defined by $A(t) := z(t)\,(v_1(t), \dots, v_n(t))$. Let $C := A(E(z))$. We first show $0 \in \operatorname{conv}(C)$: as a continuous image of the compact set $E(z)$ the set $C$ is also compact. The convex hull of $C$ is, according to the theorem of Carathéodory (see Theorem 3.1.19), the image of the compact set
$$C^{n+1} \times \Big\{ \lambda \in \mathbb{R}^{n+1} \;\Big|\; \sum_{j=1}^{n+1} \lambda_j = 1,\ \lambda_j \ge 0,\ j = 1, \dots, n+1 \Big\}$$
under the continuous mapping $(c_1, \dots, c_{n+1}, \lambda) \mapsto \sum_{j=1}^{n+1} \lambda_j c_j$, and hence compact.
Suppose $0 \notin \operatorname{conv}(C)$; then due to the Strict Separation Theorem 3.9.17 there exists an $a = (a_1, \dots, a_n) \in \mathbb{R}^n$ such that for all $t \in E(z)$ we have
$$\sum_{i=1}^{n} a_i\, z(t)\, v_i(t) > 0.$$
For $v^* := \sum_{i=1}^{n} a_i v_i$ and $t \in E(z)$ it follows that $z(t)\, v^*(t) > 0$, contradicting the Kolmogoroff criterion; i.e. $0 \in \operatorname{conv}(C)$. Due to the theorem of Carathéodory (see Theorem 3.1.19) there exist $k$ numbers $\lambda_1, \dots, \lambda_k \ge 0$ with $1 \le k \le n + 1$ and $\sum_{j=1}^{k} \lambda_j = 1$, and $k$ points $c_1, \dots, c_k \in C$, thus $k$ points $t_1, \dots, t_k \in E(z)$, such that
$$0 = \sum_{j=1}^{k} \lambda_j c_j = \sum_{j=1}^{k} \lambda_j \big( x(t_j) - v_0(t_j) \big) \begin{pmatrix} v_1(t_j) \\ v_2(t_j) \\ \vdots \\ v_n(t_j) \end{pmatrix}.$$
As $v_1, \dots, v_n$ form a basis of $V$, the assertion follows.
Remark 1.3.4. Using the point measures $\delta_{t_j}$, $j = 1, \dots, k$, and the measure $\mu := \sum_{j=1}^{k} \lambda_j \delta_{t_j}$, the above condition can also be written as
$$\int_T \big( x(t) - v_0(t) \big)\, v(t)\, d\mu(t) = 0.$$
As a consequence of the above theorem we obtain
Theorem 1.3.5 (Theorem of de la Vallée-Poussin I). Let $V$ be an $n$-dimensional subspace of $C(T)$, let $x \in C(T) \setminus V$, and let $v_0$ be a best Chebyshev approximation of $x$ w.r.t. $V$. Then there is a subset $T_0$ of $E(x - v_0)$, containing not more than $n + 1$ points, for which the restriction $v_0|_{T_0}$ is a best approximation of $x|_{T_0}$ w.r.t. $V|_{T_0}$, i.e. for all $v \in V$ we have
$$\max_{t \in T} |x(t) - v_0(t)| = \max_{t \in T_0} |x(t) - v_0(t)| \le \max_{t \in T_0} |x(t) - v(t)|.$$
In the subsequent discussion we want to consider $L^\Phi$-approximations in $C[a, b]$. We achieve particularly far-reaching assertions (uniqueness, behavior of zeros, error estimates) in the case of Haar subspaces.
1.3.2 Approximation in Haar Subspaces
Definition 1.3.6. Let $T$ be a set. An $n$-dimensional subspace $V = \operatorname{span}\{v_1, \dots, v_n\}$ of the vector space $X$ of the real functions on $T$ is called a Haar subspace, if every not identically vanishing function $v \in V$ has at most $n - 1$ zeros in $T$.
Equivalent statements to the one in the above definition are:
(a) For all $t_1, \dots, t_n \in T$ with $t_i \neq t_j$ for $i \neq j$ we have
$$\begin{vmatrix} v_1(t_1) & \dots & v_1(t_n) \\ \vdots & \ddots & \vdots \\ v_n(t_1) & \dots & v_n(t_n) \end{vmatrix} \neq 0.$$
(b) For $n$ arbitrary pairwise different points $t_i \in T$, $i = 1, \dots, n$, and any set of values $s_i$, $i = 1, \dots, n$, there is a unique $v \in V$ with $v(t_i) = s_i$ for $i = 1, \dots, n$, i.e. the interpolation problem for $n$ different points is uniquely solvable.
Proof. (a) follows from the definition, because if $\det(v_i(t_j)) = 0$ there is a non-trivial linear combination $\sum_{i=1}^{n} a_i z_i$ of the rows $z_i$ of the matrix $(v_i(t_j))$ yielding the zero vector. Hence the function $\sum_{i=1}^{n} a_i v_i$ has $t_1, \dots, t_n$ as zeros.
Conversely, the definition follows from (a), because let $t_1, \dots, t_n$ be zeros of a $v \in V$ with $v \neq 0$, hence $v = \sum_{i=1}^{n} a_i v_i$ in a non-trivial representation; then we obtain for these $t_1, \dots, t_n$ a non-trivial representation of the zero vector, $\sum_{i=1}^{n} a_i z_i = 0$, a contradiction.
However, (a) and (b) are equivalent, because the linear system of equations
$$\sum_{j=1}^{n} a_j v_j(t_i) = s_i, \quad i = 1, \dots, n,$$
has a unique solution if and only if (a) holds.
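For the concrete case of polynomials (example (a) below) statement (b) can be checked numerically; the following Python sketch (our illustration, with arbitrarily chosen nodes and values) uses the fact that the interpolation matrix $(v_i(t_j))$ is then a Vandermonde matrix:

    # Haar subspace span{1, t, t^2}: the matrix (v_i(t_j)) is a Vandermonde
    # matrix, its determinant is non-zero for pairwise different nodes
    # (statement (a)), and v(t_i) = s_i is uniquely solvable (statement (b)).
    import numpy as np

    t = np.array([0.0, 0.5, 2.0])            # pairwise different points
    s = np.array([1.0, -1.0, 3.0])           # prescribed values
    A = np.vander(t, 3, increasing=True)     # rows (1, t_i, t_i^2)
    print(np.linalg.det(A))                  # non-zero, as (a) asserts
    a = np.linalg.solve(A, s)                # coefficients of the unique v
    print(A @ a)                             # reproduces s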
Examples for Haar subspaces:
(a) The algebraic polynomials of degree at most $n$ form a Haar subspace of dimension $n + 1$ on every real interval $[a, b]$.
(b) The trigonometric polynomials of degree at most $n$ form a Haar subspace of dimension $2n + 1$ on the interval $[0, 2\pi)$.
(c) Let $\lambda_1, \dots, \lambda_n$ be pairwise different real numbers; then $V := \operatorname{span}\{e^{\lambda_1 t}, \dots, e^{\lambda_n t}\}$ is a Haar subspace of dimension $n$ on every interval $[a, b]$.
In the sequel let V be an n-dimensional Haar subspace of C[a, b].
Definition 1.3.7. The zero $t_0$ of a $v \in V$ is called double, if
(a) $t_0$ is in the interior of $[a, b]$,
(b) $v$ is non-negative or non-positive in a neighborhood of $t_0$.
In any other case the zero $t_0$ is called simple.
Lemma 1.3.8. Every not identically vanishing function $v \in V$ has, taking the multiplicity into account, at most $n - 1$ zeros.
Proof (see [112]). Let $t_1 < \dots < t_m$ be the zeros of $v$ on $(a, b)$, let $r$ be the number of zeros at the boundary of the interval and let $k$ be the number of double zeros of $v$; then $v$ has, according to the definition, $m - k + r$ simple zeros, and we have $r + m \le n - 1$. Let further $t_0 := a$ and $t_{m+1} := b$; then we define
$$\varepsilon := \min_{0 \le j \le m} \; \max_{t \in [t_j, t_{j+1}]} |v(t)|.$$
Apparently $\varepsilon > 0$. Furthermore, there is a $v_1 \in V$ that assumes the value $0$ in each simple zero of $v$ and in each double zero the value $1$ (or $-1$ respectively), provided that $v$ is non-negative (or non-positive respectively) in a neighborhood of this zero. Let $c > 0$ be chosen in such a way that
$$c \max_{t \in [a, b]} |v_1(t)| < \varepsilon;$$
then the function $v_2 := v - c\, v_1$ has the following properties:
(a) each simple zero of $v$ is a zero of $v_2$;
(b) each double zero of $v$ generates two zeros of $v_2$. To see this, let $\tilde{t}_j \in [t_j, t_{j+1}]$ be chosen such that
$$\max_{t \in [t_j, t_{j+1}]} |v(t)| = |v(\tilde{t}_j)|$$
for $j = 0, 1, \dots, m$; then $|v(\tilde{t}_j)| \ge \varepsilon$.
Let now $t_j$ be a double zero of $v$ and let $v_1(t_j) = 1$; then we have
$$v_2(t_j) = v(t_j) - c\, v_1(t_j) = -c,$$
$$v_2(\tilde{t}_{j-1}) = v(\tilde{t}_{j-1}) - c\, v_1(\tilde{t}_{j-1}) \ge \varepsilon - c\, |v_1(\tilde{t}_{j-1})| > 0,$$
$$v_2(\tilde{t}_j) = v(\tilde{t}_j) - c\, v_1(\tilde{t}_j) \ge \varepsilon - c\, |v_1(\tilde{t}_j)| > 0.$$
A corresponding result is obtained for $v_1(t_j) = -1$.
For sufficiently small $c$ all these zeros of $v_2$ are different. Thus we finally obtain
$$r + (m - k) + 2k \le n - 1.$$
Lemma 1.3.9. Let $k < n$ and $t_1, \dots, t_k \in (a, b)$. Then there is a $v \in V$ which has a zero with a change of sign precisely in these points.
Proof. If $k = n - 1$, then according to the previous lemma the solution of the interpolation problem $v(a) = 1$ and $v(t_i) = 0$ for $i = 1, \dots, n - 1$ has the desired property, because none of these zeros of $v$ can be a double zero.
For $k < n - 1$ we construct a sequence of points $(t^{(m)}_{k+1}, \dots, t^{(m)}_{n-1})$ with pairwise different components in the open cuboid $(t_k, b)^{n-k-1}$ that converges componentwise to $b$. In an analogous way as in the case $k = n - 1$ we now determine for the points $a, t_1, \dots, t_k, t^{(m)}_{k+1}, \dots, t^{(m)}_{n-1}$ the solution $v^{(m)} = \sum_{i=1}^{n} a^{(m)}_i v_i$ of the interpolation problem $v(a) = 1$, $v(t_i) = 0$ for $i = 1, \dots, k$ and $v(t^{(m)}_i) = 0$ for $i = k + 1, \dots, n - 1$.
Let $\bar{a} := (\bar{a}_1, \dots, \bar{a}_n)$ be a point of accumulation of the sequence $\frac{a^{(m)}}{\|a^{(m)}\|}$, where $a^{(m)} := (a^{(m)}_1, \dots, a^{(m)}_n)$. Then $v := \sum_{i=1}^{n} \bar{a}_i v_i$ satisfies the requirements.
Chebyshev's Alternant
We start our discussion with
Theorem 1.3.10 (Theorem of de la Vallée-Poussin II). Let $V$ be an $n$-dimensional Haar subspace of $C[a, b]$ and $x \in C[a, b] \setminus V$. If for a $v_0 \in V$ and for the points $t_1 < \dots < t_{n+1}$ in $[a, b]$ the condition
$$\operatorname{sign}(x - v_0)(t_i) = -\operatorname{sign}(x - v_0)(t_{i+1}) \quad \text{for } i \in \{1, \dots, n\}$$
is satisfied, the estimate
$$\|x - v_0\|_\infty \ge \min_{v \in V} \|x - v\|_\infty \ge \min_{1 \le i \le n+1} |x(t_i) - v_0(t_i)|$$
follows.
Proof. Suppose for a $v^* \in V$ we have
$$\|x - v^*\|_\infty < \min_{1 \le i \le n+1} |x(t_i) - v_0(t_i)|;$$
then $0 \neq v^* - v_0 = (x - v_0) - (x - v^*)$ and $\operatorname{sign}(v^* - v_0)(t_i) = -\operatorname{sign}(v^* - v_0)(t_{i+1})$ for $i \in \{1, \dots, n\}$ follows. Thus $v^* - v_0$ has at least $n$ zeros in $[a, b]$, a contradiction.
Theorem 1.3.11 (Theorem of Chebyshev). Let $V$ be an $n$-dimensional Haar subspace of $C[a, b]$ and $x \in C[a, b] \setminus V$. Then $v_0 \in V$ is a best Chebyshev approximation of $x$ w.r.t. $V$, if and only if there are $n + 1$ points $t_1 < \dots < t_{n+1}$ in $[a, b]$ satisfying the alternant condition, i.e. $|x(t_i) - v_0(t_i)| = \|x - v_0\|_\infty$ and
$$\operatorname{sign}(x - v_0)(t_i) = -\operatorname{sign}(x - v_0)(t_{i+1})$$
for $i \in \{1, \dots, n\}$.
Proof. Let $v_0$ be a best Chebyshev approximation. Due to the Characterization Theorem there are $k \le n + 1$ points $\{t_1, \dots, t_k\} =: S \subset E(x - v_0)$ and $k$ positive numbers $\lambda_1, \dots, \lambda_k$ such that
$$\sum_{j=1}^{k} \lambda_j \big( x(t_j) - v_0(t_j) \big)\, v(t_j) = 0 \quad \text{for all } v \in V,$$
and $v_0|_S$ is a best approximation of $x|_S$ w.r.t. $V|_S$. As $V$ is a Haar subspace, $k = n + 1$ must hold, because otherwise one could interpolate $x|_S$, contradicting the properties of $v_0$. We now choose $0 \neq v \in V$ having zeros in the $n - 1$ points $t_1, \dots, t_{i-1}, t_{i+2}, \dots, t_{n+1}$ (and no others). Then it follows that
$$\lambda_i \big( x(t_i) - v_0(t_i) \big)\, v(t_i) + \lambda_{i+1} \big( x(t_{i+1}) - v_0(t_{i+1}) \big)\, v(t_{i+1}) = 0,$$
and $\operatorname{sign} v(t_i) = \operatorname{sign} v(t_{i+1})$, because $t_{i-1} < t_i < t_{i+1} < t_{i+2}$. Conversely, if the alternant condition is satisfied, then due to Theorem II of de la Vallée-Poussin $v_0$ is a best approximation of $x$ w.r.t. $V$.
Corollary 1.3.12. Let V be an n-dimensional Haar subspace of C[a, b] and x
C[a, b] V . Then the best Chebyshev approximation v
0
of x w.r.t. V is uniquely
determined and the difference function x v
0
has at least n zeros.
Proof. Let v
1
and v
2
be best Chebyshev approximations of x w.r.t. V . Then v
0
:=
1
2
(v
1
+ v
2
) is also a best approximation and due to the theorem of Chebyshev there
are n + 1 points t
1
< < t
n+1
in [a, b] with [x(t
i
) v
0
(t
i
)[ = |x v
0
|

and
sign(x v
0
)(t
i
) = sign(x v
0
)(t
i+1
) for i 1, . . . , n. Then we obtain
|x v
0
|


1
2
[(x v
1
)(t
i
)[ +
1
2
[(x v
2
)(t
i
)[ [(x v
0
)(t
i
)[ = |x v
0
|

.
16 Chapter 1 Approximation in Orlicz Spaces
It follows that [(x v
j
)(t
i
)[ = |x v
0
|

for i = 1, . . . , n + 1 and j = 1, 2.
Furthermore (x v
1
)(t
i
) = (x v
2
)(t
i
) for i = 1, . . . , n + 1, because if for an
i
0
1, . . . , n + 1
sign((x(t
i
0
) v
1
(t
i
0
)) = sign(x(t
i
0
v
2
(t
i
0
)),
then x(t
i
0
)
1
2
(v
1
(t
i
0
) + v
2
(t
i
0
)) = x(t
i
0
) v
0
(t
i
0
) = 0, a contradiction. But from
(x v
1
)(t
i
) = (x v
2
)(t
i
) for i = 1, . . . , n + 1 it follows that v
2
(t
i
) v
1
(t
i
) = 0
for i = 1, . . . , n + 1, thus v
1
= v
2
= v
0
, because V is a Haar subspace.
Due to sign(x v
0
)(t
i
) = sign(x v
0
)(t
i+1
) for i 1, . . . , n and [x(t
i
)
v
0
(t
i
)[ = |x v
0
|

for i 1, . . . , n + 1 we obtain using the continuity of x v


0
that x v
0
has at least n zeros.
Remark 1.3.13. The set t
1
< < t
n+1
is called Chebyshevs alternant. The
determination of a best approximation can now be reduced to the determination of
this set and one can pursue the following strategy (see Remez algorithm see [22]):
choose n + 1 points and try by exchanging points to achieve improved bounds in
the sense of the theorem of de la Vale-Poussin. This theorem then provides a stop
criterion.
Example 1.3.14 (Chebyshev Polynomials). Let P
n
(t) := cos(narccos t) on [1, 1].
P
n
is a polynomial of degree n, because from the addition theorem cos + cos =
2 cos
1
2
( + ) cos
1
2
( ) the recursion formula P
n+1
(t) = 2tP
n
(t) P
n1
(t)
follows, where P
0
(t) = 1 and P
1
(t) = t for t [1, 1]. For t
i
:= cos
i
n
we have
P
n
(t
i
) = cos i = (1)
i
for i = 0, 1, . . . , n. If we dene
T
n
(t) :=
1
2
n1
P
n
(t),
then the leading coefcient of T
n
is equal to 1 and we have T
n
(t
i
) =
1
2
n1
(1)
i
,
i.e. t
0
, . . . , t
n
form a Chebyshevs alternant for the approximation of x(t) = t
n
by
the subspace of the polynomials of degree n 1 on [1, 1]. The difference function
x v
0
= T
n
has n zeros.
Uniqueness: The Theorem of Jackson in L

According to Jackson an alternative holds for the L


1
approximation which is general-
ized by the following theorem (see [51]).
Theorem 1.3.15 (Jacksons Alternative). Let be a nite, denite Young function
and let v
0
be a best f

or | |
()
approximation of x C[a, b] V . Then x v
0
has
at least n zeros or the measure of the zeros is positive.
If is differentiable at 0, then x v
0
has at least n zeros with change of sign.
Section 1.3 Approximation in C(T) 17
Proof. Let at rst v
0
be a best f

approximation and let Z be the set of zeros of xv


0
.
If (Z) = 0 then according to Theorem 1.2.2 for all h V
_
h>0\Z
h
t
+
(x v
0
)d +
_
h<0\Z
h
t

(x v
0
)d 0. (1.5)
If x v
0
has only k < n zeros with change of sign, say t
1
, . . . , t
k
, we can due to
Lemma 1.3.9 nd a v
1
V such that
sign v
1
(t) = sign(x(t) v
0
(t)) for all t [a, b] Z.
As is symmetrical and denite, we have for s R 0
sign
t
+
(s) = sign
t

(s) = sign
t
+
(s) = sign
t

(s).
Hence for h = v
1
both integrands in (1.5) are negative, a contradiction.
If is differentiable at 0, then
t
(0) = 0 and hence also
_
Z
h
t
(x v
0
)d = 0.
As x ,= v
0
and the measure is equivalent to the Lebesgue measure, we can repeat the
argument used in the rst part of the proof.
Let now v
0
be a best | |
()
approximation of x C[a, b] V , and let c :=
|x v
0
|
()
. For
1
(s) := (
s
c
) apparently v
0
is a best f

1
approximation of x,
because for all v V we have
1 =
_
b
a

_
x v
0
c
_
d
_
b
a

_
x v
c
_
d.
Repeating the above argument for
1
, we obtain the corresponding property for the
best approximation in the Luxemburg norm.
The uniqueness theorem of Jackson for the L
1
-approximation (see [1]) can be gen-
eralized to the best L

approximation under the following condition:


Theorem 1.3.16. Let V be a nite-dimensional Haar subspace of C[a, b] and let
be a nite, denite Young function. Then every x C[a, b] V has a unique best f

and | |
()
approximation.
Proof. Let v
1
, v
2
be best f

approximations of x w.r.t. V , then also v


0
:=
1
2
(v
1
+v
2
),
and we have
0 =
1
2
_
b
a
(x v
1
)d +
1
2
_
b
a
(x v
2
)d
_
b
a
(x v
0
)d
=
_
b
a
_
1
2
(x v
1
) +
1
2
(x v
2
)d (x v
0
)
_
d.
18 Chapter 1 Approximation in Orlicz Spaces
As is convex, the integrand on the right-hand side is non-negative. As x v
0
is
continuous we have for all t [a, b]
1
2
(x v
1
)(t) +
1
2
(x v
2
)(t) (x v
0
)(t) = 0.
Due to Jacksons Alternative 1.3.15 xv
0
has at least n zeros. As is only zero at 0,
the functions x v
1
and x v
2
and hence v
1
v
2
have at least the same zeros. As V
is a Haar subspace, v
1
= v
2
follows. In order to obtain uniqueness for the Luxemburg
norm, we choose as we have done above c := |x v
0
|
()
and
1
(s) := (
s
c
).
The following generalization of the theorem of Bernstein provides an error estimate
for linear approximations in C[a, b] w.r.t. the (n + 1)-dimensional Haar subspace of
the polynomials of degree n:
Error Estimate: The Theorem of Bernstein in L

Lemma 1.3.17. Let the functions x, y C[a, b] have derivatives up to the order n+1
on [a, b] and let for the (n + 1)-st derivatives x
(n+1)
and y
(n+1)
satisfy
[x
(n+1)
(t)[ < y
(n+1)
(t) for all t [a, b]. (1.6)
Then the following inequality holds:
[r
x
(t)[ [r
y
(t)[ for all t [a, b], (1.7)
where r
x
:= x x
n
and r
y
:= y y
n
denote the difference functions for the inter-
polation polynomials x
n
and y
n
w.r.t. x and y resp. for the same interpolation nodes
t
0
< t
1
< < t
n
in [a, b].
Proof. Following the arguments of Tsenov [106]: for t = t
i
, i = 0, 1, . . . , n, the
inequality is trivially satised. Let now t ,= t
i
, i = 0, 1, . . . , n. The auxiliary function
z(s) :=

x(t) x
n
(t) x(s) x
n
(s)
y(t) y
n
(t) y(s) y
n
(s)

has zeros at s = t and at the n + 1 interpolation nodes. Hence, due to the theorem of
Rolle there is s
0
[a, b], where the (n + 1)-st derivative
z
(n+1)
(s) :=

x(t) x
n
(t) x
(n+1)
(s)
y(t) y
n
(t) y
(n+1)
(s)

is equal to 0, i.e.
x
(n+1)
(s
0
)r
y
(t) = y
(n+1)
(s
0
)r
x
(t).
If now x
(n+1)
(s
0
) = 0, then r
x
(t) = 0, thus the Inequality (1.7) is trivially satised.
If x
(n+1)
(s
0
) ,= 0, then using (1.6) the assertion follows.
Section 1.3 Approximation in C(T) 19
In the sequel we will present a generalization of the subsequent theorem of S. N.
Bernstein.
Theorem1.3.18 (Theoremof Bernstein). Let the functions x, y C[a, b] have deriva-
tives of order n+1 in [a, b] and let the (n+1)-st derivatives x
(n+1)
and y
(n+1)
satisfy
[x
(n+1)
(t)[ y
(n+1)
(t) for all t [a, b].
The following inequality holds:
E
n
(x) E
n
(y),
where E
n
(z) denotes the distance of z to the space of polynomials of degree n w.r.t.
the maximum norm.
Denition 1.3.19. A functional f : C[a, b] R is called monotone, if x, y C[a, b]
and [x(t)[ [y(t)[ for all t [a, b] implies f(x) f(y).
Theorem 1.3.20. Let f : C[a, b] R be monotone. Let the functions x, y C[a, b]
have derivatives up to order n + 1 in [a, b] and let the (n + 1)-st derivatives x
(n+1)
and y
(n+1)
satisfy
[x
(n+1)
(t)[ < y
(n+1)
(t) for all t [a, b].
Let further V be the subspace of the polynomials of degree n and let for v
x
, v
y
V
(a) f(x v
x
) = inff(x v) [ v V and f(y v
y
) = inff(y v) [ v V
(b) the functions x v
x
and y v
y
have at least n + 1 zeros in [a, b].
Then the inequality f(x v
x
) f(x v
y
) holds.
Proof. If we choose n+1 zeros of y v
y
as interpolation nodes and denote by v
x
the
interpolation polynomial belonging to x and these interpolation nodes , then, due to
the previous lemma [x(t)v
x
(t)[ [y(t)v
y
(t)[ for all t [a, b]. The monotonicity
of f then implies
f(x v
x
) f(x v
x
) f(y v
y
).
Remark 1.3.21. The modular f

as well as the Luxemburg norm | |


()
satisfy the
monotonicity requirement of the previous theorem.
Moreover, Jacksons Alternative (see Theorem 1.3.15) provides for nite and
denite corresponding information about the set of zeros of the difference function.
Corollary 1.3.22. Let f be the Luxemburg norm and V be as in the previous theorem
and let y(t) = t
n+1
and let E
f
n
(z) := inf
vV
f(z v). If for x C[a, b] we have
[x
(n+1)
(t)[ < for all t [a, b],
20 Chapter 1 Approximation in Orlicz Spaces
then the following inequality holds:
E
f
n
(x)
E
f
n
(y)
(n + 1)!
.
Corollary 1.3.23. Let V be the subspace of polynomials of degree n on [1, 1], let
y(t) = t
n+1
, and let E
n
(z) := inf
vV
|z v|

. If for x C[a, b]
[x
(n+1)
(t)[ < for all t [1, 1]
then the following inequality holds:
E
n
(x)
E
n
(y)
(n + 1)!
=

2
n
(n + 1)!
,
because E
n
(y) = |T
n+1
|

=
1
2
n
(see Example 1.3.14).
1.4 Discrete L

-approximations
Let T be a set containing m elements, T = t
1
, . . . , t
m
and x : T R. Hence
we can view x as a point in R
m
and consider the approximation problem in R
m
. The
norms that are used in this section can be understood as norms on the R
m
.
For a given point x in R
m
we try to nd a point in a certain subset M of R
m
that
among all points of M has the least distance to x, where the distance is understood in
the sense of the (discrete) Luxemburg norm
|x y|
()
= inf
_
c > 0

tT

_
x(t) y(t)
c
_
1
_
.
In the subsequent discussion we will rst consider the special case of the discrete
Chebyshev approximation which frequently results from the discretization of a con-
tinuous problem. On the other hand, due to the Theorems of de la Valle-Poussin (see
above) the continuous problem can always be viewed as a discrete approximation.
1.4.1 Discrete Chebyshev Approximation
For a given point x in R
m
we look for a point in a subset M of R
m
that has the least
distance from x where the distance is dened as
|x y|

= max
tT
[x(t) y(t)[. (1.8)
Section 1.4 Discrete L

-approximations 21
The Strict Approximation
J. R. Rice has introduced the strict approximation using the notion of critical points
(see Rice [98]). In the sequel the strict approximation is dened using the universal
extreme point (see also Descloux [25]).
Let T = t
1
, . . . , t
m
, V be a subspace of C(T) = R
m
and x C(T) V . Let
P
V
(x) denote the set of all best approximations of x w.r.t. V .
Denition 1.4.1. A point t T is called extreme point of a best approximation v
0
of x w.r.t. V , if
[x(t) v
0
(t)[ = |x v
0
|

.
By E
T
v
0
we denote the set of all extreme points of a best approximation v
0
of x.
Lemma 1.4.2. The set
D =

v
0
P
V
(x)
E
T
v
0
of the extreme points common to all best approximations (we will call these points
universal) is non-empty.
Proof. Suppose D = , then we could for each i, i = 1, . . . , m, nd a v
i
P
V
(x)
such that
[v
i
(t
i
) x(t
i
)[ < min
vV
|v x|

.
But then the function
1
m
m

i=1
v
i
V
would approximate the function x better than v
i
(i = 1, 2, . . . , m). As, however
v
i
P
V
(x) this would mean a contradiction.
Let x / V . A universal extreme point t D is called negative (or positive resp.)
w.r.t. a best approximation v
0
, if
sign(x(t) v
0
(t)) = 1 (resp. + 1).
Lemma 1.4.3. Let x / V . If a universal extreme point is negative (or positive) w.r.t.
a best approximation, then it is negative (or positive) w.r.t. every best approximation.
Therefore it is simply called negative (or positive).
Proof. If for v
1
, v
2
P
V
(x) and a t D
x(t) v
1
(t) = x(t) +v
2
(t),
22 Chapter 1 Approximation in Orlicz Spaces
then for
1
2
(v
1
+v
2
) P
V
(x)
x(t)
1
2
(v
1
(t) +v
2
(t)) = 0,
contrary to t being a universal extreme point.
Let nowV be an n-dimensional subspace (n <m) of C(T). Every basis u
1
, . . . , u
n
of the subspace V denes a mapping B : T R
n
,
B(t) =
_
_
_
_
_
u
1
(t)
u
2
(t)
.
.
.
u
n
(t)
_
_
_
_
_
. (1.9)
For every set H R
n
let [H] denote the smallest subspace of R
n
, containing H.
Lemma 1.4.4. Let P T, Q = B
1
([B(P)]), v
1
, v
2
P
V
(x) and
v
1
(t) = v
2
(t) for t P,
then also
v
1
(t) = v
2
(t) for t Q.
Proof. Let v
1
and v
2
be represented in the above basis:
v
k
=
n

i=1
a
(k)
i
u
i
, k = 1, 2, := [P[ and w.l.o.g.
P = t
1
, t
2
, . . . , t

.
For t Q we have
B(t) =
_
_
_
_
_
u
1
(t)
u
2
(t)
.
.
.
u
n
(t)
_
_
_
_
_
=

j=1
c
j

_
_
_
_
_
u
1
(t
j
)
u
2
(t
j
)
.
.
.
u
n
(t
j
)
_
_
_
_
_
and hence
u
i
(t) =

j=1
c
j
u
i
(t
j
),
Section 1.4 Discrete L

-approximations 23
where c
1
, c
2
, . . . , c

are real constants, thus


v
1
(t) =
n

i=1
a
(1)
i
u
i
(t) =
n

i=1
a
(1)
i

j=1
c
j
u
j
(t
j
) =

j=1
c
j
n

i=1
a
(1)
i
u
i
(t
j
)
=

j=1
c
j
n

i=1
a
(2)
i
u
i
(t
j
) =
n

i=1
a
(2)
i

j=1
c
j
u
i
(t
j
) = v
2
(t).
Lemma 1.4.5. Let T
t
be a proper subset of T and q : T
t
R. If there is a best
approximation of x which extends q, then among these extensions there is at least
one, which approximates x best on T T
t
. Thus, if
W = v
0
P
V
(x) [ v
0
[
T
= q ,= ,
then also
Z = v
1
W [ |v
1
x|
T\T
|v
0
x|
T\T
for all v
0
W ,= .
Furthermore

v
1
Z
E
T\T

v
1
,= .
Proof. The function N : W R with v
0
|v
0
x|
T\T
assumes its minimum on
the convex and compact set W, hence Z ,= . The proof of the second part can be
performed in analogy to the proof of Lemma 1.4.2, because W is convex.
The strict approximation is characterized by the following constructive denition:
Let
V
0
= P
V
(x) = v
0
V [ d
0
= |v
0
x|

|v x|

for all v V
be the set of all best Chebyshev approximations of x w.r.t. V and let
D
0
=

v
0
V
0
E
T
v
0
be the set of the universal extreme points (see Lemma 1.4.2) and
T
0
= t T [ t B
1
([B(D
0
)]).
All best approximations v
0
of x are identical in all points of T
0
(see Lemma 1.4.4).
Among the v
0
P
V
(x) we now look for those, which are best approximations of x
on T T
0
V
1
= v
1
V
0
[ d
1
= |v
1
x|
T\T
0
|v
0
x|
T\T
0
for all v
0
V
0
,
D
1
=

v
1
V
1
E
T\T
0
v
1
(see Lemma 1.4.5)
24 Chapter 1 Approximation in Orlicz Spaces
and
T
1
= t T [ t B
1
([B(D
0
D
1
)]).
Furthermore we have
V
2
= v
2
V
1
[ d
2
= |v
2
x|
T\T
1
|v
1
x|
T\T
1
for all v
1
V
1
,
D
2
=

v
2
V
2
E
T\T
1
v
2
and
T
2
= t T [ t B
1
([B(D
0
D
1
D
2
)]).
We observe: if T T
i
is non-empty, T
i
is a proper subset of T
i+1
: in fact D
i+1
T
i+1
,
but D
i+1
T
i
= . The procedure is continued until T
k
= T. From the construction,
using Lemma 1.4.5, it becomes apparent that for v
(1)
i
, v
(2)
i
V
i
we obtain: v
(1)
i
[
T
i
=
v
(2)
i
[
T
i
.
Hence V
k
consists of exactly one element, and this element we call the strict ap-
proximation of x w.r.t. V .
1.4.2 Linear L

-approximation for Finite Young Functions


Uniqueness and Differentiability
Before we turn our attention towards the numerical computation, we want to discuss
a few fundamental questions:
(a) How can the uniqueness of minimal solutions w.r.t. | |
()
be described?
(b) What can be said about the differentiation properties of the norm | |
()
?
Happily, these properties can be easily gleaned from the Young functions:
Strict convexity and differentiability of the Young functions carry over to strict
convexity and differentiability of the corresponding norms. The subsequent theorem
gives a precise formulation of this statement, which we will in a more general and
enhanced form treat in Chapter 8:
Theorem 1.4.6. Let : R R be a Young function.
(a) If is differentiable, the corresponding Luxemburg norm is differentiable and
we have
|x|
()
=

t
(
x()
|x|
()
)

tT
x(t)
|x|
()

t
(
x(t)
|x|
()
)
. (1.10)
(b) If is strictly convex, so is the corresponding Luxemburg norm.
Section 1.4 Discrete L

-approximations 25
Proof. For the function F : R
m
(0, ) R with
(x, c) F(x, c) :=

tT

_
x(t)
c
_
1
we have according to the denition of the Luxemburg norm
F(x, |x|
()
) = 0.
According to the implicit function theorem we only need to show the regularity of
the partial derivative w.r.t. c in order to establish the differentiability of the norm. We
have

c
F(x, c) =

tT

x(t)
c
2

t
_
x(t)
c
_
=
1
c

tT
x(t)
c

t
_
x(t)
c
_
.
Due to the subgradient inequality (see Inequality (3.3)) we have for arbitrary t R

t
(t)(0 t) (0) (t),
i.e.
(t)
t
(t)t. (1.11)
As

tT
(
x(t)
|x|
()
) = 1 we obtain

tT
x(t)
c

t
_
x(t)
c
_
1 for c = |x|
()
. (1.12)
Thus the regularity has been established and for the derivative we obtain using the
chain rule
0 =
d
dx
F(x, |x|
()
) =

x
F(x, c) +

c
F(x, c)|x|
()
,
hence

1
c
_

tT
x(t)
c

t
_
x(t)
c
__
|x|
()
+
t
_
x()
|x|
()
_

1
c
= 0
for c = |x|
()
and hence the rst assertion.
In order to show the second part of the theorem we choose x, y R
m
with
|x|
()
= |y|
()
= 1 and x ,= y. Then

tT
(x(t)) = 1 and

tT
(y(t)) = 1
and hence
1 =

tT
_
1
2
(x(t)) +
1
2
(y(t))
_
>

tT

_
x(t) +y(t)
2
_
,
because due to the strict convexity of the strict inequality must hold for at least one
of the terms in the sum. Thus |x +y|
()
< 2.
26 Chapter 1 Approximation in Orlicz Spaces
Approximation in the Luxemburg Norm
Let now be a twice continuously differentiable Young function. The set w.r.t. which
we approximate is chosen to be an n-dimensional subspace V of R
m
with a given
basis v
1
, . . . , v
n
(linear approximation).
The best approximation y

to be determined is represented as a linear combination

n
i=1
a
i
v
i
of the basis vectors, where a
1
, . . . , a
n
denotes the real coefcients to be
computed. The determination of the best approximation corresponds to a minimiza-
tion problem in R
n
for the unknown vector a = (a
1
, . . . , a
n
). The function to be
minimized in this context is then:
p : R
n
R
a p(a) :=
_
_
_
_
x(t)

tT
a
i
v
i
(t)
_
_
_
_
()
.
This problem we denote as a (linear) approximation w.r.t. the Luxemburg norm.
System of Equations for the Coefcients
For a given we choose for reasons of conciseness the following abbreviations:
z(a, t) :=
x(t)

n
l=1
a
l
v
l
(t)
|x

n
l=1
a
l
v
l
|
()
(a) :=

tT
z(a, t)
t
(z(a, t)).
The number (a) is because of Inequality (1.11) always greater or equal to 1. Setting
the gradient of the function p to zero leads to the following system of equations:
p(a)
i
=
1
(a)

tT
v
i
(t)
t
(z(a, t)) = 0, for i = 1, . . . , n.
For the solution of this system of non-linear equations the methods in Chapter 4 can
be applied. For the elements of the second derivative matrix (Hessian) of p one ob-
tains by differentiation of the gradient using the chain rule ((a) := ((a) |x

n
l=1
a
l
v
l
|
()
)
1
)

2
a
i
a
j
p(a)=(a)

tT
(v
i
(t)+p(a)
i
z(a, t))(v
j
(t)+p(a)
j
z(a, t))
tt
(z(a, t)).
Regularity of the Second Derivative
In order to obtain rapid convergence of numerical algorithms for computing the above
best approximation, the regularity of the second derivative (Hessian) at the solution is
Section 1.4 Discrete L

-approximations 27
required. The richness of the available Young functions will enable us to guarantee
this property. At the solution the gradient is equal to zero. Hence here the second
derivative has the following structure:

2
a
i
a
j
p(a) =
1
(a) |x

n
l=1
a
l
v
l
|
()

tT
v
i
(t)v
j
(t)
tt
(z(a, t)).
It turns out that it is sufcient to choose in such a way that
tt
has no zeros, in
order to guarantee positive deniteness of the second derivative of p everywhere, be-
cause the second derivative is a positive multiple of Grams matrix of the basis vectors
v
1
, . . . , v
n
w.r.t. the scalar product
u, w
z
:=

tT
u(t)w(t)
tt
(z(a, t)).
In particular the unique solvability of the linear systems, occurring during the
(damped) Newton method, is guaranteed.
Remark. This property is not provided for the l
p
norms with p > 2.
In the expression for the gradient as well as in the Hessian of the Luxemburg norm
the Luxemburg norm of the difference function occurs |x

n
l=1
a
l
v
l
|
()
. The com-
putation of this quantity can be performed as the one-dimensional computation of a
zero of the function g:
c g(c) :=

tT

_
x(t)

n
i=1
a
i
v
i
(t)
c
_
1.
Instead of decoupling the one-dimensional computation of the zero in every itera-
tion step for the determination of the coefcient vector a, we can by introducing the
additional unknown a
n+1
:= |x

n
l=1
a
l
v
l
|
()
, consider the following system of
equations for the unknown coefcient vector a := (a
1
, . . . , a
n
, a
n+1
) R
n+1
:
F
i
(a) :=

tT
v
i
(t)
t
_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
= 0, for i = 1, . . . , n (1.13)
F
n+1
(a) :=

tT

_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
1 = 0. (1.14)
28 Chapter 1 Approximation in Orlicz Spaces
For the Jacobian we obtain
F
i,j
(a) =

tT
v
i
(t)v
j
(t)
tt
_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
for i = 1, . . . , n and j = 1, . . . , n
F
n+1,j
(a) =
1
a
n+1

tT
v
j
(t)
t
_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
=
1
a
n+1
F
j
(a)
for j = 1, . . . , n
F
n+1,n+1
(a) =
1
a
n+1

tT
x(t)

n
j=1
a
j
v
j
(t)
a
n+1

t
_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
.
One obtains the regularity of the corresponding Jacobian at the solution for with

tt
> 0 in the following way: if we omit in the Jacobian the last row and the last
column, the resulting matrix is apparently non-singular. The rst n elements of the
last row are equal to zero, because F
i
(a) = 0 for i = 1, . . . , n at the solution, and the
(n + 1)-st element is different from 0, because we have t
t
(t) (t) (see above),
hence:

tT
x(t)

n
j=1
a
j
v
j
(t)
a
n+1

t
_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_

tT

_
x(t)

n
j=1
a
j
v
j
(t)
a
n+1
_
= 1.
If the determinant of the total Jacobi matrix is expanded using the Laplace expansion
w.r.t. the last row, then apparently the determinant is non-zero.
1.5 Determination of the Linear L

-approximation
In the sequel we will generalize our point of view to arbitrary nite measures.
1.5.1 System of Equations for the Coefcients
Let be a differentiable Young function and let V be an n-dimensional subspace of
M

with a given basis v


1
, . . . , v
n
. From the characterization in Corollary 1.2.3 we
obtain the following system of equations:
_
T
v
i

t
_
x
n

j=1
a
j
v
j
_
d = 0 for i = 1, . . . , n (1.15)
Section 1.5 Determination of the Linear L

-approximation 29
for the modular approximation and
_
T
v
i

t
_
x

n
j=1
a
j
v
j
|x

n
j=1
a
j
v
j
|
()
_
d = 0 for i = 1, . . . , n
for the approximation in the Luxemburg norm. If we introduce as in the case of the
discrete approximation a
n+1
:= |x

n
j=1
a
j
v
j
|
()
as a further variable, we obtain
the additional equation
_
T

_
x

n
j=1
a
j
v
j
a
n+1
_
d 1 = 0.
The statements about the regularity of the Jacobian for twice continuously differen-
tiable Young functions carry over. Thus the methods in Chapter 4 are applicable (if
we are able to compute the integrals). Right now, however, we want to describe a
different iterative method, which already makes use of certain stability principles.
1.5.2 The Method of Karlovitz
Karlovitz presented in [47] a method for the computation of best linear L
p
-approxima-
tions by iterated computation of weighted L
2
-approximations. This algorithm carries
over to the determination of minimal solutions for modulars and Luxemburg norms,
in the discrete as well as in the continuous case:
Let be a strictly convex and differentiable Young function, whose second deriva-
tive exists at 0 and is positive there. By F : R R with F(s) :=

(s)
s
we then dene
a continuous positive function.
Let T, , be a nite measure space and V a nite-dimensional subspace of L

()
and w L

().
For the computation of a best approximation of w w.r.t. V in the sense of the mod-
ular f

and the Luxemburg norm we attempt a unied treatment. By z(v) we denote


in the case of minimization of the modular the vector w v, in the other case the
normalized vector
wv
|wv|
()
.
Theorem 1.5.1. Let v
1
V , then we determine v
k+1
for given v
k
in the following
way: let g
k
:= F(z(v
k
)) and let the scalar product with the weight function g
k
for
x, y C(T)) be given by
(x, y)
g
k
:=
_
T
g
k
xyd
then we dene u
k
as the best approximation of w w.r.t. V in the corresponding scalar
product norm: u
k
:= M(|w |
g
k
, V ). If we have determined the parameter
k
in
such a way that for the modular
f

(w v
k
+
k
(v
k
u
k
)) f

(w v
k
+(v
k
u
k
))
30 Chapter 1 Approximation in Orlicz Spaces
or for the Luxemburg norm
|w v
k
+
k
(v
k
u
k
)|
()
|w v
k
+(v
k
u
k
)|
()
for all R, then we dene v
k+1
:= v
k

k
(v
k
u
k
).
We nally obtain
(a) lim
k
v
k
= lim
k
u
k
= v

, where v

is the uniquely determined minimal


solution of f

(w ) or |w |
()
respectively.
(b) if v
k
,= v

, then f

(wv
k+1
) < f

(wv
k
) resp. |wv
k+1
|
()
< |wv
k
|
()
.
For the proof we need the following lemma, where for f

and | |
()
we choose
the common notation h:
Lemma 1.5.2. Let v
0
V and let u
0
be best approximation of w w.r.t. V in the norm
| |
g
0
. Then the following statements are equivalent:
(a) v
0
,= u
0
(b) there is a ,= 0 with h(w v
0
+(v
0
u
0
)) < h(w v
0
)
(c) v
0
,= v

.
Proof. Due to the requirements for the bilinear form (, )
g
0
is a scalar product and
hence the best approximation of w w.r.t. V is uniquely determined.
(a) (b): Let
0
be chosen such that for all R:
h(w v
0
+
0
(v
0
u
0
)) h(w v
0
+(v
0
u
0
))
and let h(w v
0
+
0
(v
0
u
0
)) = h(w v
0
). Because of the strict convexity of
the function h is strictly convex on its level sets and hence
0
= 0. The chain rule
then yields:
h
t
(w v
0
), v
0
u
0
= 0,
and hence
_
T

t
(z(v
0
))(v
0
u
0
)d = 0. (1.16)
Furthermore because of (w u
0
, v)
g
0
= 0 for all v V :
|w u
0
|
2
g
0
= (w u
0
, w v
0
)
g
0
=
0
_
T
F(z(v
0
))(w v
0
)(w u
0
)d
=
0
_
T

t
(z(v
0
))(w u
0
)d
with
0
= 1 for h = f

and
0
= |w v
0
|
()
for h = | |
()
. Together with (1.16)
we then obtain
|w u
0
|
2
g
0
=
0
_
T

t
(z(v
0
))(w v
0
)d = |w v
0
|
2
g
0
,
Section 1.5 Determination of the Linear L

-approximation 31
and thus, because of the uniqueness of the best approximation v
0
= u
0
.
(b) (a) is obvious.
(c) (a): We have v
0
= u
0
if and only if for all v V :
0 = (w v
0
, v)
g
0
=
0
_
T

t
(z(v
0
))vd,
in other words h
t
(w v
0
), v = 0 if and only if v
0
= v

(see Corollary 1.2.3).


Proof of Theorem 1.5.1. The sequence (v
k
) is bounded, because by construction w
v
k
S
h
(h(w v
1
)). Suppose now the sequence (u
k
) is not bounded, then there is
subsequence (v
n
) and (u
n
) and v V such that |u
n
|

n and |v
n
v|

0. The
continuity of F yields the uniform convergence of g
n
= F(z(v
n
)) to g = F(z( v)).
Thus the sequence | |
g
n
converges pointwise to | |
g
on C(T). According to the
stability principle for convex functions (see Remark 5.3.16) also the sequence of the
best approximations (u
n
) converges, a contradiction.
Let now v be a point of accumulation of the sequence (v
k
). Then there are conver-
gent subsequences (v
k
i
) and (u
k
i
) with v
k
i
v and u
k
i
u. As above the stability
principle yields: u is best approximation of w w.r.t. V in the norm | |
g
. Suppose
v ,= u, then according to the lemma above there is ,= 0 with
h(w v + ( v u)) < h(w v).
Because of the continuity of h there is a > 0 and a K N such that
h(w v
k
i
+ (v
k
i
u
k
i
)) < h(w v)
for k
i
> K. Furthermore by construction
h(w v
k
i
+1
) h(w v
k
i
+ (v
k
i
u
k
i
)).
On the other hand we obtain because of the monotonicity of the sequence (h(wv
k
))
the inequality h(w v) h(w v
k
i
+1
), a contradiction. Using the previous lemma
we obtain v = v

.
Remark 1.5.3. If the elements of V have zeros of at most measure zero, we can drop
the requirement
tt
(0) > 0: let z w + V , then z ,= 0, i.e. there is U with
(U) > 0 and z(t) ,= 0 for all t U. Hence F(z(t)) > 0 on U. Let v V arbitrary
and N the set of zeros of v, then we obtain
_
T
v
2
F(z)d
_
U\N
v
2
F(z)d > 0.
32 Chapter 1 Approximation in Orlicz Spaces
Corollary 1.5.4. Let b
1
, . . . , b
n
be a basis of V , let A : R
n
V be dened by
Ax =

n
i=1
x
i
b
i
and f : R
n
R by f(x) = h(w Ax). Let x
k
:= A
1
v
k
and y
k
:= A
1
u
k
. Then iterations dened in the previous theorem carry over to the
coefcient vectors by the following recursion:
x
k+1
= x
k

k

k
(S(x
k
))
1
f
t
(x
k
),
where S(x
k
) denotes Grams matrix for the scalar products (, )
g
k
w.r.t. the basis
b
1
, . . . , b
n
,
k
=
k
= 1, if h = f

, and
k
= |w Ax
k
|
()
and
k
=
_
T
z(x
k
)
t
(z(x
k
))d 1 with z(x
k
) :=
wAx
k

k
, if h = | |
()
.
In particular
k
> 0 for f
t
(x
k
) ,= 0.
Proof. As
t
(s)(0 s) (s) on R we have in the case of the Luxemburg norm

k
=
_
T
z(x
k
)
t
(z(x
k
))d
_
T
(z(x
k
))d = 1.
According to the denition of u
k
we have: S(x
k
)y
k
= r
k
, where
r
kj
:= (w, b
j
)
g
k
= (w Ax
k
, b
j
)
g
k
+ (Ax
k
, b
j
)
g
k
=
k
_
T
b
j

t
(z(x
k
))d +
n

i=1
x
ki
(b
i
, b
j
)
g
k
.
We obtain
S(x
k
)y
k
=
k

k
f
t
(x
k
) +S(x
k
)x
k
,
i.e.
x
k
y
k
=
k

k
(S(x
k
))
1
f
t
(x
k
),
and due to
x
k+1
= x
k

k
(x
k
y
k
)
the rst part of the assertion follows.
Together with S(x
k
) also (S(x
k
))
1
is positive denite, and hence
lim
0
f(x
k
(S(x
k
))
1
f
t
(x
k
)) f(x
k
)

= f
t
(x
k
), (S(x
k
))
1
f
t
(x
k
) < 0,
which means (S(x
k
))
1
f
t
(x
k
) is a direction of descent, i.e.
0 > f(x
k+1
) f(x
k
) f
t
(x
k
), x
k+1
x
k
=
k
f
t
(x
k
), x
k
y
k

=
k

k
f
t
(x
k
), (S(x
k
))
1
f
t
(x
k
),
and hence
k
> 0, if f
t
(x
k
) ,= 0.
Section 1.5 Determination of the Linear L

-approximation 33
In terms of the coefcient vectors the Karlovitz method can be interpreted as a
modied gradient method.
Remark 1.5.5. If
t
is Lipschitz continuous, one can instead of the exact determina-
tion of the step parameter
k
use a nite search method like the Armijo Rule (see (4.3)
in Chapter 4 and [86]).
Remark 1.5.6. The functions
p
= [s[
p
are exactly those Young functions, for which

tt
and F differ only by a constant factor. For p 2 also
tt
p
(0) is nite. For modulars
f

p
the method of Karlovitz corresponds to a damped Newton method.
Chapter 2
Polya Algorithms in Orlicz Spaces
2.1 The Classical Polya Algorithm
Let T be a compact subset of R
r
and let x C(T). The following theorem was
shown by Polya:
Theorem 2.1.1 (Polya Algorithm). Let V be a nite-dimensional subspace of C(T).
Then every point of accumulation of the sequence of best L
p
-approximations of x w.r.t.
V for p is a best Chebyshev approximation of x w.r.t. V .
In the year 1920 Polya raised the question, whether the above sequence of best ap-
proximations actually converges. A negative answer was given in 1963 by Descloux
for the continuous case, while he was able to show convergence to the strict approxi-
mation for the discrete case (see [25], compare Theorem 2.3.1).
Instead of considering the case p , one can study the behavior of the sequence
of best L
p
-approximations for p 1. We can show that (see Equation (5.7) in Chap-
ter 5) for an arbitrary measure space (T, , ) the sequence of best approximations
converges to the best L
1
()-approximation of largest entropy.
2.2 Generalized Polya Algorithm
We will now treat the subject considered by Polya in the framework of Orlicz spaces.
In Chapter 1 we have become acquainted with a numerical method which can be
used to perform Polya algorithms for the computation of a best approximation in the
mean or a best Chebyshev approximation. We will discuss more effective methods in
Chapter 4.
An important aim in this context is, to replace the determination of a best approx-
imation for limit problems that are difcult to treat (non-differentiable, non-unique)
in an approximate sense by approximation problems that are numerically easier to
solve (Polya algorithm). The required differentiability properties of the corresponding
function sequence to be minimized can already be provided by properties of their one-
dimensional Young functions. The broad variety available in the choice of the Young
functions is also used to inuence the numerical properties in favorable way. The sub-
sequent discussion will show that the convergence of the sequence of approximative
problems to a specic best approximation can even in the case of non-uniqueness of
the limit problem frequently be achieved through a proper choice of the sequence of
Section 2.3 Polya Algorithm for the Discrete Chebyshev Approximation 35
Young functions or by an outer regularization by use of convergence estimates devel-
oped below. The characterization of the limits is treated in the framework of two-stage
optimization. The required pointwise convergence of the functions to be optimized is
guaranteed through the pointwise convergence of the corresponding Young functions.
The best approximation in the mean in this sense was already discussed in the intro-
duction. A related problem is the Chebyshev approximation, to which a considerable
part of these principles carries over. In the latter context however the modulars have
to be replaced by the corresponding Minkowski functionals.
The following stability principle will play a central role in our subsequent discus-
sion:
Theorem 2.2.1. Let be a norm on the nite-dimensional vector space X and let
(
k
)
kN
be a sequence of semi-norms on X that converges pointwise on X to . Let
further V be a subspace of X, x X V and v
k
a
k
-best approximation of x w.r.t.
V . The following statements hold:
(a) every subsequence of (v
k
) has a -convergent subsequence
(b) lim(x v
k
) = d

(x, V )
(c) every -point of accumulation of (v
k
) is a -best approximation of x w.r.t. V
(d) if x has a unique -best approximation v

w.r.t. V , then
lim
k
(v

v
k
) = 0.
This theorem of Kripke, proved in [73], is a special case of the stability theorem of
convex optimization (see Theorem 5.3.21 and Theorem 5.3.25).
2.3 Polya Algorithm for the Discrete Chebyshev
Approximation
As was seen above, the maximum norm can be described as a Minkowski functional
using the Young function

restated below:

: R R dened by

(t) :=
_
0 for [t[ 1
otherwise.
Then we obtain
|x|

= inf
_
c > 0

tT

_
x(t)
c
_
1
_
.
It is not too far-fetched, to proceed in a similar way as in the introduction of Chap-
ter 1 and to consider sequences (
k
)
kN
of Young functions with favorable numerical
36 Chapter 2 Polya Algorithms in Orlicz Spaces
properties (differentiability, strict convexity) that converge pointwise to

. For the
corresponding Luxemburg norms we obtain
|x|
(
k
)
:= inf
_
c > 0

tT

k
_
x(t)
c
_
1
_
,
and it turns out that pointwise convergence of the Young functions carries over to
pointwise convergence of the Luxemburg norms. We obtain (see below)
lim
k
|x|
(
k
)
= |x|

.
Thus the preconditions for the application of the stability theorem (see Theorem 2.2.1)
are satised, i.e. the sequence (v
k
)
kN
of best approximations of x w.r.t. V is bounded
and every point of accumulation of this sequence is a best Chebyshev approximation.
Theorem 2.3.1. Let (
k
)
kN
be a sequence of Young functions that converges point-
wise to the Young function

(t) :=
_
0 for [t[ 1
otherwise.
Then this carries over to the pointwise convergence of the Luxemburg norms, i.e.
lim
k
|x|
(
k
)
= |x|

.
Proof. Let x ,= 0 and let > 0 be given (
1
2
|x|

). We have:

tT

k
_
x(t)
|x|

+
_

tT

k
_
|x|

|x|

+
_
= [T[
k
_
|x|

|x|

+
_
k
0
because of the pointwise convergence of the sequence of Young functions (
k
)
kN
to

. Thus
|x|
(
k
)
|x|

+ for k sufciently large.


On the other hand there is t
0
T with [x(t
0
)[ = |x|

. We obtain

tT

k
_
x(t)
|x|


_

k
_
x(t
0
)
|x|


_
=
k
_
|x|

|x|


_
k
.
Hence
|x|

|x|
(
k
)
for k sufciently large.
Section 2.3 Polya Algorithm for the Discrete Chebyshev Approximation 37
As a special case we obtain the discrete version of the classical Polya algorithm,
which has the property that every point of accumulation of the best l
p
-approximations
(p > 1) for p to innity is a best Chebyshev approximation. For this purpose we
dene
p
(t) = [t[
p
. We have

tT
[
x(t)
c
[
p
= 1 if and only if (

tT
[x(t)[
p
)
1
p
= c.
Hence the Luxemburg norm yields the well-known l
p
-norm
|x|
(
p
)
= |x|
p
=
_

tT
[x(t)[
p
_1
p
.
For continuous Young functions , we can use the equation

tT

_
x(t)
|x|
()
_
= 1 (2.1)
for the computation of the unknown Luxemburg norm |x|
()
.
2.3.1 The Strict Approximation as the Limit of Polya Algorithms
Polyas question, whether even genuine convergence can be shown, was given a pos-
itive answer by Descloux in the year 1963 (for the classical Polya algorithm), by
proving convergence to the strict approximation.
Theorem 2.3.2. Let T = t
1
, t
2
, . . . , t
m
and (
k
)
kN
be a sequence of continuous
Young functions with the following properties:
(1) lim
k
(s) =
_
0 for [s[ < 1,
for [s[ > 1,
(2)
k
(r s)
k
(r)
k
(s) for s, r 0.
Let further L

k
() be the Orlicz spaces for the measure with (t
i
) = 1, i =
1, 2, . . . , m, and V a subspace of C(T), x C(T), and let v
k
be best L

k
() approx-
imations of x in the corresponding Luxemburg norms and v

the strict approximation


(see Denition 1.4.1) of x w.r.t. V . Then
lim
k
v
k
= v

.
Proof. We can assume x C(T) V , because otherwise x = v
k
= v

.
If is continuous and |x|
()
> 0, then, as already mentioned above
_
T

_
x
|x|
()
_
d = 1. (2.2)
Due to the Stability Theorem 2.2.1 every point of accumulation of the sequence
(v
k
) is a best Chebyshev approximation. Let (v
k

) be a convergent subsequence and


let v
0
be the limit of (v
k

).
38 Chapter 2 Polya Algorithms in Orlicz Spaces
Since all norms on a nite dimensional linear space are equivalent, v
0
is also the
limit of (v
k

) in the maximum norm.


Now we will show that
v
0
= v

.
The assertion is proved via complete induction over j, where we assume
v
0
(t) = v

(t) for t T
j
, (2.3)
(see denition of the Strict Approximation 1.4.1). The start of the induction for j = 0
follows directly from the fact that v
0
is a best L

-approximation of x.
Suppose (2.3) holds for j. If (2.3) holds for D
j+1
, then also for T
j+1
(see denition
of the Strict Approximation and Lemma 1.4.4).
If v
0
(

t) ,= v

t) for a

t D
j+1
, then apparently v
0
/ V
j+1
. Thus there is a

t T T
j
with
[x(

t) v
0
(

t)[ > |x v

|
T\T
j
.
Let
z
k

= x v
k

, z
0
= x v
0
and z

= x v

,
then we obtain
[z
0
(

t)[ = s
0
> d
0
= |z

|
T\T
j
.
Let :=
s
0
d
0
3
, then there is a k
0

such that for all k

> k
0

|v
k

v
0
|

< , (2.4)
and by (1) a k
1

k
0

, such that for all k

> k
1

_
s
0

d
0
+
_
> m. (2.5)
From (2) it follows that

_
s
0

d
0
+
_
=
k

_
_
s
0

|z
k

|
(
k

)
d
0
+
|z
k

|
(
k

)
_
_

k

_
_
_
z
k

t)
|z
k

|
(
k

)
d
0
+
|z
k

|
(
k

)
_
_
_

(
z
k

t)
|z
k

|
(
k

)
)

(
d
0
+
|z
k

|
(
k

)
)
.
This leads to

_
z
k

t)
|z
k

|
(
k

)
_
> m
k

_
d
0
+
|z
k

|
(
k

)
_

tT\T
j

_
z

(t) +
|z
k

|
(
k

)
_

tT\T
j

_
z

(t) +z
k

(t) z
0
(t)
|z
k

|
(
k

)
_
.
Section 2.3 Polya Algorithm for the Discrete Chebyshev Approximation 39
By the induction hypothesis z

(t) = z
0
(t) for all t T
j
and hence
1 =

tT

_
z
k

(t)
|z
k

|
(
k

)
_
>

tT

_
z

(t) +z
k

(t) z
0
(t)
|z
k

|
(
k

)
_
.
Due to (2.2) we have
|v
k

v
0
+v x|
(
k

)
< |v
k

x|
(
k

)
,
a contradiction to the assumption that v
k

is a best L

-approximation of x. Thus we
have shown that every convergent subsequence (v
k

) of the bounded sequence v


k

converges to v

so that the total sequence (v


k
) converges to v

.
2.3.2 About the Choice of the Young Functions
Under numerical aspects, the choice of the sequence of Young functions is of central
importance. This is true for the complexity of function evaluation as well as for the
numerical stability.
The subsequent remark shows that for pointwise convergence of the Luxemburg
norms the pointwise convergence of the Young functions is sufcient in a relaxed
sense. The exibility in the choice of Young functions is enlarged thereby, and plays
a considerable role for the actual computation.
Remark. For the pointwise convergence of the Luxemburg norms it is sufcient to
require
(a)
k
(t)
k
for t > 1
(b)
k
(1)
1
[T[
.
Proof. We have

tT

k
_

1
k
_
1
[T[
_
x(t)
|x|

tT

k
_

1
k
_
1
[T[
_
1
_
= 1,
and hence
|x|
(
k
)

|x|

1
k
(
1
[T[
)
,
but from (b) we obtain

1
k
_
1
[T[
_
1,
whence
|x|
(
k
)
|x|

.
The other direction of the inequality is obtained as in the last part of the proof of
Theorem 2.3.1.
40 Chapter 2 Polya Algorithms in Orlicz Spaces
So we can leave the values
k
(t) for t [0, 1] unchanged when constructing the
Young functions k N.
Example 2.3.3.

k
(t) :=
_
1
[T[
t
2
for [t[ 1
k
3
[t[
3
+ (
1
[T[
3k
3
)t
2
+ 3k
3
[t[ k
3
for [t[ > 1.
This example will have prototype character throughout this chapter for the proper-
ties of Young functions and corresponding Luxemburg norms.
2.3.3 Numerical Execution of the Polya Algorithm
In the previous chapter we have treated the properties of Young functions which guar-
antee uniqueness, twice differentiability of the norm, and the regularity of the Hessian.
If we choose a sequence of Young functions that converges pointwise in the relaxed
sense of the previous section, then the stability theorems guarantee that the sequence
of the approximative solutions is bounded and every point of accumulation of this
sequence is a best Chebyshev approximation. Thus we can sketch the numerical exe-
cution of the Polya algorithm in the following way:
Algorithmic Scheme of the Polya Algorithm
(a) Choose a sequence (
k
)
kN
in the above sense.
(b) Choose a starting point a
0
for the coefcient vector and set k = 1.
(c) Solve the non-linear system

tT
v
i
(t)
t
k
_
x

n
l=1
a
l
v
l
|x

n
l=1
a
l
v
l
|
(
k
)
_
= 0 for i = 1, . . . , n,
using an iterative method from Chapter 1 (or 4), where the computation of the
norm as a one-dimensional solution for the quantity c of the equation

tT

k
_
x

n
l=1
a
l
v
l
c
_
1 = 0
has to be performed in each iteration step.
(d) Take the solution vector computed in this way as a new starting point, set
k k + 1 and go to (c).
Remarks. (a) The sequence of Young functions occurring in the above scheme can
as a rule be understood as a subsequence of a parametric family of functions. The
Section 2.4 Stability of Polya Algorithms in Orlicz Spaces 41
change of the parameter then should be done in a way that the current minimal solution
is a good starting point for the solution method w.r.t. the subsequent parameter. This is
of particular importance, since we want to employ rapidly convergent (Q-superlinear)
methods (see Section 4.1).
In the above parametric examples of sequences of Young functions (see e.g. Ex-
ample 2.3.3) experience has shown that as a parameter the sequence (c
j
)
jN
with
2 c 8 can be recommended.
(b) Step (c) of the above algorithmic scheme can also be replaced by the solution of
the non-linear system (1.13). As numerical methods those for the solution of non-
linear equations have to be employed (see Chapter 4).
(c) The question referring to genuine convergence of the approximate solutions to a
best Chebyshev approximation and the description of the limit will be treated in detail
below.
(d) The question referring to an appropriate stop criterion for the iteration we will
discuss later. As a rst choice one can use the change between the k-th and k + 1-st
solution or use the characterisation theorems for a best Chebyshev approximation (see
Theorem 1.3.3).
(e) An estimate for the quality of the best approximations w.r.t. | |
(
k
)
as approxi-
mations for a best Chebyshev approximation is obtained by the convergence estimates
in Section 2.5.2.
We will now state the Polya algorithm in the framework of general measures.
2.4 Stability of Polya Algorithms in Orlicz Spaces
Let (T, , ) be a nite measure space. We will now proceed in a similar way as in the
previous section and consider sequences (
k
)
kN
of Young functions with amiable
numerical properties (differentiability, strict convexity), which converges pointwise
to

. If we consider the corresponding Luxemburg norms |x|


(
k
)
, it can be shown
that also in this situation pointwise convergence of the Young functions carries over
to pointwise convergence of the Luxemburg norms, i.e.
lim
k
|x|
(
k
)
= |x|

.
Thus the requirements for the corresponding stability theorems (see Theorem 2.2.1
or Theorem 5.3.25) are satised, i.e. the sequence (v
k
)
kN
of best approximations is
bounded and every point of accumulation of this sequence is a best L

-approxima-
tion, also denoted as Chebyshev approximation. A corresponding statement can be
made for best L
1
-approximations.
For the following stability considerations we need the fact that (L

, | |

) is a
Banach space (see Theorem 6.2.4).
42 Chapter 2 Polya Algorithms in Orlicz Spaces
In order to describe the behavior of best approximations related to sequences of
modulars (f

k
) or of sequences of Luxemburg norms (| |
(
k
)
) in a unied way,
the following stability principle that constitutes a broadening of the scope of Theo-
rem 2.2.1 is of central signicance (see Theorem 5.3.18):
Theorem 2.4.1. Let X be a Banach space, K a closed convex subset of X and (f
n
:
X R)
nN
a sequence of convex continuous functions that converges pointwise to
a function f : X R. Let x
n
be a minimal solution of f
n
on K. Then every point of
accumulation of the sequence (x
n
)
nN
is a minimal solution of f on K.
In the nite-dimensional case one has the following situation: if the set of solutions
of the limit problem M(f, K) is bounded, then the set of points of accumulation
is non-empty and the sequence of the minimal values (inf(f
n
, K)) converges to the
minimal value inf(f, K) of the limit problem (see Theorem 5.3.25).
In order to obtain stability of our generalized Polya algorithms, it sufces to es-
tablish pointwise convergence of modulars and Luxemburg norms (for an appropriate
choice Young functions):

k
| |
1

| |
(
k
)
| |
1

| |
(
k
)
| |

on L

or C(T). The theorem of BanachSteinhaus 5.3.17 for convex functions will


permit to reduce the proof of pointwise convergence of the functionals to the pointwise
convergence of the (1-dimensional) Young functions (
k
), using the fact that the step
functions form a dense subset. As a preparation we need the following
Lemma 2.4.2. Let be a nite Young function, then corresponding modular f

and
the corresponding Luxemburg norm | |
()
are continuous on (L

, | |

).
Proof. Let |x
k
x|


k
0, then let > 0 and > 0 be chosen such that
<
1
_
1
(T)
_
,
then for k sufciently large
_
T

_
x x
k

_
d
_
T

_
d = (T)
_

_
< 1,
hence |x
k
x|
()
.
In particular there is a number M such that the modulus of the values of x and of the
sequence (x
k
) is contained a.e. in the interval [0, M]. Since is uniformly continuous
on [0, M], there is for given > 0 a > 0 with |x
k
x|

< for k sufciently


large and [(x) (x
k
)[ < a. e. . It follows that [f

(x) f

(x
k
)[ < (T).
Section 2.4 Stability of Polya Algorithms in Orlicz Spaces 43
Theorem 2.4.3. Let (T, , ) be a nite measure space and (
k
)
kN
a sequence of
nite Young functions, which converges pointwise to the Young function

.
Then this carries over to the pointwise convergence of the Luxemburg norms, i.e.
for x L

() we have
lim
k
|x|
(
k
)
= |x|

.
Proof. L

() is contained in L

k
() for all k N, since T has nite measure
and
k
is a nite Young function. The assertion is veried using the theorem of
BanachSteinhaus for convex functions (see Theorem 5.3.17 in Chapter 5). Accord-
ing to the previous lemma | |
(
k
)
is continuous on L

() for all k N. Further-


more the step functions are dense in the Banach space (L

(), | |

) (see Theo-
rem 6.2.19). We show at rst the convergence of the norms for a given step function
x :=

m
n=1

T
n
, where T
i
T
j
= for i ,= j. Let 0 < < |x|

be arbitrary,
then we obtain
_
T

k
_
x
|x|

+
_
d =
m

n=1

k
_

n
|x|

+
_
(T
n
)
k
0,
i.e. |x|
(
k
)
|x|

+ for k sufciently large. On the other hand there is a n


0
with
|x|

= [
n
0
[ and hence
m

n=1

k
_

n
|x|


_
(T
n
)
k
,
i.e. |x|
(
k
)
|x|

for k sufciently large.


It is just as simple to verify the pointwise boundedness of the sequence of the
norms: let x L

(), i.e. there is a M > 0 such that [x(t)[

M a. e. .
Thus we obtain
_
T

k
_
x(t)
M + 1
_
d
_
T

k
_
M
M + 1
_
d
=
k
_
M
M + 1
_
(T)
k
0,
i.e. there is a k
0
N, such that for all k > k
0
|x|
(
k
)
M + 1,
thus the pointwise boundedness is established.
This theorem is the basis for the Polya algorithm in C(T), if C(T) for an appropri-
ately chosen measure space (T, , ) is considered as a closed subspace of L

().
44 Chapter 2 Polya Algorithms in Orlicz Spaces
A corresponding statement can be made for approximations of the L
1
-norm:
Theorem 2.4.4. Let (T, , ) be a nite measure space and (
k
)
kN
a sequence
of nite Young functions, which converges pointwise to the Young function
1
with

1
(s) = [s[. Then this carries over to the pointwise convergence of the modulars and
the Luxemburg norms, i.e. for x L

() we have
lim
k
|x|
(
k
)
= |x|
1
lim
k
f

k
(x) = |x|
1
.
Proof. L

() is contained in L

k
() for all k N, since T has nite measure and

k
is a nite Young function. The proof is again performed by use of the theorem of
BanachSteinhaus for convex functions. By the above lemma, f

k
and | |
(
k
)
are
continuous on L

() for all k N. As stated above, the step functions are dense


in L

(). First we will show the convergence of the norms for a given step function
x :=

m
n=1

T
n
. Let 0 < < |x|
1
be arbitrary, then we have
_
T

k
_
x
|x|
1
+
_
d =
m

n=1

k
_

n
|x|
1
+
_
(T
n
)
k

1
|x|
1
+
m

n=1
[
n
[(T
n
)
=
|x|
1
|x|
1
+
< 1,
i.e. |x|
(
k
)
|x|
1
+ for k sufciently large. On the other hand we obtain for
< |x|
1
m

n=1

k
_

n
|x|
1

_
(T
n
)
k

1
|x|
1

m

n=1
[
n
[(T
n
) =
|x|
1
|x|
1

> 1,
i.e. |x|
(
k
)
|x|
1
for k sufciently large.
For modulars we immediately obtain
f

k
(x) =
m

n=1

k
(
n
)(T
n
)
k

n=1
[
n
[(T
n
) = |x|
1
.
It is just as simple to verify the pointwise boundedness of the sequence of norms:
let x L

(), i.e. there is a M > 0 such that [x(t)[ M -a. e. . Hence we obtain
_
T

k
_
x(t)
(M + 1)(T)
_
d
_
T

k
_
M
(M + 1)(T)
_
d
=
k
_
M
(M + 1)(T)
_
(T)
k

M
M + 1
< 1,
Section 2.5 Convergence Estimates and Robustness 45
i.e. there is a k
0
N, such that for all k > k
0
|x|
(
k
)
(M + 1)(T).
For the modulars we have
[f

k
(x)[ [f

k
(M)[,
and the pointwise boundedness of the modulars follows from the pointwise conver-
gence of the Young functions
2.5 Convergence Estimates and Robustness
In a numerical realization of the Polya Algorithm the solutions of the approximating
problems are not solved exactly. As a criterion to terminate the iteration one can use
the norm of the gradient.
It turns out that stability is essentially retained if the approximating problems are
solved only approximately by some x
k
provided the sequence (|f
k
( x
k
)|) goes to
zero (robustness).
Theorem 2.5.1 (see Theorem 5.5.1). Let (f
k
: R
r
R)
kN
be a sequence of dif-
ferentiable convex functions that converges pointwise to a function f : R
n
R. Let
further the set of minimal solutions of f on R
n
be non-empty and bounded and let
( x
k
)
kN
be a sequence in R
n
with the property lim
k
|f
k
( x
k
)| = 0 then we
have
(a) The set of limit points of the sequence ( x
k
) is non-empty and contained in
M(f, R
n
).
(b) f
k
( x
k
) inf f(R
n
).
(c) f( x
k
) inf f(R
n
).
(d) Let Q be an open bounded superset of M(f, R
n
) and
k
:= sup
xQ
[f(x)
f
k
(x)[, then
[inf f(R
n
) f( x
k
)[ = O(
k
) +O(|f
k
( x
k
)|).
The above theorem makes use of the following theorem on convergence estimates
(which in turn makes use of the stability theorem):
Theorem 2.5.2 (see Theorem 5.5.2). Let (f
k
: R
n
R)
kN
be a sequence of convex
functions that converges pointwise to a function f : R
n
R with bounded level sets.
Let further K be a closed convex subset of R
n
and x
k
M(f
k
, K) and let Q be
an open bounded superset of M(f, K) where
k
:= sup
xQ
[f(x) f
k
(x)[. Then

k
0 and we obtain
46 Chapter 2 Polya Algorithms in Orlicz Spaces
(a) f(x
k
) inf f(K) = O(
k
)
(b) [f
k
(x
k
) inf f(K)[ = O(
k
).
It is now our aim to give estimates of the above kind for the subspace of Hlder
Lipschitz continuous functions on a compact subset of R
r
satisfying a cone condition,
and where is the Lebesgue measure of R
r
(see Section 2.5.2).
Denition 2.5.3. A subset T of R
r
satises a cone condition if there is a cone C with
interior point and a number > 0 such that for all t T there exists an orthogonal
transformation A
t
with the property
t + (B A
t
C) T,
where B denotes the unit ball in R
r
w.r.t. the Euclidean norm.
Denition 2.5.4. Let T be a compact subset of R
r
and let 0 < 1. We dene
Lip

(T) as the space of all real-valued functions x on T with the property: there is
L R with [x(t) x(t
t
)[ L|t t
t
|

for all t, t
t
T, where | | is the Euclidean
norm of R
r
.
Theorem 2.5.5. Let T be a compact subset of R
r
that satises a cone condition,
the Lebesgue measure, and (
k
)
kN
a sequence of nite Youngs functions with
lim
k

k
(s) =

(s) for [s[ ,= 1. Let K be a nite-dimensional subset of


Lip

(T), then there is a sequence (


k
)
kN
converging to zero with
k
:= max(

k
,
k
),
where
k
is the solution of the equation
s
r

k
(1 +s

) = 1,
and
k
:=
1

1
k
(
1
(T)
)
1. There is also a constant > 0 such that for all x K
[|x|

|x|
(
k
)
[ |x|

k
.
Remark. If
k
(1)
1
(T)
then |x|
(
k
)
|x|

.
Examples. (a)
k
(s) := as
2
+ k(k([s[ 1)
+
)
m
with a > 0 and n N. For every
c (0, 1) there is k
0
N such that for k k
0
c
k
q
<

k
<
1
k
q
with q :=
m+ 1
m+
r

.
(b)
k
(s) := [s[
k
. For every c (0, 1) there is k
0
N such that for k k
0
c
r

log k
k

k

r

log(k 1)
k 1
.
Section 2.5 Convergence Estimates and Robustness 47
In the discrete case the situation is somewhat simpler than in the continuous case.
Theorem 2.5.6. Let T := t
1
, . . . , t
N
and : T (0, ) a weight function. Let
(
k
)
kN
be a sequence of nite Youngs functions with lim
k

k
(s) =

(s) for
[s[ ,= 1 and x : T R.
Then we obtain the estimate [|x|
(
k
)
|x|

[
k
|x|

, where with :=
min
tT
(t) we dene
k
:= max
1

1
k
(
1
(T)
)
1, 1
1

1
k
(
1

)
.
For approximations of the L
1
-norm we obtain (see Theorem 2.5.2):
Theorem 2.5.7. Let (T, , ) be a nite measure space, (
k
)
kN
a sequence of nite,
denite Young functions with lim
k

k
(s) = [s[ on R.
Let further K be a nite-dimensional subset of L

() and let be chosen such that


| |

| |
1
on the span of K, then there is a sequence (
k
)
kN
tending to zero
with
k
:= max((g

k
() 1),
k
), where
k
is the larger solution of
1
k
(s) =
s
1s
and g

k
(s) :=
s

1
k
(s)
for s > 0, then
[|x|
1
|x|

k
[ |x|
1

k
for all x K.
2.5.1 Two-Stage Optimization
Let us return to Polyas original question concerning the convergence of the sequence
of minimal solutions. It turns out that a careful choice of the sequence of approxi-
mating functions (f
k
) (or for Polya algorithms in Orlicz space of the corresponding
Young functions) often implies convergence to a second stage solution in the follow-
ing sense: limit points x of the sequence of minimal solutions can under favorable
conditions be characterized as x M(g, M(f, K)) where g is an implicitly de-
termined or explicitly chosen function. If g is strictly convex convergence of the
minimizing sequence is established.
Strategies for this type of regularization are differentiation with respect to the pa-
rameter (if the approximating functions are viewed as a parametric family rather
than as a sequence) for determination of an implicitly given g or use of the con-
vergence estimates derived above for an explicit regularization. Below we present a
nite-dimensional robust version of a theorem pertinent to this question (see Theo-
rem 5.6.12).
Theorem 2.5.8. Let (f
k
: R
n
R)
kN
be a sequence of convex functions that
converges pointwise to a function f : R
n
R. Let further the set of minimal solutions
of f on R
n
be non-empty and bounded.
Let (
k
)
kN
be a sequence with
k
> 0 and
k
0 such that the sequence of
functions (
f
k
f

k
)
kN
converges lower semi-continuously (see Denition 5.1.1) on an
open bounded superset Q of M(f, R
n
) to a function g : Q R.
48 Chapter 2 Polya Algorithms in Orlicz Spaces
Let further (
k
) be o(
k
) and ( x
k
) be a sequence in R
n
with the property
|f
k
( x
k
)|
k
.
Then the set of limit points of the sequence ( x
k
) is non-empty and contained in
M(g, M(f, R
n
)).
Example 2.5.9. We consider once more the best L
p
-approximations (p > 1) for
p 1. Let (
k
) be as above. If we dene f
k
(x) :=
_
T
[x(t)[
1+
k
d and f(x) :=
_
T
[x(t)[d then the sequence (
f
k
f

k
) converges lower semi-continuously to the
strictly convex function g with
g(x) =
_
T
[x(t)[ log [x(t)[d.
The reason rests with the fact that the mapping
_
T
[x(t)[
1+
d is convex and
the sequence (
f
k
f

k
) represents the difference quotients (for xed x) that converge to
the derivative g (w.r.t. ) at = 0.
If now the best L
p
k
-solutions with p
k
= 1 +
k
are determined only approximately
but with growing precision in the sense that the norms of the gradients converge to 0
faster than the sequence (
k
) the sequence of approximate solutions converges the
best L
1
()-approximation of largest entropy (see Equation (5.7)) (note that g is the
entropy function).
A further consequence of the above theorem is the following
Theorem 2.5.10. Let (f
k
: R
n
R)
kN
be a sequence of convex functions that
converges pointwise to a function f : R
n
R. Let further M(f, R
n
) be non-empty
and bounded.
Let Q be an open and bounded superset of M(f, R
n
) and
k
:= sup
xQ
[f(x)
f
k
(x)[.
Moreover, let (
k
)
kN
be chosen such that
k
= o(
k
), let the function g : R
n
R
be convex, and the sequence (h
k
: R
n
R)
kN
be dened by
h
k
:=
k
g +f
k
.
Let further (
k
)
kN
be chosen such that
k
= o(
k
) and let ( x
k
)
kN
a sequence in
R
n
with the property
|h
k
( x
k
)|
k
.
Then the set of limit points of the sequence ( x
k
) is non-empty and contained in
M(g, M(f, R
n
)).
If g is strictly convex then the sequence ( x
k
) converges to the uniquely determined
second stage solution.
Section 2.5 Convergence Estimates and Robustness 49
By use of the convergence estimates given above we are now in the position to
present a robust regularization of the Polya algorithm for the Chebyshev approxima-
tion in Orlicz space using an arbitrarily chosen strictly convex function g (see Theo-
rem 5.6.13).
Theorem 2.5.11. Let T be a compact metric space and V a nite-dimensional sub-
space of C(T) and x C(T). Let further (
k
)
kN
be a sequence converging to 0
such that for all y x +V
[|y|
(
k
)
|y|

[
k
|y|

.
We now choose a sequence of positive numbers (
k
)
kN
converging to zero such
that lim
k

k
= 0 and a strictly convex function g : x + V R. An outer
regularization of the Luxemburg norms we obtain by dening the sequence (h
k
:
x +V R)
kN
as
h
k
:=
k
g +| |
(
k
)
.
If the minimal solutions of the functions h
k
are determined approximately such that
the norms of the gradients go to zero more rapidly than the sequence (
k
) then the
sequence of the approximate solutions converges to the best Chebyshev approximation
which is minimal w.r.t. g.
In the following section we will give a detailed proof for the estimates stated above.
2.5.2 Convergence Estimates
Discrete Maximum Norm
Lemma 2.5.12. Let (
k
)
kN
be a sequence of nite Young functions, which converges
pointwise to the Young function

(t) :=
_
0 for [t[ 1
otherwise
for [t[ ,= 1. Then for arbitrary a > 0 we obtain
lim
k

1
k
(a) = 1.
Proof. Let 0 < < 1 be given, then there is a K
0
, such that for k K
0
we have:

k
(1+) > a, i.e. 1+ >
1
k
(a). Conversely there is a K
1
such that
k
(1) < a
for k K
1
and hence 1 <
1
k
(a).
50 Chapter 2 Polya Algorithms in Orlicz Spaces
Theorem 2.5.13. Let T := t
1
, . . . , t
m
and : T (0, ) be a weight function.
Let be a nite Young function and x : T R, then we obtain for the weighted
norm
|x|
(),T
:= inf
_
c

tT
(t)
_
x(t)
c
_
1
_
the following estimate:
[|x|
(),T
|x|
,T
[

|x|
,T
with

= max
_
1

1
(
1
(T)
)
1, 1
1

1
(
1

)
_
,
where (T) :=

tT
(t) and := min
tT
(t).
Proof. For a t
0
we have [x(t
0
)[ = |x|
,T
. Let
x(t) :=
_
[x(t
0
)[ for t = t
0
0 otherwise.
Then we obtain because of the monotonicity of the Luxemburg norm
|x|
(),T
| x|
(),T
inf
_
c

_
|x|
,T
c
_
1
_
= |x|
,T

1

1
(
1

)
.
On the other hand, if we assume x to be equal to the constant |x|
,T
on T, then we
obtain
|x|
(),T
| x|
(),T
= inf
_
c

(T)
_
|x|
,T
c
_
1
_
= |x|
,T

1

1
(
1
(T)
)
.
Corollary 2.5.14. Let (
k
)
kN
be a sequence of nite Young functions, which con-
verges pointwise to the die Young function

. Let

k
= max
_
1

1
k
(
1
(T)
)
1, 1
1

1
k
(
1

)
_
,
then (
k
)
kN
is a sequence tending to zero and we obtain the following estimate:
[|x|
(
k
),T
|x|
,T
[
k
|x|
,T
.
Section 2.5 Convergence Estimates and Robustness 51
Maximum Norm in C

(T)
It is our aim in this section to present the above mentioned estimates for the subspace
of HlderLipschitz continuous functions on a compact subset of R
r
that satises a
cone condition, where is the Lebesgue measure of R
r
.
Denition 2.5.15. A subset T of R
r
satises a cone condition, if there is a cone C
with interior point and a number > 0, such that for all t T there is an orthogonal
transformation A
t
with the property that
t + (B A
t
C) T,
if B denotes the unit ball w.r.t. the Euclidean norm.
Remark 2.5.16. Every nite union of sets which satisfy a cone condition also satises
this condition. Every compact convex subset T of the R
r
with interior point satises
a cone condition.
Lemma 2.5.17. Let be the Lebesgue measure on R
r
and T a compact subset of the
R
r
, which satises a cone condition, let further
q

(t) := inf > 0 [ ((t +B) T) for 0 < (T) and t T,


then there are positive numbers
0
and c such that
q

(t) c
1
r
for 0 <
0
and t T,
and we have
= ((t +q

(t)B) T).
Proof. Let
r
(B C) =:
0
, and let Q
,t
:= > 0 [ ((t +B) T) ,
then we obtain because of the cone condition for T and the invariance of w.r.t.
translations and orthogonal transformations and the homogeneity of
((t +B) T) (t + (A
t
C B)) =
r
(B C) =
0
,
hence Q

0
,t
and thus q

0
(t) . For
0
we have Q
,t
Q

0
,t
and
thus q

(t) q

0
(t). Due to the continuity of w.r.t. in the denition of q

(t) the
inmum is assumed there and we obtain as above
= ((t +q

(t)B) T) (q

(t))
r
(B C).
With the denition c := ((B C))

1
r
the assertion follows.
52 Chapter 2 Polya Algorithms in Orlicz Spaces
Denition 2.5.18. Let T be a compact subset of the R
r
and let 0 < 1. We dene
C

(T) as the space of all real valued functions x on T with the property: there is a
number L R with
[x(t) x(t
t
)[ L|t t
t
|

for all t, t
t
T, where | | denotes the Euclidean norm of R
r
. Let further be
L

(x) := infL > 0 [ [x(t) x(t


t
)[ L|t t
t
|

for all t, t
t
T, t ,= t
t
.
Remark 2.5.19. It is easily seen: L

is a semi-norm on C

(T), since apparently


[(x(t) +y(t)) (x(t
t
) +y(t
t
))[ (L

(x) +L

(y))|t t
t
|

for all t, t
t
T, t ,=t
t
,
hence L

(x +y) L

(x) +L

(y). The positive homogeneity is obvious.


Lemma 2.5.20. Let be a nite Young function, then the equation
s
r
(1 +s

) = 1
has a unique positive solution

.
Proof. Let s
0
:= sups 0 [ (s) = 0, then 0 s
0
< , because , being a
Young function, cannot be identical to zero. Let now s
0
< s
1
< s
2
, then there is a
(0, 1) with s
1
= (1 )s
0
+s
2
and thus
0 < (s
1
) (1 )(s
0
) +(s
2
) = (s
2
) < (s
2
),
i.e. is strictly increasing on [s
0
, ). Let now
t
+
be the right-sided derivative then
the subgradient inequality yields (s) s
t
+
(s), in particular
t
+
(s) > 0 for
s > s
0
. Let now s
1
> s
0
, then we obtain due to the subgradient inequality
(s
1
) +
t
+
(s
1
)(s s
1
) (s),
and hence lim
s
(s) = . Let now u(s) := s
r
(1 + s

), then apparently u
is strictly increasing on [s
0
, ) and according to Theorem 5.3.11 continuous there.
Furthermore we have: u(0) = 0 and lim
s
u(s) = , and the intermediate value
theorem yields the assertion.
Lemma 2.5.21. Let T be a compact subset of the R
r
that satises a cone condition,
let be the Lebesgue measure, and a nite Young function with


1/r
0
(see
Lemma 2.5.17), if

is the solution of the equation s


r
(1 +s

) = 1.
If x C

(T), then with c as in Lemma 2.5.17 we obtain


|x|

|x|
()
+

(|x|
()
+c

(x)).
On the other hand with

:=
1

1
(
1
(T)
)
1 the inequality
|x|
()
|x|

|x|

holds.
Section 2.5 Convergence Estimates and Robustness 53
Proof. Let t
0
T be chosen such that [x(t
0
)[ = |x|

. Let further I

:= (t
0
+
q

r (t
0
)B) T. For all t T we have with z(t) := |t
0
t|

in the natural order


|x|

[x[
I

+zL

(x)
I

.
We obtain |
I

|
()
=
1

1
(
1
(I

)
)
, and due to the monotonicity of the Luxemburg
norm |[x[
I

|
()
|x|
()
. By Lemma 2.5.17 we have: q

(t
0
) c

, hence
z
I

(q

(t
0
))

(c

, i.e. |z
I

|
()
(c

|
I

|
()
,
using again the monotonicity of the Luxemburg norm. We thus obtain
|x|


1
_
1
(I

)
_
|x|
()
+ (c

(x)
= |x|
()
+
_

1
_
1

_
1
_
|x|
()
+ (c

(x)
= |x|
()
+

(|x|
()
+c

(x)).
Apparently
_
T

_
x
|x|

1
_
1
(T)
__
d 1,
and hence the second part of the assertion.
Remark 2.5.22. If (1)
1
(T)
, then |x|
()
|x|

holds.
Lemma 2.5.23. Let (
k
)
kN
be a sequence of nite Young functions with
lim
k

k
(s) = for s > 1.
Then the solutions
k
of the equations s
r

k
(1 +s

) = 1 form a sequence tending


to zero.
Proof. Suppose this is not the case. Then there is a subsequence (
n
) and a positive
number with
n
for n = 1, 2, . . . .
For n large enough the inequality
n
(1+

)
2

r
would hold and hence
r
n

n
(1+

n
) 2, contradicting the denition of
n
.
Lemma 2.5.24. Let (
k
)
kN
as in the previous lemma and let in addition
lim
k

k
(s) = 0 for s (0, 1), then
lim
k

1
k
(a) = 1 for a > 0.
Proof. Let 0 < < 1, then for k large enough:
k
(1 +) > a, i.e. 1 + >
1
k
(a).
On the other hand one obtains for sufciently large k:
k
(1 ) < a and thus
1 <
1
k
(a).
54 Chapter 2 Polya Algorithms in Orlicz Spaces
We are now in the position to state the convergence estimate in a form suitable for
the regularisation method in Theorem 2.5.5.
Theorem 2.5.25. Let T be a compact subset of R
r
, which satises a cone condition,
the Lebesgue measure and (
k
)
kN
a sequence of nite Young functions with
lim
k

k
(s) =
_
for [s[ > 1
0 for [s[ < 1.
Let K be a nite-dimensional subset of C

(T), then there is a sequence (


k
)
kN
tending to zero, dened by
k
:= max(

k
,
k
), where
k
is the solution of the equation
s
r

k
(1 +s

) = 1 and
k
:=
1

1
k
(
1
(T)
)
1, and there is a number > 0, such that
[|x|

|x|
(
k
)
[ |x|

k
for all x K.
Proof. L

being a semi-norm is in particular convex and thus according to Theo-


rem 5.3.11 continuous on the span of K, i.e. there is a number , such that on the
span of K L

(x) |x|

.
Lemma 2.5.21 then yields
|x|

|x|
(
k
)

k
(
k
+ 1 +c

)|x|

.
The inequality claimed in the assertion of the theorem then follows using the preced-
ing lemmata.
Example 2.5.26. (a)
k
(s) := as
2
+k(k([s[ 1)
+
)
m
with a > 0 and m N.
For every c (0, 1) there is a k
0
N, such that for k N
c
k
q
<

k
<
1
k
q
with q :=
m+ 1
m+
r

.
Proof. Let p := q/, then because of pm(m+ 1) = pr:
_
1
k
p
_
r

k
_
1 +
1
k
p
_
=
1
k
pr
_
a
_
1 +
1
k
p
_
2
+k
_
k
1
k
p
_
m
_
=
1
k
pr
a
_
1 +
1
k
p
_
2
+ 1 > 1.
On the other hand
_
c
k
p
_
r

k
_
1 +
_
c
k
p
_

_
=
c
r
k
pr
_
a
_
1 +
c

k
p
_
2
+k
_
k
c

k
p
_
m
_
=
c
r
k
pr
a
_
1 +
c

k
p
_
2
+c
m+r
< 1
for k sufciently large, hence c
1
k
p
<
k
<
1
k
p
.
Section 2.5 Convergence Estimates and Robustness 55
(b)
k
(s) := [s[
k
. For every c (0, 1) there is N N, such that for k N
c
r

log k
k

k

r

log(k 1)
k 1
.
Proof. For u > 0 we have: (1 +
u
k
)
k
e
u
(1 +
u
k
)
k+1
. Hence
__
r

log(k1)
k1
_1

_
r
_
1 +
r

log(k1)
k1
_
k
(k1)
r

_
r

log(k1)
_r

(k1)

k
.
On the other hand
__
c
r

log k
k
_1

_
r
_
1 +c
r

log k
k
_
k
k
(c1)
r

_
c
r

log k
_r

k
0.
This corresponds to the results of Peetre [90] for intervals and Hebden [37] for
cuboids in R
r
and continuously differentiable functions.
L
1
-norm
Lemma 2.5.27. Let (T, , ) be a nite measure space, a nite, denite Young
function, then we obtain for all x L
1
() L

()
|x|
1
|x|
()
+

(c

+ 1)|x|
1
,
where

denotes the larger solution of the equation


1
(s) =
s
1s
and c

is dened
by c

:=
1

1
(
1
(T)
)
.
Proof. Let
t
+
(0) denote the always existing right-sided derivative of at 0. If

t
+
(0) 1, then (s) [s[ on R and hence |x|
1
|x|

on L

(). The equation

1
(s) =
s
1s
only has the solution

= 0.
If now
t
+
(0) < 1, we dene h : R
+
R by h(s) :=
(s)
s
for s > 0 and h(0) :=

t
+
(0). h is continuous and monotonically increasing on R
+
(see Theorem 3.3.1). Let
further be g(t) := h(
1
(t)) for t 0, then g is also continuous and monotonically
increasing on R
+
and we have g(0) =
t
+
(0). Let nally j(t) := t + g(t) for t 0,
then j(0) < 1 and j(1) = 1 +h(
1
(1)) > 1, because
1
(1) > 0 and denite.
Due to the strict monotonicity of j there is a unique

with j(

) = 1.
Let now x L
1
() L

() 0, then we dene
x

:= [x[ +

|x|
1
.
In the natural order the following inequality holds:
x

|x|
1
;
56 Chapter 2 Polya Algorithms in Orlicz Spaces
for the integral norm we have
|x

|
1
= |x|
1
(1 +

(T)).
Using the monotonicity of g we then obtain
1 +

(T) =
_
T
x

|x|
1
d =
_
T

_
x

g(
x

|x|
1
)|x|
1
_
d

_
T

_
x
g(

)|x|
1
_
d.
The monotonicity of the Luxemburg norm then yields
g(

)|x|
1
|x

|
()
|x|
()
+

|x|
1
|
T
|
()
i.e.
|x|
1
|x|
()
+|x|
1
(1 g(

) +

).
The denition of

leads immediately to the assertion.


Lemma 2.5.28. Let X be a nite dimensional subspace of L

(), (T, , ) a nite


measure space, a nite Young function and chosen in such a way that | |


| |
1
on X, then for x X
|x|
()
|x|
1
+ (g

() 1) |x|
1
,
if g

(s) :=
s

1
(s)
for s > 0.
Proof. Let x X 0, then because of the monotonicity of g

1 =
_
T
[x[
|x|
1
d =
_
T

_
x
g

(
[x[
|x|
1
|x|
1
)
_
d

_
T

_
x
g

()|x|
1
_
d,
i.e. |x|

()|x|
1
.
Lemma 2.5.29. Let (
k
)
kN
be a sequence of nite, denite Young functions with

t
k+
(0) < 1. If lim
k

k
(s) = [s[ for s R and if
k
is the positive solution of the
equation
1
k
(s) =
s
1s
, then (
k
)
kN
and (g

k
(a) 1)
kN
for a > 0 are sequences
tending to zero.
Proof. Apparently lim
k
g

k
(a) = 1 for a > 0. Suppose there is a subsequence
(
n
) and a positive number with
n
, then due to the monotonicity of g

n
+g

n
(
n
) +g

n
()

2
+ 1
for n sufciently large, contradicting the denition of
n
.
Section 2.5 Convergence Estimates and Robustness 57
We are now in the position to formulate the subsequent theorem:
Theorem 2.5.30. Let (T, , ) be a nite measure space, (
k
)
kN
a sequence of
nite, denite Young functions with lim
k

k
(s) = [s[ on R. Let further K be a
nite-dimensional subset of L

() and be chosen such that | |


| |
1
on the span of K, then there is a sequence (
k
)
kN
tending to zero dened
by
k
:= max((g

k
() 1),
k
), where
k
is the larger solution of
1
k
(s) =
s
1s
and
g

k
(s) :=
s

1
k
(s)
for s > 0, with the property
[|x|
1
|x|

k
[ |x|
1

k
for all x K.
Proof. This follows immediately from the preceding lemmata.
Remark 2.5.31. If in addition
k
(s) [s[ on R, then
[|x|

k
|x|
1
[ (1 +(T))|x|
1

k
for all x L
1
().
Examples 2.5.32. (a)
k
(s) := [s[
1+
1
k
Depending on the choice of c(0, 1) we have for sufciently large k
c
log k
k

k

log k
k
.
Furthermore we obtain for > 1: g

k
() 1
1
k+1
.
(b)
k
(s) := [s[
1
k
log(1 +k[s[) we have for sufciently large k:
k
(
log k
k
)
1
2
.
For modulars the following estimate is available:
Theorem2.5.33. Let (T, , ) be a nite measure space, (
k
)
kN
a sequence of nite
Young functions with lim
k

k
(s) = [s[ on R.
Let further K be a nite-dimensional subset of L

(), let x
0
K arbitrary and
Q := x L

[ |x|
1
2|x
0
|
1
.
Then Q is a compact neighborhood of M(| |
1
, K) and we have
sup
xQ
[|x|
1
f

k
(x)[ (T) sup
s[0,ac]
[s
k
(s)[
with a := 2|x
0
|
1
and c chosen, such that | |

c| |
1
on the span of K.
Proof. The statement about Q is obvious. Furthermore we have
sup
xQ
[|x|
1
f

k
(x)[ sup
xQ
_
T
[ [x[
k
(x)[d
sup
xQ
(T) sup
s[0,|x|

]
[s
k
(s)[
(T) sup
s[0,ac]
[s
k
(s)[.
58 Chapter 2 Polya Algorithms in Orlicz Spaces
The right-hand side is, according to Theorems 5.3.6 and 5.3.13, a sequence tending to
zero.
Examples 2.5.34. We denote sup
s[0,ac]
[s
k
(s)[ by
k
, then we obtain for
(a)
k
(s) := [s[
1+
1
k
the following inequality:
k
max(1, ac(ac 1))
1
k
(b) for
k
(s) := [s[
1
k
log(1 +k[s[) the inequality

k
2
log k
k
for k ac + 1
(c) and for
k
(s) :=
2

(s arctan(ks)
1
2k
log(1 +k
2
s
2
)) nally

k
3
log k
k
for k ac + 1.
2.6 A PolyaRemez Algorithm in C(T)
We will now take up the approximation problems treated in earlier sections for nite-
dimensional subspaces of continuous functions on an arbitrary compact metric space
T (mostly as a subset of R
l
, with l N):
For a given point x in C(T) we look for an element in a certain subset M of C(T)
that among all points of M has the least distance from x, where the distance is
understood in the following sense:
|x y|
,T
= max
tT
[x(t) y(t)[. (2.6)
Denoting the norm | |
,T
with subscript T is meant to express the dependence on
the set T, which we intend to vary in the sequel. In analogy to the maximum norm we
will use this type of notation also for the Luxemburg norm | |
(),T
.
The following algorithmic considerations are based on the theorem of de la Valle-
Poussin proved in Chapter 1. For convenience we restate this theorem once again:
Theorem 2.6.1 (Theorem of de la Valle-Poussin). Let v
0
be a best Chebyshev ap-
proximation of x C(T) w.r.t. the n-dimensional subspace V of C(T). Then there
is a nite (non-empty) subset T
0
of the extreme points E(x v
0
, T), which does not
contain more than n+1 points and for which v
0T
0
is a best approximation of x
T
0
w.r.t.
V
T
0
, i.e. for all v V we have
|x v
0
|
,T
= |x v
0
|
,T
0
|x v|
,T
0
.
This theorem is, as was seen above, a consequence of the Characterization Theorem
(see Theorem 1.3.3), which in the sequel will facilitate the constructive choice of a de
la Valle-Poussin set in the sense of the above theorem.
Section 2.6 A PolyaRemez Algorithm in C(T) 59
Theorem 2.6.2 (Characterization Theorem). Let V = spanv
1
, . . . , v
n
be a sub-
space of C(T) and x C(T). v
0
is a best Chebyshev approximation of x w.r.t. V ,
if and only if q points t
1
, . . . , t
q
E(x v
0
, T) with 1 q n + 1 and q positive
numbers
1
, . . . ,
q
exist, such that for all v V
q

j=1

j
(x(t
j
) v
0
(t
j
))v(t
j
) = 0.
The interpretation of the subsequent theorem is the following: when varying the
set T the Kuratowski convergence is responsible for the convergence of the corres-
ponding maximum norms.
Denition 2.6.3. Let X be a metric space and (M
n
)
nN
a sequence of subsets of X.
Then
lim
n
M
n
:=
_
x X[ there is a subsequence (M
k
)
kN
of (M
n
)
nN
and x
k
M
k
with x = lim
k
x
k
_
lim
n
M
n
:=
_
x X[ there is a n
0
N and x
n
M
n
for n n
0
with x = lim
n
x
n
_
.
The sequence (M
n
)
nN
is said to be Kuratowski convergent to the subset M of X
if
lim
n
M
n
= lim
n
M
n
= M,
notation: M = lim
n
M
n
.
Theorem 2.6.4. Let T be a compact metric space and x C(T). Let further (T
k
)
kN
be a sequence of compact subsets of T, which converges in the sense of Kuratowski to
T
0
. Then we obtain
lim
k
|x|
,T
k
= |x|
,T
0
,
i.e. the sequence of the corresponding maximum norms converges pointwise on C(T).
Proof. Let
0
T
0
be chosen such that [x(
0
)[ = |x|
,T
0
. From the Kuratowski
convergence follows the existence of a sequence (t
k
)
kN
with t
k
T
k
for all k n
0
,
which converges to
0
. From the continuity of x and t
k
T
k
T it then follows that
|x|
,T
|x|
,T
k
[x(t
k
)[
k
[x(
0
)[ = |x|
,T
0
.
Let on the other hand
k
T
k
with [x(
k
)[ = |x|
,T
k
and s := sup[x(
k
)[. Then
there is a subsequence (
k
m
)
mN
converging to a T with lim
m
[x(
k
m
)[ = s.
60 Chapter 2 Polya Algorithms in Orlicz Spaces
From the continuity of x it follows that also lim
m
[x(
k
m
)[ = [x()[. The Kura-
towski convergence guarantees T
0
and thus
|x|
,T
0
s |x|
,T
k
for all k N.
This theorem provides the opportunity to develop strategies for the computation of
best Chebyshev approximations for functions being dened on a continuous multi-
dimensional domain. In addition to an obvious approach to use a successively ner
point grid (mesh width to zero), we have the possibility to keep the number of dis-
cretization points small, where we just have to take care that the sequence of dis-
cretization sets converges in the sense of Kuratowski to a superset of the critical points
which result from the theorem of de la Valle-Poussin.
Using the Stability Theorem 5.3.25 we then obtain
Theorem 2.6.5. Let M be a closed convex subset of a nite-dimensional subspace of
C(T) and x C(T). Let T
d
be a nite subset of T with the property that the best
Chebyshev approximations w.r.t. | |
,T
d
and | |
,T
agree as functions on T. Let
now (T
k
)
kN
be a sequence of compact subsets of T which converges in the sense of
Kuratowski to a superset T
0
of T
d
. Then for K := x M we have
(a) lim
k
M(| |
,T
k
, K) is non-empty and

kN
M(| |
,T
k
, K) is bounded
(b) lim
k
M(| |
,T
k
, K) M(| |
,T
, K)
(c) inf
yK
|y|
,T
k

k
inf
yK
|y|
,T
(d) y
k
M(| |
,T
k
, K) implies |y
k
|
,T
inf
yK
|y|
,T
.
If one wants to realize the scheme of the previous theorem algorithmically, an ob-
vious approach is to compute each of the best approximations w.r.t. | |
,T
k
by a
corresponding discrete Polya algorithm, at least in an approximative sense. If, beyond
that we are able to keep the number of elements in the set sequence (T
k
) uniformly
bounded, even a diagonal procedure is feasible, a process that is put into precise terms
in the two subsequent theorems.
Theorem 2.6.6. Let T be a compact metric space and x C(T). Let further
(T
k
)
kN
0
be a sequence of nite subsets of T, whose number of elements does not
exceed a given constant and converges in the sense of Kuratowski to T
0
.
Let further (
k
)
kN
be a sequence of Young functions, which converges pointwise
to the Young function

(t) :=
_
0 for [t[ 1
otherwise
for [t[ ,= 1. Then we obtain
lim
k
|x|
(
k
),T
k
= |x|
,T
0
,
Section 2.6 A PolyaRemez Algorithm in C(T) 61
i.e. the sequence of the corresponding Luxemburg norms converges pointwise on
C(T).
Proof. We obtain
[|x|
(
k
),T
k
|x|
,T
0
[ [|x|
(
k
),T
k
|x|
,T
k
[ +[|x|
,T
k
|x|
,T
0
[.
According to Theorem 2.5.13 we have
[|x|
(
k
),T
k
|x|
,T
k
[
k
|x|
,T
k
with

k
= max
_
1

1
k
(
1
[T
k
[
)
1, 1
1

1
k
(1)
_
.
By assumption there is a N N with the property [T
k
[ N. Then, due to Theo-
rem 2.5.13 the sequence (
k
)
kN
tends to zero. Due to the convergence of the Cheby-
shev norms the assertion of the theorem follows.
Under the conditions stated above we can now approximate the best continuous
Chebyshev approximation by a sequence of best discrete approximations w.r.t. Lux-
emburg norms in the sense of Polya.
Theorem 2.6.7. Let M be a closed convex subset of a nite-dimensional subspace of
C(T) and x C(T). Let T
d
be a nite subset of T with the property that the best
Chebyshev approximations w.r.t. | |
,T
d
and | |
,T
agree as functions on T. Let
now (T
k
)
kN
be a sequence of nite subsets of T, whose number of elements does not
exceed a given constant and converges in the sense of Kuratowski to a superset T
0
of
T
d
. Let further (
k
)
kN
be a sequence of Young functions, which converges pointwise
to the Young function

. Then we obtain for K := x M


(a) lim
k
M(| |
(
k
),T
k
, K) is non-empty and

kN
M(| |
(
k
),T
k
, K) is bounded
(b) lim
k
M(| |
(
k
),T
k
, K) M(| |
,T
, K)
(c) inf
yK
|y|
(
k
),T
k

k
inf
yK
|y|
,T
(d) from y
n
M(| |
(
k
),T
k
, K) it follows that |y
k
|
,T
inf
yK
|y|
,T
.
Algorithmic Scheme
Let T be a compact metric space and K a closed convex subset of an n-dimensional
subspace of C(T). Let further 0 < < 1, > 0 and T
1
T.
(a) Set k := 1, D
1
:= T
1
. Determine x
1
M(| |
,D
1
, K) and set

1
:=
|x
1
|
,T
|x
1
|
,D
1
1.
62 Chapter 2 Polya Algorithms in Orlicz Spaces
(b) If
k
< , then Stop.
(c) Reduction Step:
Determine a subset

D
k
of E(x
k
, D
k
) where for the number of elements we have
[

D
k
[ n+1, such that x
k
M(| |
,

D
k
, K) (i.e.

D
k
is a de la Valle-Poussin
set relative to D
k
)
(d) Inner Loop:
i. Choose t
k
E(x
k
, T) and set i := 1. Choose B
1,k
as a superset of

D
k
t
k
.
ii. Determine y
i,k
M(| |
,B
i,k
, K) and set

i,k
:=
|y
i,k
|
,T
|y
i,k
|
,B
i,k
1.
iii. If
i,k

k
, then set D
k+1
:= B
i,k
, x
k+1
:= y
i,k
,
k+1
:=
i,k
.
Set k := k + 1, goto (b).
Otherwise choose T
i
T and set B
i+1,k
:= B
i,k
T
i
.
Set i := i + 1 goto ii.
Algorithmic Variants
For the choice of the set T
i
in step iii. of the inner loop different strategies are possible:
(a) (static variant) Let (T
i
)
iN
be a sequence of subsets of T that converges in the
sense of Kuratowski to T.
(b) (dynamic variant) Choose T
i
as a non-empty subset of E(y
i,k
, T). If one
chooses in particular T
i
as a set containing just a single point, then this pro-
cedure corresponds in the inner loop to the First Algorithm of Remez proposed
by Cheney (see [22]).
One can view the determination of T
i
as a byproduct of the computation of the

i,k
.
(c) (combined variant) A mixture of both variants in the sense of a union of the
corresponding sets.
Theorem 2.6.8. In variants (a) and (c) the sequence (x
k
)
kN
generated by the algo-
rithm is bounded and every point of accumulation is a best Chebyshev approximation
w.r.t. | |
,T
. Furthermore the sequence (|x
k
|
,D
k
)
kN
converges monotonically
increasing to inf(| |
,T
, K).
Proof. By construction the inner loop over i ends after a nite number of steps. Fur-
thermore, by construction the sequence (
k
) tends to zero and we obtain for k suf-
ciently large
|x
k
|
,T
< (1 +)|x
k
|
,D
k
,
Section 2.6 A PolyaRemez Algorithm in C(T) 63
where x
k
M(| |
,D
k
, K). Let x
0
M(| |
,T
, K), then apparently
|x
k
|
,D
k
|x
0
|
,D
k
|x
0
|
,T
.
Hence the sequence (x
k
) is bounded. Let now (x
k
n
) be a convergent subsequence
with x
k
n
x. Suppose x / M(| |
,T
, K), then there is a

t T and > 0 with
[ x(

t)[ |x
0
|
,T
(1 + 2), due to x
k
n
(

t) x(

t) also [x
k
n
(

t)[ |x
0
|
,T
(1 + )
for n sufciently large. On the other hand there is a N N with
k


2
for k > N.
Putting these results together we obtain
|x
0
|
,T
(1 +) [ x
k
n
(

t)[ |x
k
n
|
,T

_
1 +

2
_
|x
k
n
|
,D
k
n

_
1 +

2
_
|x
0
|
,T
,
and hence a contradiction.
The monotonicity follows from D
k+1


D
k
, since
|x
k
|
,D
k
= |x
k
|
,

D
k
|x
k+1
|
,D
k+1
.
Algorithm for the Linear Approximation
Let T be a compact metric space and v
1
, . . . , v
n
a basis of a nite-dimensional
subspace V of C(T). Let T
0
be a subset of T such that v
1
, . . . , v
n
restricted to
T
0
are linearly independent. In order to guarantee uniqueness of the coefcients of
the best approximations to be determined, one can in step i. of the inner loop put
B
1,k
:=

D
k
t
k
T
0
.
The reduction step in (c) can now be realized in the following way: Let x v
0,k
be
a best approximation w.r.t. | |
,D
k
and let t
1
, . . . , t
l
= E(xv
0,k
, D
k
) be the set
of extreme points of x v
0,k
on D
k
. If l n + 1, put

D
k
= E(x v
0,k
, D
k
).
Otherwise determine a solution of the homogeneous linear system of equations
resulting from the characterization theorem
l

j=1

j
(x(t
j
) v
0,k
(t
j
))v
i
(t
j
) = 0 for i = 1, . . . , n.
Here we look for a non-trivial and non-negative solution with at most n + 1 non-zero
components. The non-triviality of the solution can in particular be guaranteed by the
requirement
l

j=1

j
= 1.
For non-negative solutions special algorithms are required.
64 Chapter 2 Polya Algorithms in Orlicz Spaces
Remark 2.6.9. A further class of algorithms for the computation of a best (contin-
uous) Chebyshev approximation in C(T), also being based on the idea of the Polya
algorithm, is obtained in the following way:
After choosing a sequence of Young functions (
k
), whose corresponding Luxem-
burg norms (| |

k
,T
) converge to | |
,T
, the computation of a best approximation
w.r.t. the Luxemburg norm is performed by replacing the integrals by quadrature for-
mulas. In order to keep the number of discretization points low, one can adopt the
idea of adaptive quadrature formulas (sparse grids).
2.7 Semi-innite Optimization Problems
Various problems of optimization theory can be formulated as problems of linear
semi-innite optimization (see [71]), where a linear cost function is optimized under
(potentially) innitely many linear restrictions.
Let T be a compact metric space and C(T) be equipped with the natural order,
U an n-dimensional subspace of C(T) with basis u
1
, . . . , u
n
and let u
n+1
C(T).
Let now c R
n
be given, then we denote the following problem
sup
xR
n
_
c, x

i=1
x
i
u
i
u
n+1
_
(2.7)
as a semi-innite optimization problem.
If u
n+1
+ U satises a Krein condition, i.e. there is a u u
n+1
+ U and a > 0
with u(t) for all t T, then we can, without loss of generality, assume that u
n+1
is positive.
In [66] we have converted the semi-innite optimization problem into a linear ap-
proximation problem w.r.t. the max-function, and have approximated the non-differ-
entiable max-function by a sequence of Minkowski functionals for non-symmetric
Young functions, much in the spirit of the Polya algorithms of this chapter. In the next
section of the present text we will pursue a different approach: instead of smoothing
the function to be minimized we smooth the restriction set and keep the (linear) cost
function xed.
2.7.1 Successive Approximation of the Restriction Set
We will now treat the problem of linear semi-innite optimization, directly by the
method of Lagrange. Let p : C(T) R with p(u) := max
tT
u(t), then we can
formulate the above problem as a restricted convex minimization problem:
inf
xR
n
_
c, x

p
_
n

i=1
x
i
u
i
u
n+1
_
0
_
. (2.8)
Section 2.7 Semi-innite Optimization Problems 65
The convex function p is not differentiable in general. This also holds for the function
f : R
n
R, dened by f(x) := p(

n
i=1
x
i
u
i
u
n+1
). Thus the level set S
f
(0),
which describes the restriction set, is not at convex (see Theorem 8.1.2).
An equivalent formulation we obtain by using the also convex function q : C(T)
R dened by q(u) := max(p(u), 0)
inf
xR
n
_
c, x

q
_
n

i=1
x
i
u
i
u
n+1
_
= 0
_
. (2.9)
We obtain for the corresponding Lagrange function : c, x +f(x) resp. c, x +
g(x), where g : R
n
R is dened by g(x) := q(

n
i=1
x
i
u
i
u
n+1
). The La-
grange functions are, as stated above, not differentiable. By suitable differentiable
approximations of the functions f resp. g we obtain differentiable restricted convex
optimization problems. If we apply to these the methods of KuhnTucker resp. of
Lagrange, one can reduce the determination of minimal solutions to the solution of
non-linear equations. In order to apply rapidly convergent numerical methods, the
regularity of the corresponding Jacobian matrix at the solution is required. A state-
ment in this direction is provided by the following lemmata:
Lemma 2.7.1. Let A L(R
n
) be symmetric and positive denite and let b R
n
be
different from zero. Then the matrix
_
A b
b
T
0
_
is non-singular.
Proof. Suppose there is a y R
n
and a R, not both of them zero, with Ay = b
and b, y = 0, then
Ay, y = b, y = 0
holds. Since A is positive denite, y = 0 follows, hence ,= 0 and therefore b = 0 a
contradiction.
Lemma 2.7.2. Let f : R
n
R be twice continuously differentiable, let c R
n
dif-
ferent from zero, let (x

) be a stationary point of the Lagrange function (x, )


c, x + f(x). If f
tt
(x

) is positive denite, the Hessian matrix of the Lagrange


function
_
f
tt
(x) f
t
(x)
f
t
(x)
T
0
_
is non-singular at (x

).
66 Chapter 2 Polya Algorithms in Orlicz Spaces
Proof. If (x

) is a stationary point, it satises the equations


c +

f
t
(x

) = 0
f(x

) = 0.
Suppose

= 0 or f
t
(x

) = 0, then c = 0, a contradiction. According to the previous


lemma then
_
f
tt
(x

)
1

f
t
(x

)
1

f
t
(x

)
T
0
_
is non-singular.
Theorem 2.7.3. Let f : R
n
R be convex and differentiable, let c R
n
be different
from zero and let > inf f(R
n
). Let further

0
= infc, x [ f(x) (2.10)
be nite and be attained at x

. Then there is a

> 0, such that

0
= infc, x +

(f(x) ), (2.11)
where the inmum is also attained in (2.11), and f(x

) = holds.
Proof. Apparently there is a x
1
R
n
with f(x
1
) < . Then by Theorem 3.14.4
there is a

0 with
0
= infc, x +

(f(x) ), where the inmum is also


attained at x

. As a necessary (and sufcient) condition we obtain


c +

f
t
(x

) = 0.
Since c ,= 0 the gradient f
t
(x

) ,= 0 and

> 0. Apparently x

cannot be an interior
point of S
f
().
Let be a (non-symmetric) twice continuously differentiable Young function. Let
further p

be the corresponding Minkowski functional and let f

: R
n
R be
dened by
f

(x) := p

_
n

i=1
x
i
u
i
u
n+1
_
.
Let > p

(u
n+1
). Then we consider the problem
inf
xR
n
_
c, x

_
n

i=1
x
i
u
i
u
n+1
_

_
(2.12)
resp.
inf
xR
n
c, x [ f

(x) . (2.13)
Section 2.7 Semi-innite Optimization Problems 67
For the corresponding Lagrange function we have
(x, ) c, x +(f

(x) ).
A stationary point (x

) satises the following system of equations:


c +

f
t

(x

) = 0 (2.14)
f

(x

) = . (2.15)
A more explicit form of the second equation is: p

n
i=1
x
i
u
i
u
n+1
) = . Due to
the positive homogeneity of p

_
T

_
n
i=1
x

i
u
i
u
n+1

_
d = 1
holds. Altogether we obtain the following system of equations:
c +

__
T
u
j

t
_
n
i=1
x

i
u
i
u
n+1

_
d
_
n
j=1
= 0 (2.16)
_
T

_
n
i=1
x

i
u
i
u
n+1

_
d = 1. (2.17)
Here

=

/
_
T
y(t)
p

(y)

t
(
y(t)
p

(y)
)d with y(t) :=

n
i=1
x

i
u
i
u
n+1
, where the de-
nominator of the right-hand side corresponds to the expression (a) in Section 1.4.2.
Therefore also

> 0.
Let now in particular
0
: R R be dened by
0
(s) := s
2
and let be a twice
continuously differentiable (non-symmetric) Young function with the properties
(a) (s) = 0 for s 1
(b) (s) > 0 for s > 1.
Moreover, let > 0 and let

:=
0
+ , and let p
,
be the corresponding
Minkowski functional. Let f
,
: R
n
R be dened by
f
,
(x) := p
,
_
n

i=1
x
i
u
i
u
n+1
_
.
Due to the Krein condition we can assume u
n+1
to be non-negative, then, due to
p
,
(u
n+1
) =

|u
n+1
|
2
,
(since (u
n+1
) = 0) an explicit relation between and can be stated as
>

|u
n+1
|
2
.
68 Chapter 2 Polya Algorithms in Orlicz Spaces
A stationary point (x

) of the Lagrange function of the approximating problem


inf
xR
n
_
c, x

p
,
_
n

i=1
x
i
u
i
u
n+1
_

_
(2.18)
satises
c +

__
T
u
j

_
n
i=1
x

i
u
i
u
n+1

_
d
_
n
j=1
= 0 (2.19)
_
T

_
n
i=1
x

i
u
i
u
n+1

_
d = 1. (2.20)
Since
tt

> 0 we obtain in analogy to the considerations in Section 1.4.2 that of f


tt
,
is positive denite.
Let now (
k
)
N
be a sequence of positive numbers tending to zero. At rst we will
show the Kuratowski convergence of the level sets S
p
0,
(
k
) (here = 0) to S
p
(0).
Lemma 2.7.4. We have
(a) S
p
0,
(0) = S
p
(0)
(b) S
p
0,
(
k
) S
p
(0).
Proof. Let y C(T), then by denition
p
0,
(y) = inf
_
c

_
T

_
y
c
_
d 1
_
.
Now for k 1 due to the convexity of : (s) = (
1
k
ks + (1
1
k
) 0)
1
k
(ks)
and hence k(s) (ks). Let now y(t) > > 0 for t U with (U) > 0 and let
c > 0 be chosen, such that /2c 1. Then
_
T

_
y(t)
c
_
d (U)
_

c
_
= (U)
_

2
c
2

_
(U)

2c
(2)
holds. For a c > 0 small enough the right-hand side is greater than 1 and hence
p
0,
(y) > 0. If on the other hand y 0, then apparently p
0,
(y) = 0. This yields the
rst part of the assertion. The second part is an immediate consequence of the rst
part.
Theorem 2.7.5. Let (
k
)
kN
be a sequence of positive numbers tending to zero. Then
the sequence of Minkowski functionals (p

k
,
)
kN
converges pointwise to p
0,
on
L

().
Section 2.7 Semi-innite Optimization Problems 69
Proof. Let y =

n
i=1

T
i
be a step function in L

().
Case 1: p
0,
(y) > 0.
We have
n

i=1

k
_

i
p
0,
(y)
_
(T
i
)
n

i=1

_

i
p
0,
(y)
_
(T
i
) = 1,
and hence
p

k
(y) p
0,
(y).
Suppose, there is a subsequence and a positive with p

k
(y) p
0,
(y) +, then
n

i=1

k
_

i
p
0,
(y) +
_
(T
i
)
n

i=1

k
_

i
p

k
,
(y)
_
(T
i
) = 1.
Due to the pointwise convergence of the Young functions we then obtain
n

i=1

_

i
p
0,
(y) +
_
(T
i
) 1,
a contradiction.
Case 2: p
0,
(y) = 0.
Then by the above lemma y 0 and hence (
y
c
) = 0 for all c > 0. Therefore
p

k
,
(y) =

k
|y|
2
0 for k .
In a manner similar to Theorem 2.4.2 the continuity of the Minkowski functionals
on L

can be established. The remaining part is an application of the theorem of


BanachSteinhaus 5.3.17 for convex functions.
In order to be able to apply the stability theorems of Chapter 5, we need the Kura-
towski convergence of the restriction sets. This can be achieved, if the sequence of the
parameters (
k
)
kN
converges more rapidly to zero than the sequence of the squares
of the levels (
k
)
kN
.
Theorem 2.7.6. Let
k
= o(
2
k
) and let M be a closed subset of C(T). Then the
sequence of the intersects (M S
p

k
,
(
k
))
kN
converges in the sense of Kuratowski
to the intersect with the negative natural cone M S
p
(0).
Proof. Let y
k
M S
p

k
,
(
k
) for kN, i.e. p

k
,
(y
k
)
k
and let lim
k
y
k
= y.
Due to the closedness of M we have y M. We have to show that y S
p
(0).
Pointwise convergence of the Minkowski functionals implies their continuous con-
vergence (see Remark 5.3.16), i.e.
p

k
,
(y
k
)
k
p
0,
( y).
70 Chapter 2 Polya Algorithms in Orlicz Spaces
Let > 0 be given, then: p

k
,
(y
k
)
k
for k sufciently large, and hence
p
0,
( y) .
Conversely, let
k
= o(
2
k
) and let y M S
p
(0) (i.e. in particular y 0), then
p

k
,
( y) = inf
_
c

_
T

k
_
y
c
_
d 1
_
= inf
_
c

k
_
T
_
y
c
_
2
d 1
_
=

k
| y|
2
holds, but

k
| y|
2

k
for k sufciently large, hence y M S
p

k
,
(
k
) for k
sufciently large.
For the case of semi-innite optimization we choose M := u
n+1
+ U, where
U := spanu
1
, . . . , u
n
.
Remark. If u
1
, . . . , u
n
is a basis of U, then the Kuratowski convergence of the
level sets of the Minkowski functionals on C(T) carries over to the Kuratowski con-
vergence of the corresponding coefcient sets
lim
k
f

k
,
(
k
) = S
f
(0).
Theorem 2.7.7. Let S
k
:= f

k
,
(
k
) and S := S
f
(0). Let further the set of minimal
solutions of c, on S be non-empty and bounded. Then
(a) for every k N there exists a uniquely determined minimal solution x
k

M(c, , S
k
). The resulting sequence is bounded and its points of accumula-
tion are elements of M(c, , S).
(b) infc, S
k
infc, S.
(c) x
k
M(c, , S
k
) for k N implies c, x
k
infc, S.
Proof. Since for all k Nthe level sets S
k
are strictly convex and bounded, existence
and uniqueness of the minimal solutions of c, on S
k
follows. The remaining
assertions follow from the Stability Theorem of Convex Optimization 5.3.21 in R
n
.
Outer Regularization of the Linear Functional
Theorem 2.7.8. Let f and g

be convex functions in C
2
(R
n
) with g
tt

positive denite.
Let c R
n
be different from zero and let > inf f(R
n
). Then:
(a) The optimization problem (g

() c, , S
f
()) has a unique solution x

.
Section 2.7 Semi-innite Optimization Problems 71
(b) There is a non-negative Lagrange multiplier

, such that the Lagrange function


L

= g

+c, +

(f )
has x

as minimal solution.
(c) If the solution x

is attained at the boundary of the restriction set S


f
(), then
the pair (x

) is a solution of the following systems of equations:


g
t

(x) c +f
t
(x) = 0
f(x) = .
The corresponding Jacobian matrix at (x

)
_
g
tt

(x

) +

f
tt
(x

) f
t
(x

)
f
t
(x

)
T
0
_
is then non-singular.
Proof. Since > inf f(R
n
) and f(x

) = we have f
t
(x

) ,= 0. As

due to
Theorem 3.14.4 is non-negative, the matrix g
tt

(x

) +

f
tt
(x

) is positive denite.
The remaining part of the assertion follows by Lemma 2.7.1.
Remark. The condition required in (c) that x

should be a boundary point of S


f
(),
is satised, if the global minimal solution of g

c, is outside of the level set


S
f
(). This suggests the following strategy: if g

(x) = |x x

|
2
with a suitable
x

R
n
, then the global minimal solution x

of g

c, satises the equation


x

= x

+
1
2
c.
If f(
1
2
c) > , choose x

= 0. If on the other hand a point x is known (normally


the predecessor in an iteration process) with f( x) > , put x

:= x
1
2
c and hence
x

= x.
Remark (Partial result for second stage). Let S
k
:= S
f
(
k
), S := S
f
(0) with
f(x) = p

n
i=1
x
i
u
i
u
n+1
). Let further be x
k
M(c, , S
k
), let (
k
) be
a sequence of positive numbers tending to zero and h
k
:=
k
g c, , then we obtain
for x
k
minimal solution of h
k
on S
k
h
k
(x
k
) =
k
g(x
k
) c, x
k

k
g( x
k
) c, x
k


k
g( x
k
) c, x
k
,
and hence
g(x
k
) g( x
k
) 0.
Every point of accumulation x of ( x
k
) resp. x of (x
k
) is a minimal solution of c,
on S and
g( x) g( x).
Chapter 3
Convex Sets and Convex Functions
3.1 Geometry of Convex Sets
In this section we will present a collection of important notions and facts about the
geometry of convex sets in vector spaces that we will need for the treatment of opti-
mization problems in Orlicz spaces.
Denition 3.1.1. Let X be a vector space and let x, y X. Then we call the set
[x, y] := z X[ z = x + (1 )y with 0 1
the closed interval connecting x and y. By the open interval connecting x and y we
understand the set
(x, y) := z X[ z = x + (1 )y with 0 < < 1.
Half open intervals (x, y] and [x, y) are dened correspondingly.
Denition 3.1.2. A subset K of X is called convex, if for all points x, y K the
interval [x, y] is in K.
Denition 3.1.3. Let A be a subset of X. A point x X is called a convex com-
bination of elements of A, if there is n N and x
1
, . . . , x
n
A, such that x =

1
x
1
+ +
n
x
n
, where
1
, . . . ,
n
non-negative numbers with
1
+ +
n
= 1.
Using complete induction we obtain the following
Remark 3.1.4. Let K be a convex subset of X, then every convex combination of a
nite number of points in K is again an element of K

1
x
1
+ +
n
x
n
K.
Denition 3.1.5. Let U be a subset of X. A point x
0
U is called algebraically
interior point of U, if for every y X there is R
>0
with [x
0
y, x
0
+y] U.
U is called algebraically open, if every point in U is an algebraically interior point.
Section 3.1 Geometry of Convex Sets 73
Linearly Bounded Sets
Denition 3.1.6. A subset S of a vector space is called linearly bounded w.r.t. a point
x
0
S, if for all y S the set
R
0
[ (1 )x
0
+y S
is bounded. S is called linearly bounded, if S is linearly bounded w.r.t. a point x
0
S.
Theorem 3.1.7. A convex, closed, and linearly bounded subset K of R
n
is bounded.
Proof. W.l.g. let K be linearly bounded w.r.t. 0. Suppose, K is not bounded. Then
there is a sequence (x
n
)

1
in K with |x
n
| .
For large n the point s
n
:=
x
n
|x
n
|
= (1
1
|x
n
|
) 0 +
1
|x
n
|
x
n
is in K. Let (s
n
i
)
be a subsequence of (s
n
) converging to s. Since K is linearly bounded, there is a

0
R
+
with
0
s / K. For large n however,
0
s
n
K and, because K is closed,
a contradiction follows.
The Interior and the Closure of Convex Sets
In normed spaces the convexity of a subset carries over to the interior and the closure
of these subsets.
Theorem 3.1.8. Let K be a convex subset of X. Then we have:
(a) For all x in the interior Int(K) of K and all y K the half open interval [x, y)
is in Int(K).
(b) The interior of K and the closure of K are convex.
Proof. (a): Let x Int(K) and let U be a neighborhood of 0, such that x + U K.
Let (0, 1), then U is also a neighborhood of 0, and we have
x + (1 )y +U = (x +U) + (1 )y K.
(b): That the interior of K is convex, follows directly from (a), if one observes that
for all x, y K: [x, y] = x (x, y) y.
That the closure of K is convex, is obtained from the following line of reasoning:
let x, y K and (x
k
)

1
, (y
k
)

1
sequences in K with x = lim
k
x
k
and y = lim
k
y
k
.
For a [0, 1] and all k N we have: (x
k
+ (1 )y
k
) K and thus
x + (1 )y = lim
k
(x
k
+ (1 )y
k
) K.
74 Chapter 3 Convex Sets and Convex Functions
For later use we need the following
Lemma 3.1.9. Let S be a non-empty convex subset of a nite-dimensional normed
space X. Let x
0
S and X
m
:= span(S x
0
) and let dim(X
m
) = m > 0. Then
Int(S) ,= w.r.t. X
m
.
Proof. W.l.g. let 0 S. Let x
1
, . . . , x
m
S be linearly independent, then s
i
=
x
i
|x
i
|
,
i =1, . . . , n, is a basis of X
m
. Then t :=
1
2
(
1
m

m
i=1
x
i
) +
1
2
0 =
1
2m

m
i=1
x
i
S.
We show: t Int(S) w.r.t. X
m
: let x X
m
, then we have the following rep-
resentation: x =

m
i=1

i
s
i
. Apparently by

m
i=1
[
i
[ a norm is dened on X
m
:
|x|
m
:=

m
i=1
[
i
[. On X
m
this norm is equivalent to the given norm. Let now be
> 0 and U

:= u X
m
[ |u|
m
< , then for small enough t + u S for all
u U

, because
t +u =
1
2m
m

i=1
x
i
+
m

i=1
u
i
s
i
=
1
m
m

i=1
_
1
2
+m
u
i
|x
i
|
_
x
i
.
For small enough we have 0 < t
i
:=
1
2
+ m
u
i
|x
i
|
< 1 and hence t + u S since
0 S.
Extreme and Flat Points of Convex Sets
Let X be a normed space.
Denition 3.1.10. Let K be a convex subset of X. A point x K is called extreme
point of K, if there is no proper open interval in K which contains x, i.e. x is extreme
point of K, if x (u, v) K implies u = v. By E
p
(K) we denote the set of extreme
points of K.
In nite-dimensional spaces the theorem of Minkowski holds, which we will prove
in the section on Separation Theorems (see Theorem 3.9.12).
Theorem 3.1.11 (Theorem of Minkowski). Let X be an n-dimensional space and
S a convex compact subset of X. Then every boundary point (or arbitrary point) of
S can be represented as a convex combination of at most n (or n + 1 respectively)
extreme points.
Denition 3.1.12. A convex set K is called convex body, if the interior of K is non-
empty.
Denition 3.1.13. Let K be a convex body in X. K is called strictly convex, if every
boundary point of K is an extreme point.
Section 3.1 Geometry of Convex Sets 75
Denition 3.1.14. Let K be a convex subset of X. A closed hyperplane H is called
supporting hyperplane of K, if H contains at least one point of K and K is completely
contained in one of the two half-spaces of X.
A notion corresponding to that of an extreme point in a dual sense we obtain in
the following way:
Denition 3.1.15. A boundary point x of a convex subset K of X is called a at point
of K, if at most one supporting hyperplane passes through x.
Denition 3.1.16. A convex body K in X is called at convex, if the boundary of K
only consists of at points.
Convex Hull
Denition 3.1.17. Let A be a subset of X. Then by the term convex hull of A we
understand the set
conv(A) :=

K[ A K X where K is convex.
Remark 3.1.18. Let A be a subset of X. Then the convex hull of A is the set of all
convex combinations of elements in A.
Theorem 3.1.19 (Theorem of Carathodory). Let A be a subset of a nite-dimen-
sional vector space X and n = dim(X). Then for every element x conv(A) there
are n + 1 numbers
1
, . . . ,
n+1
[0, 1] and n + 1 points x
1
, . . . , x
n+1
A with

1
+ +
n+1
= 1 and x =
1
x
1
+ +
n+1
x
n+1
.
The theorem of Carathodory states that, for the representation of an element of the
convex hull of a subset of an n-dimensional vector space, n + 1 points of this subset
are sufcient.
Proof. Let x conv(A). Then there are m N, real numbers
1
. . .
m
[0, 1], and
points x
1
, . . . , x
m
A with
1
+ +
m
= 1 and x =
1
x
1
+ +
m
x
m
.
If m n + 1, nothing is left to be shown.
If m > n + 1, it has to be demonstrated that x can already be represented as a
convex combination of m1 points in A, whence the assertion nally follows.
For all k 1, . . . , m1 let y
k
:= x
k
x
m
. Since m > n+1 and n = dim(X),
the set y
1
, . . . , y
m1
is linearly dependent: hence there is a non-trivial (m1)-tuple
of numbers (
1
, . . . ,
m1
) such that
1
y
1
+ +
m1
y
m1
= 0.
If we put
m
:= (
1
+ +
m1
), then
1
+ +
m
= 0 and

1
x
1
+ +
m
x
m
= 0. (3.1)
76 Chapter 3 Convex Sets and Convex Functions
Since not all numbers
1
, . . . ,
m
are zero, there is a subscript k
0
, with
k
0
> 0,
which in addition can be chosen in such a way that


k
0

k
0
for all k 1, . . . , m with
k
> 0. Thus we have for all k 1, . . . , m

k

k

k
0

k
0
0,
and
m

k=1
_

k

k

k
0

k
0
_
= 1.
By multiplying Equation (3.1) by

k
0

k
0
and adding x =

m
k=1

k
x
k
, we obtain
x =
m

k=1
_

k

k

k
0

k
0
_
x
k
.
As (
k
0

k
0

k
0

k
0
) = 0, the point x was represented as a convex combination of m1
elements in A.
3.2 Convex Functions
Denition 3.2.1. Let K be a convex subset of X and f : K (, ] a function.
f is called convex, if for all x, y K and for all [0, 1]
f(x + (1 )y) f(x) + (1 )f(y).
A function g : K [, ) is called concave, if g is convex.
In the sequel we will consider such functions f, which are not identical to innity
(proper convex functions).
Denition 3.2.2. Let K be a set and f : K (, ] a function. By the epigraph
of f we understand the set Epi(f) := (x, r) K R[ f(x) r.
Thus the epigraph contains all points of KR, which we nd within the domain
of nite values of f above the graph of f.
Using this notion one obtains the following characterization of convex functions,
based on the ideas of J. L. W. V. Jensen.
Section 3.2 Convex Functions 77
Theorem 3.2.3. Let K be a convex subset of X and f : K (, ] a function.
Then the following statements are equivalent:
(a) f is convex
(b) Epi(f) is a convex subset of X R
(c) f satises Jensens inequality, i.e. for all x
1
, . . . , x
n
K and for all positive

1
, . . . ,
n
R with
1
+ +
n
= 1 the inequality
f
_
n

k=1

k
x
k
_

k=1

k
f(x
k
)
holds.
Proof. (a) (b): Let (x, r), (y, s) Epi(f), and let [0, 1]. Since f is convex,
we have
f(x + (1 )y) f(x) + (1 )f(y) r + (1 )s,
i.e.
(x, r) + (1 )(y, s) = (x + (1 )y, r + (1 )s) Epi(f).
(b) (c): Let n N, and let x
1
, . . . , x
n
K, furthermore
1
, . . . ,
n
R
0
satisfying
1
+ +
n
= 1.
If for a k 1, . . . , n the value of the function f(x
k
) is innite, then the inequality
is trivially satised.
Let now w.l.o.g. f(x
1
), . . . , f(x
n
) R. The points (x
1
, f(x
1
)), . . . , (x
n
, f(x
n
))
are then in Epi(f). Since Epi(f) is convex by assumption, we have
_
n

k=1

k
x
k
,
n

k=1

k
f(x
k
)
_
=
n

k=1

k
(x
k
, f(x
k
)) Epi(f),
hence
f
_
n

k=1

k
x
k
_

k=1

k
f(x
k
).
(c) (a): This implication is obvious.
Convexity preserving Operations
The following theorem shows, how new convex functions can be constructed from
given convex functions. These constructions will play an important role in substanti-
ating the convexity of a function. In this context one tries to represent a complicated
function as a composition of simpler functions, whose convexity is easily estab-
lished.
78 Chapter 3 Convex Sets and Convex Functions
Theorem 3.2.4. Let K be a convex subset of X.
(1) Let n N and let f
1
, . . . , f
n
: K R be convex functions. Then for all

1
, . . . ,
n
R
0

1
f
1
+ +
n
f
n
is also a convex function.
(2) All afne functions, i.e. sums of linear and constant functions, are convex.
(3) Let (X, | |) be a normed space. Then the norm | | : X R is a convex
function.
(4) Let f : K R be a convex function, C a convex superset of f(K) and g :
C R a convex and monotone increasing function. Then the composition
g f : K R is also a convex function.
(5) Let Y be another vector space, : X Y an afne mapping and f : Y Ra
convex function. Then the composition f : K R is also a convex function.
Proof. Is left to the reader.
Example 3.2.5. Let X be a normed space, then the following functions are convex:
(a) x |x|
2
,
(b) for x
0
, v
1
, . . . , v
n
X let f : R
n
R with
(a
1
, . . . , a
n
) f(a) :=
_
_
_
_
x
0

n

i=1
a
i
v
i
_
_
_
_
,
(c) more generally: let V be a vector space and L : V X afne and v f(v) :=
|L(v)|.
One obtains a further class of convex functions through the notion of sublinear
functionals:
Denition 3.2.6. Let X be a vector space and let f : X R a function.
(a) f is called positive homogeneous, if for all R
0
and for all x X we have
f(x) = f(x)
(b) f is called subadditive, if for all x, y X the inequality
f(x +y) f(x) +f(y)
holds.
(c) f is called sublinear, if f is positive homogeneous and subadditive.
(d) f is called a semi-norm, if f is sublinear and symmetric, i.e. for all x X we
have f(x) = f(x).
Section 3.2 Convex Functions 79
Minkowski Functional
In this section we want to become familiar with a method that relates convex sets to
sublinear functions. The method was introduced by Minkowski. This relation will
play a central role in our treatise, because the denition of an Orlicz space is based on
it.
Theorem 3.2.7. Let K be a convex subset of X, which contains 0 as an algebraically
interior point. Then the Minkowski functional q : X R, dened by
q(x) := inf > 0 [ x K
is a sublinear function on X. If K is symmetrical (i.e. x K implies x K), then
q is a semi-norm. If in addition K is linearly bounded w.r.t. 0, then q is a norm on X.
Proof. (1) q
K
is positive homogeneous, because for > 0 we have x K if and
only if x K.
(2) Since K is convex, we have for , > 0
K +K = ( +)
_

+
K +

+
K
_
( +)K,
hence for x, y X we have
q
K
(x) +q
K
(y) = inf > 0 [ x K + inf > 0 [ y K
= inf + [ x K and y K
inf + [ x +y ( +)K = q
K
(x +y).
(3) If K is symmetrical, then for > 0 we have: x K, if and only if x =
(x) K. But then q
K
(x) = q
K
(x).
(4) Let x ,= 0 and K linearly bounded w.r.t. 0. Then there is
0
> 0 such that for
all
0
we obtain: x / K, i.e. q
K
(x)
1

0
> 0.
As an illustration of the above theorem we want to introduce the l
p
-norms in R
n
(p > 1). The triangle inequality for this norm is the well-known Minkowski inequal-
ity, which in this context is obtained as a straightforward conclusion.
The function f : R
n
R with
x f(x) :=
n

i=1
[x
i
[
p
is convex. Thus the level set
K := x[ f(x) 1
80 Chapter 3 Convex Sets and Convex Functions
is a convex subset of the R
n
, which contains 0 as an algebraically interior point and is
obviously linearly bounded w.r.t. 0.
For c
0
:=
p
_

n
i=1
[x
i
[
p
we have f(
x
c
0
) = 1 and for the Minkowski functional q
K
of K we thus obtain
q
K
(x) = inf
_
c > 0

f
_
x
c
_
1
_
=
p

_
n

i=1
[x
i
[
p
.
According to the above theorem the function x |x|
p
:=
p
_

n
i=1
[x
i
[
p
is a norm.
3.3 Difference Quotient and Directional Derivative
Theorem 3.3.1 (Monotonicity of the difference quotient). Let f be convex, let z X
x
0
Dom(f). Then the mapping u : R
>0
(, ] dened by
t u(t) :=
f(x
0
+tz) f(x
0
)
t
is monotonically increasing.
Proof. Let h : R
0
(, ] dene by
t h(t) := f(x
0
+tz) f(x
0
).
Apparently h is convex and we have for 0 < s t
h(s) = h
_
s
t
t +
t s
t
0
_

s
t
h(t) +
t s
t
h(0) =
s
t
h(t).
Thus we obtain
f(x
0
+sz) f(x
0
)
s

f(x
0
+tz) f(x
0
)
t
.
In particular the difference quotient
f(x
0
+tz)f(x
0
)
t
is monotonically decreasing for
t 0.
Corollary 3.3.2. The right-sided directional derivative
f
t
+
(x
0
, z) := lim
t0
f(x
0
+tz) f(x
0
)
t
and left-sided directional derivative
f
t

(x
0
, z) := lim
t0
f(x
0
+tz) f(x
0
)
t
exist in R. The latter is due to
f
t

(x
0
, z) = f
t
+
(x
0
, z). (3.2)
Section 3.3 Difference Quotient and Directional Derivative 81
3.3.1 Geometry of the Right-sided Directional Derivative
The monotonicity of the difference quotient, we have just proved, yields an inequality
for t = 1 that can be interpreted as follows: the growth of a convex function is at least
as large as the growth of the directional derivative, since we have
f
t
+
(x, y x) f(y) f(x), (3.3)
where x Dom(f) and y X arbitrary.
This inequality will serve as a basis for the introduction of the notion of the subgra-
dient in the description of a supporting hyperplanes of the epigraph of f (subgradient
inequality).
Sublinearity of the Directional Derivative
A particularly elegant statement about the geometry of the directional derivative is
obtained, if it is formed at an algebraically interior point of Dom(f).
Theorem 3.3.3. Let X be a vector space, f : X R a convex function, x
0
an
algebraically interior point of Dom(f). Then for all directions z X right-sided and
left-sided derivatives are nite and we obtain:
(a) the mapping f
t
+
(x
0
, ) : X R is sublinear
(b) the mapping f
t

(x
0
, ) : X R is superlinear
(c) for all z X we have: f
t

(x
0
, z) f
t
+
(x
0
, z).
Proof. At rst we have to show that for all z X the one-sided derivatives f
t
+
(x
0
, z)
and f
t

(x
0
, z) are nite. Let z X. Since x
0
is an algebraically interior point of
Dom(f), there is a > 0 with [x
0
z, x
0
+ z] Dom(f). As the difference
quotient is monotonically increasing, we have
f
t
+
(x
0
, z)
f(x
0
+z) f(x
0
)

< .
Since f is convex, we obtain for all t (0, 1]
f(x
0
) = f
_
1
1 +t
(x
0
+tz) +
t
1 +t
(x
0
z)
_

1
1 +t
f(x
0
+tz) +
t
1 +t
f(x
0
z).
hence
f(x
0
) +tf(x
0
) = (1 +t)f(x
0
) f(x
0
+tz) +tf(x
0
z),
82 Chapter 3 Convex Sets and Convex Functions
and thus
tf(x
0
) tf(x
0
z) f(x
0
+tz) f(x
0
).
We obtain
<
f(x
0
) tf(x
0
z)


f(x
0
+tz) f(x
0
)
t
f
t
+
(x
0
, z),
which means f
t
+
(x
0
, z) R.
(a) We have to show the positive homogeneity and the subadditivity of f
t
+
(x
0
, ).
Concerning the positive homogeneity we observe: let z X and 0. If = 0,
then f
t
+
(x
0
, 0 z) = 0 = 0 f
t
+
(x
0
, z). If > 0, then
f
t
+
(x
0
, z) = lim
0
f(x
0
+z) f(x
0
)

= lim
0
f(x
0
+z) f(x
0
)

= f
t
+
(x
0
, z).
We now turn our attention to the subadditivity: let z
1
, z
2
X, then, because of the
convexity of f
f
t
+
(x
0
, z
1
+z
2
) = lim
0
f(x
0
+(z
1
+z
2
)) f(x
0
)

= lim
0
1

_
f
_
1
2
(x
0
+ 2z
1
) +
1
2
(x
0
+ 2z
2
)
_
f(x
0
)
_
lim
0
1

_
1
2
f(x
0
+ 2z
1
) +
1
2
f(x
0
+ 2z
2
) f(x
0
)
_
= lim
0
f(x
0
+ 2z
1
) f(x
0
)
2
+ lim
0
f(x
0
+ 2z
2
) f(x
0
)
2
= f
t
+
(x
0
, z
1
) +f
t
+
(x
0
, z
2
).
(b) Follows from (a) due to Equation (3.2).
(c) Let z X, then
0 = f
t
+
(x
0
, z z) f
t
+
(x
0
, z) +f
t
+
(x
0
, z),
hence
f
t

(x
0
, z) = f
t
+
(x
0
, z) f
t
+
(x
0
, z).
Definition 3.3.4. Let X be a vector space, U a subset of X, Y a normed space, $F: U \to Y$ a mapping, $x_0 \in U$ and $z \in X$. Then F is called differentiable (or Gâteaux differentiable) at $x_0$ in direction z, if there is a $\delta > 0$ with $[x_0 - \delta z, x_0 + \delta z] \subset U$ and the limit
$$F'(x_0, z) := \lim_{t \to 0} \frac{F(x_0 + tz) - F(x_0)}{t}$$
exists in Y. $F'(x_0, z)$ is called the Gâteaux derivative of F at $x_0$ in direction z. F is called Gâteaux differentiable at $x_0$, if F is differentiable at $x_0$ in every direction $z \in X$. The mapping $F'(x_0, \cdot): X \to Y$ is called the Gâteaux derivative of F at $x_0$.
Linearity of the Gâteaux Derivative of a Convex Function

Theorem 3.3.5. Let X be a vector space, $f: X \to \overline{\mathbb{R}}$ a convex function, $x_0$ an algebraically interior point of $\mathrm{Dom}(f)$. Then we have: f is Gâteaux differentiable at $x_0$ if and only if the right-sided derivative $f'_+(x_0, \cdot): X \to \mathbb{R}$ is linear. In particular the right-sided derivative is then equal to the Gâteaux derivative.

Proof. Let f be Gâteaux differentiable; then one obtains the homogeneity for $\lambda < 0$ from
$$f'(x_0, \lambda z) = f'_-(x_0, \lambda z) = -f'_+(x_0, -\lambda z) = -(-\lambda) f'_+(x_0, z) = \lambda f'(x_0, z).$$
Conversely, if the right-sided derivative is linear, then using Equation (3.2):
$$f'_-(x_0, h) = -f'_+(x_0, -h) = f'_+(x_0, h),$$
i.e. both one-sided derivatives coincide and $f'(x_0, h) = f'_+(x_0, h)$.
Theorem 3.3.6. Let X be a vector space, U an algebraically open convex subset of X, and $f: U \to \mathbb{R}$ a function which is Gâteaux differentiable at each point of U. Then the following statements are equivalent:
(a) f is convex
(b) for all $x \in U$ the derivative $f'(x, \cdot): X \to \mathbb{R}$ is linear, and for all $x_0, x \in U$ the following inequality holds:
$$f'(x_0, x - x_0) \le f(x) - f(x_0).$$

Proof. (b) follows from (a) using Inequality (3.3) and Theorem 3.3.5.
Conversely, let $x_1, x_2 \in U$ and $\lambda \in [0, 1]$. Then $x_0 := \lambda x_1 + (1 - \lambda) x_2 \in U$ and we have $\lambda(x_1 - x_0) + (1 - \lambda)(x_2 - x_0) = 0$ and hence
$$0 = f'(x_0, \lambda(x_1 - x_0) + (1 - \lambda)(x_2 - x_0))
= \lambda f'(x_0, x_1 - x_0) + (1 - \lambda) f'(x_0, x_2 - x_0)$$
$$\le \lambda(f(x_1) - f(x_0)) + (1 - \lambda)(f(x_2) - f(x_0))
= \lambda f(x_1) + (1 - \lambda) f(x_2) - f(x_0),$$
thus we obtain
$$f(x_0) = f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$
Monotonicity of the Gâteaux Derivative of a Convex Function

Let X be a vector space, $U \subset X$ algebraically open, and X′ the vector space of all linear functionals on X.

Definition 3.3.7. A mapping $A: U \to X'$ is called monotone on U, if for all $x_1, x_2 \in U$ the subsequent inequality holds:
$$\langle A(x_1) - A(x_2), x_1 - x_2 \rangle \ge 0.$$

Then we obtain the following characterization of convex functions:

Theorem 3.3.8. Let U be an algebraically open convex set and $f: U \to \mathbb{R}$ Gâteaux differentiable. f is convex if and only if for all $x \in U$ the Gâteaux derivative $f'(x, \cdot)$ at x is linear and the Gâteaux derivative $f': U \to X'$ is monotone.

Proof. Let f be convex. Due to the previous theorem we have for all $x, y \in U$
$$\langle f'(x, \cdot) - f'(y, \cdot),\, x - y \rangle = -f'(x, y - x) - f'(y, x - y) \ge -(f(y) - f(x)) - (f(x) - f(y)) = 0,$$
i.e. f′ is monotone on U.

Conversely, in order to establish convexity of f we again make use of the previous theorem: let $x, y \in U$; then for $h := y - x$ and a suitable $\varepsilon > 0$ the interval $[x - \varepsilon h, x + (1 + \varepsilon) h] \subset U$. Let now $\varphi: (-\varepsilon, 1 + \varepsilon) \to \mathbb{R}$ be defined by $\varphi(t) := f(x + th)$; then $\varphi$ is differentiable and $\varphi'(t) = f'(x + th, h)$, since
$$\varphi'(t) = \lim_{\tau \to 0} \frac{\varphi(t + \tau) - \varphi(t)}{\tau} = \lim_{\tau \to 0} \frac{f(x + th + \tau h) - f(x + th)}{\tau} = f'(x + th, h).$$
Due to the mean value theorem for $\varphi$ there exists a $\theta \in (0, 1)$ such that $f(y) - f(x) = f'(x + \theta h, h)$. From the monotonicity of f′ and the linearity of $f'(x, \cdot)$ and $f'(x + \theta h, \cdot)$ we obtain
$$\theta(f(y) - f(x)) = f'(x + \theta h, \theta h) = \langle f'(x + \theta h, \cdot) - f'(x, \cdot),\, (x + \theta h) - x \rangle + f'(x, \theta h) \ge f'(x, \theta h) = \theta f'(x, h),$$
i.e. $f(y) - f(x) \ge f'(x, y - x)$. The assertion then follows by the previous theorem.
3.4 Necessary and Sufficient Optimality Conditions

3.4.1 Necessary Optimality Conditions

For a minimal solution of a Gâteaux differentiable function we can state the following necessary condition, which is in complete analogy to the well-known condition from real analysis:

Theorem 3.4.1. Let U be a subset of a vector space X and let $f: U \to \mathbb{R}$ have a minimal solution at $x_0 \in U$. If for a $z \in X$ and a $\delta > 0$ the interval $(x_0 - \delta z, x_0 + \delta z)$ is contained in U and if f is differentiable at $x_0$ in direction z, then $f'(x_0, z) = 0$.

Proof. The function $g: (-\delta, \delta) \to \mathbb{R}$ with $t \mapsto g(t) := f(x_0 + tz)$ has a minimal solution at 0. Thus $0 = g'(0) = f'(x_0, z)$.

Corollary 3.4.2. Let V be a subspace of the vector space X, $y_0 \in X$ and $f: y_0 + V \to \mathbb{R}$ be a Gâteaux differentiable function. If $x_0 \in M(f, y_0 + V)$, then for all $z \in V$: $f'(x_0, z) = 0$.

3.4.2 Sufficient Condition: Characterization Theorem of Convex Optimization

Theorem 3.4.3. Let K be a convex subset of the vector space X and $f: X \to \mathbb{R}$ a convex function. f has a minimal solution $x_0 \in K$ if and only if for all $x \in K$
$$f'_+(x_0, x - x_0) \ge 0. \tag{3.4}$$

Proof. Let $x_0$ be a minimal solution of f on K. For $x \in K$ and $t \in (0, 1]$ we have $x_0 + t(x - x_0) = tx + (1 - t)x_0 \in K$ and hence
$$\frac{f(x_0 + t(x - x_0)) - f(x_0)}{t} \ge 0.$$
The limit for t to zero yields the above Inequality (3.4). Conversely, from Inequalities (3.3) and (3.4) it follows for all $x \in K$
$$f(x) - f(x_0) \ge f'_+(x_0, x - x_0) \ge 0,$$
i.e. $x_0 \in M(f, K)$.

Corollary 3.4.4. Let V be a subspace of the vector space X, $y_0 \in X$ and $f: y_0 + V \to \mathbb{R}$ a Gâteaux differentiable convex function. Then the following equivalence holds:
$x_0 \in M(f, y_0 + V)$, if and only if for all $z \in V$: $f'(x_0, z) = 0$.

Proof. According to Corollary 3.4.2 the condition is necessary and due to the characterization theorem sufficient.
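In one dimension the characterization theorem can be checked directly. A minimal NumPy sketch (the convex function and the set $K = [1, 2]$ are hypothetical illustrations): since the unconstrained minimizer of $f(x) = e^x - 2x$ lies to the left of K, the minimal solution on K is the boundary point $x_0 = 1$, and (3.4) holds there:

```python
import numpy as np

f  = lambda x: np.exp(x) - 2.0 * x   # convex; unconstrained minimum at log(2) < 1
df = lambda x: np.exp(x) - 2.0

x0 = 1.0                             # candidate minimal solution on K = [1, 2]

# Characterization (3.4): f'_+(x0, x - x0) >= 0 for all x in K.
# (f is differentiable here, so f'_+(x0, h) = f'(x0) * h.)
for x in np.linspace(1.0, 2.0, 11):
    assert df(x0) * (x - x0) >= 0.0

# Inequality (3.3) then certifies optimality: f(x) - f(x0) >= f'(x0)(x - x0) >= 0.
print("minimal solution on K:", x0, " f'(x0) =", df(x0))
```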
3.5 Continuity of Convex Functions

We will see below that from the boundedness of a convex function on an open set its continuity already follows:

Definition 3.5.1. Let X be a metric space and $f: X \to \overline{\mathbb{R}}$. f is called lower semi-continuous (resp. upper semi-continuous), if for every $r \in \mathbb{R}$ the level set $S_f(r) = \{x \in X \mid f(x) \le r\}$ (resp. $\{x \in X \mid f(x) \ge r\}$) is closed.

Remark 3.5.2. The supremum of lower semi-continuous functions is lower semi-continuous.

Proof. Let M be a set of lower semi-continuous functions on X and $g(x) := \sup\{f(x) \mid f \in M\}$. From $S_g(r) = \bigcap_{f \in M} S_f(r)$ the closedness of $S_g(r)$ follows.

Theorem 3.5.3. Let X be a metric space and $f: X \to \overline{\mathbb{R}}$. The following statements are equivalent:
(a) f is lower semi-continuous (resp. upper semi-continuous)
(b) for every $x_0 \in X$ and every $\varepsilon > 0$ there exists a $\delta > 0$, such that for all x in the ball $K(x_0, \delta)$ we have
$$f(x) \ge f(x_0) - \varepsilon \quad (\text{resp. } f(x) \le f(x_0) + \varepsilon)$$
(c) for every $x \in X$ and every sequence $(x_k)_{k \in \mathbb{N}}$ in X convergent to x we have
$$\liminf_{k \to \infty} f(x_k) \ge f(x) \quad (\text{resp. } \limsup_{k \to \infty} f(x_k) \le f(x)),$$
(d) the epigraph of f is closed in $X \times \mathbb{R}$.

Proof. Is left to the reader.

Theorem 3.5.4. Let X be a normed space, U an open and convex subset of X and let $f: U \to \mathbb{R}$ be a convex function. Then the following statements are equivalent:
(a) f is continuous on U
(b) f is upper semi-continuous on U
(c) f is bounded from above on an open subset $U_0$ of U
(d) f is continuous at a point in U.

The proof is given in the chapter on stability within the framework of families of convex functions (see Theorem 5.3.8). There we also demonstrate that in Banach spaces continuity, upper semi-continuity and lower semi-continuity coincide (see Theorem 5.3.12).
Theorem 3.5.5. Let U be an open and convex subset of a normed space X and let $f: U \to \mathbb{R}$ be convex. Then the following statements are equivalent:
(a) $\mathrm{Epi}(f)$ has an interior point
(b) f is continuous
(c) $\mathrm{Graph}(f)$ is closed in $U \times \mathbb{R}$
(d) every point in $\mathrm{Epi}(f) \setminus \mathrm{Graph}(f)$ is an interior point of $\mathrm{Epi}(f)$.

Proof. (a) $\Rightarrow$ (b): Let $(x_0, r)$ be an interior point of $\mathrm{Epi}(f)$. Then for a neighborhood V of $x_0$
$$V \times \{r\} \subset \mathrm{Epi}(f),$$
i.e. $f(x) \le r$ for all $x \in V$. Due to Theorem 3.5.4 f is continuous.
(b) $\Rightarrow$ (c): we show that the complement of $\mathrm{Graph}(f)$ in $U \times \mathbb{R}$ is open: let $f(x_0) \ne r$ and let $I(f(x_0))$ and $I(r)$ be disjoint open intervals. Then
$$\big(f^{-1}(I(f(x_0))) \times I(r)\big) \cap \mathrm{Graph}(f) = \emptyset.$$
(c) $\Rightarrow$ (d): let $f(x_0) < r$ and $K = K((x_0, r), \delta)$ a ball in $X \times \mathbb{R}$ contained in $U \times \mathbb{R}$, such that $K \cap \mathrm{Graph}(f) = \emptyset$. For $(y, s) \in K$ we have $f(y) < s$.
(d) $\Rightarrow$ (a) holds trivially.
3.6 Fréchet Differentiability

In the sequel the following notion will play an important role, in particular in the context of function spaces:

Definition 3.6.1. Let X, Y be normed spaces, U an open subset of X, $F: U \to Y$ a mapping, $x_0 \in U$. Then F is called Fréchet differentiable at $x_0$, if there is a linear and continuous mapping $A: X \to Y$ such that
$$\lim_{\|h\| \to 0} \frac{\|F(x_0 + h) - F(x_0) - A(h)\|}{\|h\|} = 0.$$
A is called the Fréchet derivative of F at $x_0$ and denoted by $F'(x_0)$. F is called Fréchet differentiable on U, if F is Fréchet differentiable at every point of U. The mapping $F': U \to L(X, Y)$ is called the Fréchet derivative of F.

Remark 3.6.2. Let X, Y be normed spaces, U an open subset of X, $F: U \to Y$ Fréchet differentiable at $x \in U$; then
(a) the Fréchet derivative is uniquely determined
(b) F is Gâteaux differentiable and for all $h \in X$ we have
$$F'(x, h) = F'(x)(h).$$
Proof. Is left to the reader.
3.7 Convex Functions in $\mathbb{R}^n$

The results of this section will again demonstrate the exceptional properties of convex functions: in finite-dimensional spaces finite convex functions on open sets are already continuous, and partial, Gâteaux, Fréchet and continuous differentiability coincide.

Theorem 3.7.1. Every real-valued convex function f on an open subset of $\mathbb{R}^n$ is continuous.

Proof. See Theorem 5.3.11.

We will turn our attention to the differentiability of convex functions in $\mathbb{R}^n$. It will turn out that partial, total and Gâteaux differentiability coincide. In order to show this equivalence we need the following

Lemma 3.7.2. Let g be a finite sublinear functional on a finite-dimensional vector space V and let $v_1, \ldots, v_n$ be a basis of V, for which $g(v_i) = -g(-v_i)$, $i \in \{1, \ldots, n\}$. Then g is linear on V.

Proof. Apparently then also $g(\lambda(-v_i)) = -g(\lambda v_i)$ for $i \in \{1, \ldots, n\}$ and $\lambda \in \mathbb{R}$ arbitrary holds.
Let now $x \in V$; then the sublinearity of g implies
$$0 = g(0) = g\Big(\frac{1}{2}(x - x)\Big) \le \frac{1}{2} g(x) + \frac{1}{2} g(-x),$$
i.e. $-g(-x) \le g(x)$. Let furthermore x have the representation $x = \sum_{i=1}^n \alpha_i v_i$; then the sublinearity implies
$$\sum_{i=1}^n g(\alpha_i v_i) = -\sum_{i=1}^n g(-\alpha_i v_i) \le -g(-x) \le g(x) \le \sum_{i=1}^n g(\alpha_i v_i).$$
Therefore
$$g\Big(\sum_{i=1}^n \alpha_i v_i\Big) = \sum_{i=1}^n g(\alpha_i v_i) = \sum_{i=1}^n \alpha_i g(v_i).$$

From the existence of the partial derivatives we obtain for convex functions already the Gâteaux differentiability, which, as we already know, implies the linearity of the Gâteaux differential.
Theorem 3.7.3. Let U be an open subset of $\mathbb{R}^n$ and $f: U \to \mathbb{R}$ convex. If f is partially differentiable at $x \in U$, then f is Gâteaux differentiable at x and we have
$$f'(x, h) = \sum_{i=1}^n h_i f'(x, e_i) = \langle \nabla f(x), h \rangle,$$
if $e_1, \ldots, e_n$ denote the unit vectors in $\mathbb{R}^n$.

Proof. For $i = 1, \ldots, n$ we have
$$f'_+(x, e_i) = \frac{\partial f}{\partial x_i}(x) = f'_-(x, e_i) = -f'_+(x, -e_i).$$
According to the above lemma $f'_+(x, \cdot)$ is linear and thus by Theorem 3.3.5 f is Gâteaux differentiable at x. The linearity yields in particular
$$f'(x, h) = f'\Big(x, \sum_{i=1}^n h_i e_i\Big) = \sum_{i=1}^n h_i f'(x, e_i) = \langle \nabla f(x), h \rangle.$$
Theorem 3.7.4. Let f be convex on an open subset U of $\mathbb{R}^n$ and partially differentiable at $x \in U$. Then f is Fréchet differentiable at x and the Fréchet derivative is the gradient.

Proof. For the linear mapping that we have to identify, we take the multiplication with the 1-row matrix A of the partial derivatives. Let $x \in U$. Since U is open, we can find a ball $K(0, \varepsilon)$ with center 0 and radius $\varepsilon$, such that $x + nK(0, \varepsilon)$ is contained in U. We now consider the convex function $g: K(0, \varepsilon) \to \mathbb{R}$, defined by
$$z \mapsto g(z) = f(x + z) - f(x) - A z. \tag{3.5}$$
The vector $z = (z_1, \ldots, z_n)$ can be rewritten as the sum $z = \sum_{i=1}^n z_i e_i$. The convexity of g yields
$$g(z) = g\Big(\sum_{i=1}^n \frac{1}{n}\, n z_i e_i\Big) \le \frac{1}{n} \sum_{i=1}^n g(n z_i e_i), \tag{3.6}$$
and the definition of g
$$g(n z_i e_i) = f(x + z_i n e_i) - f(x) - \frac{\partial f}{\partial x_i}(x)\, n z_i. \tag{3.7}$$
For $z_i = 0$ we have $g(n z_i e_i) = 0$ and for $z_i \ne 0$ (3.7) implies
$$\lim_{z_i \to 0} \frac{g(n z_i e_i)}{n z_i} = 0. \tag{3.8}$$
Using (3.6) and the Cauchy–Schwarz inequality we obtain, after omitting the terms with $z_i = 0$,
$$g(z) \le \sum z_i \frac{g(n z_i e_i)}{z_i n} \le \|z\| \sqrt{\sum \Big(\frac{g(n z_i e_i)}{z_i n}\Big)^2} \le \|z\| \sum \Big|\frac{g(n z_i e_i)}{z_i n}\Big|. \tag{3.9}$$
In a similar manner we obtain
$$g(-z) \le \|z\| \sum \Big|\frac{g(-n z_i e_i)}{z_i n}\Big|. \tag{3.10}$$
Due to $g(0) = 0$ we have
$$0 = g\Big(\frac{z + (-z)}{2}\Big) \le \frac{1}{2} g(z) + \frac{1}{2} g(-z),$$
therefore
$$g(z) \ge -g(-z). \tag{3.11}$$
Using (3.9), (3.10), and (3.11) we have
$$-\|z\| \sum \Big|\frac{g(-z_i n e_i)}{z_i n}\Big| \le -g(-z) \le g(z) \le \|z\| \sum \Big|\frac{g(z_i n e_i)}{z_i n}\Big|; \tag{3.12}$$
(3.8) then implies
$$\lim_{\|z\| \to 0} \frac{g(z)}{\|z\|} = 0.$$
Theorem 3.7.5. Let U be an open subset of $\mathbb{R}^n$ and let $f: U \to \mathbb{R}$ be a convex function differentiable in U. Then $f': U \to \mathbb{R}^n$ is continuous.

Proof. Let w.l.o.g. $0 \in U$ and let $g: U \to \mathbb{R}$ be defined by
$$x \mapsto g(x) := f(x) - f(0) - \langle f'(0), x \rangle,$$
then $g(0) = 0$ and $g'(0) = 0$. The subgradient inequality yields $g(x) = g(x) - g(0) \ge \langle g'(0), x - 0 \rangle = 0$. Apparently it is sufficient to show the continuity of g′ at 0.

Let Q be the unit ball w.r.t. the maximum norm on $\mathbb{R}^n$ and let $z_i$, $i = 1, \ldots, m$, be the extreme points of Q. Since U is open, there is a $k > 0$ with $kQ \subset U$. Let now $0 < h < k$ and let x be chosen such that $\|x\|_\infty \le \frac{1}{2} h$.

Let further $y \in \mathbb{R}^n \setminus \{0\}$ be arbitrary; then because of the continuity of the norm there is $\lambda > 0$ with $\|x + \lambda y\|_\infty = h$. Let $z := x + \lambda y$; then we obtain, using the subgradient inequality,
$$g(z) - g(x) \ge \langle g'(x), z - x \rangle = \lambda \langle g'(x), y \rangle. \tag{3.13}$$
The theorem of Minkowski 3.9.12 provides a representation of z as a convex combination of the extreme points of hQ, i.e. $z = \sum_{i=1}^m \alpha_i h z_i$. Therefore
$$g(z) \le \sum_{i=1}^m \alpha_i g(h z_i) \le \max_i g(h z_i) \sum_{i=1}^m \alpha_i = \max_i g(h z_i).$$
From $g(0) = 0$ and the convexity of g it follows that $g(h z_i) = g\big(k \frac{h}{k} z_i\big) \le \frac{h}{k} g(k z_i)$ and hence $g(z) \le \frac{h}{k} \max_i g(k z_i)$.
Furthermore: $h = \|z\|_\infty \le \|x\|_\infty + \lambda\|y\|_\infty \le \frac{1}{2} h + \lambda\|y\|_\infty$ and hence $h \le 2\lambda\|y\|_\infty$.
Using (3.13) we obtain
$$\langle g'(x), y \rangle \le \frac{1}{\lambda}\big(g(z) - \underbrace{g(x)}_{\ge 0}\big) \le \frac{1}{\lambda}\, \frac{2\lambda\|y\|_\infty}{k} \max_i g(k z_i) = 2\|y\|_\infty \frac{\max_i g(k z_i)}{k}.$$
In the same way we obtain
$$-\langle g'(x), y \rangle = \langle g'(x), -y \rangle \le 2\|y\|_\infty \frac{\max_i g(k z_i)}{k},$$
in total therefore
$$|\langle g'(x), y \rangle| \le 2\|y\|_\infty \frac{\max_i g(k z_i)}{k}.$$
But by definition: $\lim_{k \to 0} \frac{g(k z_i)}{k} = \langle g'(0), z_i \rangle = 0$. Let now $\varepsilon > 0$ be given; choose k such that $2 \max_i \frac{g(k z_i)}{k} < \varepsilon$; then we obtain
$$|\langle g'(x), y \rangle| \le \varepsilon\|y\|_\infty,$$
and thus the continuity of g′ at 0.
3.8 Continuity of the Derivative

Definition 3.8.1. Let X be a normed space and U an open subset of X. A mapping $G: U \to X^*$ is called hemi-continuous, if for arbitrary $x \in U$ and $u \in X$ and every real sequence $t_n \to t_0$ with $x + t_n u \in U$ for $n = 0, 1, 2, \ldots$ we have
$$G(x + t_n u) \rightharpoonup G(x + t_0 u).$$

Lemma 3.8.2. Let U be an open subset of X and let $f: U \to \mathbb{R}$ be convex and Gâteaux differentiable; then the Gâteaux derivative of f is hemi-continuous.

Proof. Let $x \in U$ and $u, w \in X$ be arbitrary. Let $W \subset \mathbb{R}^2$ be a suitable open neighborhood of the origin, such that $x + tu + sw \in U$ for all $(t, s) \in W$. Then $g: W \to \mathbb{R}$ with $g(t, s) := f(x + tu + sw)$ is convex and differentiable, and for the partial derivatives (which at the same time are the directional derivatives of f at $x + tu + sw$ in direction u resp. w) we have
$$\frac{\partial}{\partial t} g(t, s) = \langle f'(x + tu + sw), u \rangle, \qquad \frac{\partial}{\partial s} g(t, s) = \langle f'(x + tu + sw), w \rangle.$$
In particular:
$$\frac{\partial}{\partial s} g(t, 0) = \langle f'(x + tu), w \rangle.$$
According to Theorems 3.7.4 and 3.7.5 the partial derivatives of g are continuous, hence in particular for $t_n \to t_0$ we obtain
$$\frac{\partial}{\partial s} g(t_n, 0) = \langle f'(x + t_n u), w \rangle \to \frac{\partial}{\partial s} g(t_0, 0) = \langle f'(x + t_0 u), w \rangle,$$
and thus the assertion.
Theorem 3.8.3. Let U be an open subset of the Banach space X and let $f: U \to \mathbb{R}$ be continuous, convex, and Gâteaux differentiable. Let $x_0 \in U$ be arbitrary; then there is a $\delta > 0$ such that the Gâteaux derivative of f is norm-bounded on the ball $K(x_0, \delta)$, i.e. there is a constant K with $\|f'(x)\| \le K$ for all $x \in K(x_0, \delta)$.

Proof. Since f is continuous, there is $\delta > 0$, such that $|f|$ is bounded on the ball $K(x_0, 2\delta)$. Let $x \in K(x_0, \delta)$ and $z \in K(0, \delta)$ arbitrary; then apparently $x + z \in K(x_0, 2\delta)$. Using the subgradient inequality we obtain for a suitable constant M
$$M \ge f(x + z) - f(x) \ge \langle f'(x), z \rangle,$$
$$M \ge f(x - z) - f(x) \ge -\langle f'(x), z \rangle,$$
therefore $|\langle f'(x), z \rangle| \le M$ for all $z \in K(0, \delta)$ and all $x \in K(x_0, \delta)$. Let now $y \in X$ be arbitrary; then there is a $z \in K(0, \delta)$ and a $\lambda \ge 0$ with $y = \lambda z$, hence
$$\lambda M \ge |\langle f'(x), y \rangle|$$
for all $x \in K(x_0, \delta)$. Thus the family of linear continuous functionals $\{f'(x) \mid x \in K(x_0, \delta)\}$ is pointwise bounded and hence, according to the theorem of Banach on Uniform Boundedness (see 5.3.14), norm-bounded, i.e. there is a constant K with $\|f'(x)\| \le K$ for all $x \in K(x_0, \delta)$.

Definition 3.8.4. Let X be a normed space and U an open subset of X. A mapping $G: U \to X^*$ is called demi-continuous, if for arbitrary sequences $x_n \to x_0 \in U$ we have
$$G(x_n) \rightharpoonup G(x_0).$$
The line of reasoning in the proof of the subsequent theorem, aside from the substantiation of local boundedness, follows that in Vainberg [109] for monotone operators.

Theorem 3.8.5. Let U be an open subset of a Banach space X and let $f: U \to \mathbb{R}$ be convex, continuous, and Gâteaux differentiable; then the Gâteaux derivative of f is demi-continuous.

Proof. Let $(x_n)$ be a sequence with $x_n \to x_0$ and let $u \in X$ be arbitrary. According to Theorem 3.8.3 the sequence $\|f'(x_n)\|$ is bounded. Let $t_n := \|x_n - x_0\|^{1/2}$ and $u_n := x_0 + t_n u$. Due to the hemi-continuity of f′ by Lemma 3.8.2 it follows that $f'(u_n) \rightharpoonup f'(x_0)$. In particular $\|f'(u_n)\|$ is bounded. The monotonicity of f′ yields $t_n^{-1}\langle f'(u_n) - f'(x_n), u_n - x_n \rangle \ge 0$ and hence
$$\langle f'(x_n), u \rangle \le t_n^{-1}\langle f'(u_n), u_n - x_n \rangle + t_n^{-1}\langle f'(x_n), t_n u - (u_n - x_n) \rangle. \tag{3.14}$$
We now investigate the first term on the right-hand side:
$$t_n^{-1}\langle f'(u_n), u_n - x_n \rangle = \underbrace{\Big\langle f'(u_n),\, t_n \frac{x_0 - x_n}{\|x_0 - x_n\|}\Big\rangle}_{\to 0} + \underbrace{\langle f'(u_n), u \rangle}_{\to \langle f'(x_0), u \rangle}.$$
For the second term of the right-hand side of (3.14) we obtain
$$t_n^{-1}\langle f'(x_n), t_n u - (u_n - x_n) \rangle = \Big\langle f'(x_n), \frac{x_n - x_0}{\|x_n - x_0\|^{1/2}}\Big\rangle \le \|f'(x_n)\|\, t_n.$$
Thus there is a sequence $(c_n)$ tending to zero with
$$\langle f'(x_n) - f'(x_0), u \rangle \le c_n.$$
In the same way we obtain another sequence $(d_n)$ tending to zero with
$$\langle f'(x_n) - f'(x_0), u \rangle \ge d_n,$$
in total
$$d_n \le \langle f'(x_n) - f'(x_0), u \rangle \le c_n,$$
and thus the assertion, since u was arbitrary.
The subsequent theorem can be found in Phelps [91] and is a generalization of a corollary of the lemma of Shmulian 8.4.20 (see [42], p. 145) for norms.

Theorem 3.8.6. Let U be an open subset of a Banach space X and let $f: U \to \mathbb{R}$ be convex, continuous, and Gâteaux differentiable. Then f is Fréchet differentiable, if and only if the Gâteaux derivative of f is continuous.

Proof. Let f be Fréchet differentiable and $x_0 \in U$. We have to show: for every $\varepsilon > 0$ there is a $\delta > 0$, such that for all $x \in K(x_0, \delta)$ we have $f'(x) \in K(f'(x_0), \varepsilon)$. Suppose this is not the case; then there is an $\varepsilon > 0$ and a sequence $(x_n) \subset U$, such that $\|x_n - x_0\| \to 0$ but $\|f'(x_n) - f'(x_0)\| > 2\varepsilon$. Consequently there is a sequence $(z_n)$ in X with $\|z_n\| = 1$ and the property $\langle f'(x_n) - f'(x_0), z_n \rangle > 2\varepsilon$. Due to the Fréchet differentiability of f at $x_0$ there is $\delta > 0$ with
$$f(x_0 + y) - f(x_0) - \langle f'(x_0), y \rangle \le \varepsilon\|y\|,$$
provided that $\|y\| \le \delta$. The subgradient inequality implies
$$\langle f'(x_n), (x_0 + y) - x_n \rangle \le f(x_0 + y) - f(x_n),$$
and hence
$$\langle f'(x_n), y \rangle \le f(x_0 + y) - f(x_0) + \langle f'(x_n), x_n - x_0 \rangle + f(x_0) - f(x_n).$$
Let now $y_n = \delta z_n$, i.e. $\|y_n\| = \delta$; then we obtain
$$2\varepsilon\delta < \langle f'(x_n) - f'(x_0), y_n \rangle \le \big(f(x_0 + y_n) - f(x_0) - \langle f'(x_0), y_n \rangle\big) + \langle f'(x_n), x_n - x_0 \rangle + f(x_0) - f(x_n)$$
$$\le \varepsilon\delta + \langle f'(x_n), x_n - x_0 \rangle + f(x_0) - f(x_n);$$
because of the local boundedness of f′ we obtain
$$|\langle f'(x_n), x_n - x_0 \rangle| \le \|f'(x_n)\|\,\|x_n - x_0\| \to 0,$$
and because of the continuity of f we have $f(x_0) - f(x_n) \to 0$, hence in the limit $2\varepsilon\delta \le \varepsilon\delta$, a contradiction.

Conversely let the Gâteaux derivative be continuous. By the subgradient inequality we have for $x, y \in U$: $\langle f'(x), y - x \rangle \le f(y) - f(x)$ and $\langle f'(y), x - y \rangle \le f(x) - f(y)$. Thus we obtain
$$0 \le f(y) - f(x) - \langle f'(x), y - x \rangle \le \langle f'(y) - f'(x), y - x \rangle \le \|f'(y) - f'(x)\|\,\|y - x\|.$$
Putting $y = x + z$ we obtain
$$0 \le \frac{f(x + z) - f(x) - \langle f'(x), z \rangle}{\|z\|} \le \|f'(x + z) - f'(x)\|,$$
and by the continuity of f′ the assertion.
3.9 Separation Theorems

The basis for our further investigations in this section is the following theorem on the extension of a linear functional (see e.g. [59]):

Theorem 3.9.1 (Theorem of Hahn–Banach). Let X be a vector space and $f: X \to \mathbb{R}$ a convex function. Let V be a subspace of X and $\ell: V \to \mathbb{R}$ a linear functional with $\ell(x) \le f(x)$ for all $x \in V$. Then there is a linear functional $u: X \to \mathbb{R}$ with $u|_V = \ell$ and $u(x) \le f(x)$ for all $x \in X$.

Remark 3.9.2. In the theorem of Hahn–Banach it suffices, instead of the finiteness of f, to only require that 0 is an algebraically interior point of $\mathrm{Dom} f$.

Corollary 3.9.3. Let X be a normed space and $f: X \to \mathbb{R}$ a continuous convex function. Let V be a subspace of X and $\ell: V \to \mathbb{R}$ a linear functional with $\ell(x) \le f(x)$ for all $x \in V$. Then there is a continuous linear functional $u: X \to \mathbb{R}$ with $u|_V = \ell$ and $u(x) \le f(x)$ for all $x \in X$.

A geometrical version of the theorem of Hahn–Banach is provided by the theorem of Mazur.

Definition 3.9.4. Let X be a vector space. A subset H of X is called a hyperplane in X, if there is a linear functional $u: X \to \mathbb{R}$ different from the zero functional and an $\alpha \in \mathbb{R}$ such that $H = \{x \in X \mid u(x) = \alpha\}$. H is called a zero hyperplane, if $0 \in H$. A subset R of X is called a half-space, if there is a linear functional $u: X \to \mathbb{R}$ and an $\alpha \in \mathbb{R}$ such that $R = \{x \in X \mid u(x) \le \alpha\}$.

Remark 3.9.5. Let K be a convex subset of a normed space with $0 \in \mathrm{Int}(K)$. For the corresponding Minkowski functional $q: X \to \mathbb{R}$ we have: $q(x) < 1$ holds if and only if $x \in \mathrm{Int}(K)$.

Proof. Let $q(x) < 1$; then there is a $\beta \in [0, 1)$ with $x \in \beta K$. Putting $\alpha := 1 - \beta$, then $\alpha K$ is a neighborhood of the origin, and we have $x + \alpha K \subset \beta K + \alpha K \subset K$, i.e. $x \in \mathrm{Int}\, K$.
Conversely, let $x \in \mathrm{Int}\, K$; then there is a $\lambda > 1$ with $\lambda x \in K$. Hence $x \in \frac{1}{\lambda} K$ and $q(x) \le \frac{1}{\lambda} < 1$.

Theorem 3.9.6 (Mazur). Let X be a normed space, K a convex subset of X with non-empty interior and V a subspace of X with $V \cap \mathrm{Int}(K) = \emptyset$. Then there is a closed zero hyperplane H in X with $V \subset H$ and $H \cap \mathrm{Int}(K) = \emptyset$.

Proof. Let $x_0 \in \mathrm{Int}(K)$ and $q: X \to \mathbb{R}$ the Minkowski functional of $K - x_0$. According to the above remark $q(x - x_0) < 1$ if and only if $x \in \mathrm{Int}(K)$. Put $f: X \to \mathbb{R}$ with $x \mapsto f(x) := q(x - x_0) - 1$; then f is convex and $f(x) < 0$ if and only if $x \in \mathrm{Int}(K)$. Due to $V \cap \mathrm{Int}(K) = \emptyset$ we have $f(x) \ge 0$ for all $x \in V$. Let $\ell$ be the zero functional on V. According to the corollary of the theorem of Hahn–Banach there is a continuous linear functional $u: X \to \mathbb{R}$ with $u|_V = \ell$ and $u(x) \le f(x)$ for all $x \in X$. Let $H = \{x \in X \mid u(x) = 0\}$ be the closed zero hyperplane defined by u. Then $V \subset H$, and for all $x \in \mathrm{Int}(K)$ we have $u(x) \le f(x) < 0$, i.e. $H \cap \mathrm{Int}(K) = \emptyset$.

In the sequel we need a shifted version of the theorem of Mazur:

Corollary 3.9.7. Let X be a normed space, K a convex subset of X with non-empty interior and $0 \in \mathrm{Int}(K)$. Let V be a subspace of X, $x_0 \in X$ with $(x_0 + V) \cap \mathrm{Int}(K) = \emptyset$. Then there is a closed hyperplane H in X with $x_0 + V \subset H$ and $H \cap \mathrm{Int}(K) = \emptyset$.

The proof is (almost) identical.

An application of the theorem of Mazur is the theorem of Minkowski. In the proof we need the notion of an extreme set of a convex set:

Definition 3.9.8. Let S be a closed convex subset of a normed space. Then a non-empty subset M of S is called an extreme set of S, if
(a) M is convex and closed
(b) each open interval in S, which contains a point of M, is entirely contained in M.

Example 3.9.9. For a cube in $\mathbb{R}^3$ the vertices are extreme points (see Definition 3.1.10). The edges and faces are extreme sets.

Remark 3.9.10. Let M be an extreme set of a closed convex set S; then
$$E_p(M) \subset E_p(S).$$
To see this, let x be an extreme point of M and suppose x is not an extreme point of S; then there is an open interval in S which contains x. However, since M is an extreme set, this interval is already contained in M, a contradiction.

As a preparation we need the following

Lemma 3.9.11. Let X be a normed space, $S \ne \emptyset$ a compact convex subset of X, let $f \in X^*$ and let $\alpha := \sup\{f(x) \mid x \in S\}$. Then the set $f^{-1}(\alpha) \cap S$ is an extreme set of S, i.e. the hyperplane $H := \{x \in X \mid f(x) = \alpha\}$ has a non-empty intersection with S, and $H \cap S$ is an extreme set of S.

Proof. By the theorem of Weierstraß the set $f^{-1}(\alpha) \cap S$ is, due to the compactness of S, non-empty, apparently also compact and convex. Let $(x_1, x_2)$ be an open interval in S which contains a point $x_0 \in f^{-1}(\alpha) \cap S$, i.e. $x_0 = \lambda x_1 + (1 - \lambda) x_2$ for a $\lambda \in (0, 1)$. Since $f(x_0) = \alpha = \lambda f(x_1) + (1 - \lambda) f(x_2)$, it follows by the definition of $\alpha$ that $f(x_1) = \alpha = f(x_2)$, i.e. $(x_1, x_2) \subset f^{-1}(\alpha) \cap S$.
Theorem 3.9.12 (Theorem of Minkowski). Let X be an n-dimensional space and S a convex compact subset of X. Then every boundary point (resp. arbitrary point) of S can be represented as a convex combination of at most n (resp. n + 1) extreme points of S.

Proof. We perform induction over the dimension of S, where by the dimension of S we understand the dimension of the affine hull $\bigcap\{A \mid S \subset A \subset X \text{ and } A \text{ affine}\}$ of S.

If $\dim S = 0$, then S consists of at most one point and therefore $S = E_p(S)$.

Assume the assertion holds for $\dim S \le m - 1$. Let now $\dim S = m$. W.l.o.g. let $0 \in S$. Let $X_m := \mathrm{span}\, S$; then the interior $\mathrm{Int}(S)$ w.r.t. $X_m$ is non-empty (see Lemma 3.1.9) and convex.

a) Let $x_0$ be a boundary point of S w.r.t. $X_m$. Due to the corollary of the separation theorem of Mazur there exists a closed hyperplane H in $X_m$ with $H \cap \mathrm{Int}(S) = \emptyset$ and $x_0 \in H$, i.e. H is a supporting hyperplane of S in $x_0$. The set $H \cap S$ is compact and by the previous lemma an extreme set of S with dimension $\le m - 1$. According to the induction assumption $x_0$ can be represented as a convex combination of at most $(m - 1) + 1 = m$ extreme points from $H \cap S$. Since by Remark 3.9.10 we have $E_p(H \cap S) \subset E_p(S)$, the first part of the assertion follows.

b) Let now $x_0 \in \mathrm{Int}(S)$ (w.r.t. $X_m$) be arbitrary and $z \in E_p(S)$ (exists due to a)) be arbitrarily chosen. The set $S \cap \ell(z, x_0)$ with $\ell(z, x_0) := \{x \in X \mid x = \lambda z + (1 - \lambda) x_0,\ \lambda \in \mathbb{R}\}$ is, because of the boundedness of S, an interval $[z, y]$ whose endpoints are boundary points of S, and which contains $x_0$ as an interior point. Since y can, due to a), be represented as a convex combination of at most m extreme points and $z \in E_p(S)$, the assertion follows.
The separation of convex sets is given a precise meaning by the following definition:

Definition 3.9.13. Let X be a vector space, A, B subsets of X and $H = \{x \in X \mid u(x) = \alpha\}$ a hyperplane in X. H separates A and B, if
$$\sup\{u(x) \mid x \in A\} \le \alpha \le \inf\{u(x) \mid x \in B\}.$$
H strictly separates A and B, if H separates the sets A and B and if one of the two inequalities is strict.

A consequence of the theorem of Mazur is the

Theorem 3.9.14 (Eidelheit). Let X be a normed vector space, let A, B be disjoint convex subsets of X. Let further $\mathrm{Int}(A) \ne \emptyset$. Then there is a closed hyperplane H in X which separates A and B.

Proof. Let $K := A - B = \{a - b \mid a \in A,\ b \in B\}$. Then $A - B$ is a convex subset of X. Since $\mathrm{Int}(A) \ne \emptyset$ we have $\mathrm{Int}(K) \ne \emptyset$. Since A, B are disjoint, it follows that $0 \notin K$. With $V := \{0\}$ the theorem of Mazur implies the existence of a closed zero hyperplane which separates $\{0\}$ and K, i.e. there exists $u \in X^*$, such that for all $x_1 \in A$, $x_2 \in B$: $u(x_1 - x_2) \le u(0) = 0$, or $u(x_1) \le u(x_2)$.

Remark 3.9.15. Instead of $A \cap B = \emptyset$ it is enough to require $\mathrm{Int}(A) \cap B = \emptyset$.

Proof. By Theorem 3.1.8 we have $A \subset \overline{\mathrm{Int}(A)}$. According to the theorem of Eidelheit there is a closed hyperplane which separates $\mathrm{Int}(A)$ and B.
In order to obtain a statement about strict separation of convex sets, we will prove the subsequent lemma:

Lemma 3.9.16. Let X be a normed space, A a closed and B a compact subset of X. Then $A + B := \{a + b \mid a \in A,\ b \in B\}$ is closed.

Proof. For $n \in \mathbb{N}$ let $a_n \in A$, $b_n \in B$ with $\lim_{n \to \infty}(a_n + b_n) = z$. Since B is compact, the sequence $(b_n)$ has a subsequence $(b_{n_i})$ that converges to $b \in B$, and hence $\lim_{i \to \infty} a_{n_i} = z - b \in A$ due to the closedness of A. Thus $z = (z - b) + b \in A + B$.

Theorem 3.9.17. Let X be a normed vector space, let A, B be disjoint convex subsets of X. Furthermore let A be closed and B compact. Then there is a closed hyperplane H in X which strictly separates A and B.

Proof. Let B at first consist of a single point, i.e. $B = \{x_0\} \subset X$. The complement of A is open and contains $x_0$. Therefore there is an open ball V with center 0, such that $x_0 + V$ is contained in the complement of A. According to the separation theorem of Eidelheit there is a closed hyperplane in X that separates $x_0 + V$ and A, i.e. there exists a $u \in X^* \setminus \{0\}$ with $\sup u(x_0 + V) \le \inf u(A)$. Since $u \ne 0$ there is $v_0 \in V$ with $u(v_0) > 0$. Hence
$$u(x_0) < u(x_0 + v_0) \le \sup u(x_0 + V) \le \inf u(A).$$
Let now B be an arbitrary compact convex set with $A \cap B = \emptyset$. By the previous lemma, $A - B$ is closed. Since A, B are disjoint, we have $0 \notin A - B$. Due to the first part we can strictly separate $\{0\}$ and $A - B$, corresponding to the assertion of the theorem.

An immediate application of the strict separation theorem is the

Theorem 3.9.18. Let K be a closed convex subset of a normed space X; then K is weakly closed.
Proof. Every closed half-space in X is a weakly closed set. If one forms the intersection of all half-spaces containing K, we again obtain a weakly closed set M, which apparently contains K. Suppose M contains a point $x_0$ which does not belong to K. Then, due to the strict separation theorem, there is a closed hyperplane which strictly separates $\{x_0\}$ and K. As one of the two corresponding closed half-spaces contains K but not $x_0$, we obtain a contradiction.

This is a theorem we will use in the sequel in various situations.

Another consequence of the separation theorem is the following result on extreme points:

Theorem 3.9.19 (Krein–Milman). Every compact convex subset S of a normed space is the closed convex hull of its extreme points, i.e.
$$S = \overline{\mathrm{conv}}(E_p(S)).$$

Proof. Let $B := \overline{\mathrm{conv}}(E_p(S))$; then we have to show $B = S$. Apparently $B \subset S$, since the closure of a convex set is convex (see Theorem 3.1.8). Suppose there is an $x_0 \in S$ such that $x_0 \notin B$. Then according to the Strict Separation Theorem 3.9.17 there is a continuous linear functional f such that
$$f(x) < f(x_0) \quad \text{for all } x \in B.$$
Let now $\alpha := \sup\{f(x) \mid x \in S\}$; then $f^{-1}(\alpha) \cap S$ contains an extreme point y of S according to Remark 3.9.10 and Lemma 3.9.11, but this is a contradiction to $f(B) < f(x_0)$, since $\alpha = f(y) < f(x_0) \le \alpha$.
3.10 Subgradients

Definition 3.10.1. Let X be a normed space, $f: X \to \overline{\mathbb{R}}$. A $u \in X^*$ is called a subgradient of f at $x_0 \in \mathrm{Dom} f$, if for all $x \in X$ the subgradient inequality
$$f(x) - f(x_0) \ge \langle u, x - x_0 \rangle$$
holds. The set $\partial f(x_0) := \{u \in X^* \mid u \text{ is a subgradient of } f \text{ at } x_0\}$ is called the subdifferential of f at $x_0$.

The existence of subgradients is described by the following theorem:

Theorem 3.10.2. Let X be a normed space, $f: X \to \overline{\mathbb{R}}$ convex. If f is continuous at $x_0 \in \mathrm{Dom} f$, then $\partial f(x_0)$ is non-empty.

Proof. Let $g: X \to \overline{\mathbb{R}}$ with $x \mapsto f(x + x_0) - f(x_0)$; then g is continuous at 0. Put $V = \{0\}$ and $\ell(0) = 0$. Then there is, by the theorem of Hahn–Banach, an extension $u \in X^*$ of $\ell$ with $\langle u, x \rangle \le g(x) = f(x + x_0) - f(x_0)$ for all $x \in X$. Choose $x = x_1 - x_0$; then $\langle u, x_1 - x_0 \rangle \le f(x_1) - f(x_0)$, i.e. $u \in \partial f(x_0)$.

A direct consequence of the definition is the monotonicity of the subdifferential in the following sense:

Theorem 3.10.3. Let X be a normed space, $f: X \to \overline{\mathbb{R}}$ convex. Let $x, y \in \mathrm{Dom} f$, $\ell_x \in \partial f(x)$ and $\ell_y \in \partial f(y)$; then
$$\langle \ell_x - \ell_y, x - y \rangle \ge 0.$$

Proof. By the subgradient inequality we have
$$\langle \ell_x, y - x \rangle \le f(y) - f(x), \qquad \langle \ell_y, x - y \rangle \le f(x) - f(y).$$
We obtain
$$\langle \ell_x - \ell_y, x - y \rangle \ge f(x) - f(y) - (f(x) - f(y)) = 0.$$
The connection between one-sided derivatives and the subdifferential is provided by (see [59])

Theorem 3.10.4 (Theorem of Moreau–Pschenitschnii). Let X be a normed space, $f: X \to \overline{\mathbb{R}}$ convex. Let f be continuous at $x_0 \in \mathrm{Dom} f$; then for all $h \in X$ we obtain
$$f'_+(x_0, h) = \max\{\langle u, h \rangle \mid u \in \partial f(x_0)\} \tag{$*$}$$
and
$$f'_-(x_0, h) = \min\{\langle u, h \rangle \mid u \in \partial f(x_0)\}. \tag{$**$}$$

Proof. For all $u \in \partial f(x_0)$ and all $t \in \mathbb{R}_{>0}$ we have by definition
$$\langle u, th \rangle \le f(x_0 + th) - f(x_0),$$
and hence
$$\langle u, h \rangle \le \lim_{t \downarrow 0} \frac{f(x_0 + th) - f(x_0)}{t} = f'_+(x_0, h).$$
On the other hand, let for a $h \in X \setminus \{0\}$ the linear functional
$$l(th) := t f'_+(x_0, h)$$
be defined on $V := \mathrm{span}\{h\}$. Then for all $x \in V$ we have
$$l(x) \le f(x_0 + x) - f(x_0) =: q(x),$$
since for $t \in \mathbb{R}_{>0}$ by the positive homogeneity (see Theorem 3.3.3)
$$t f'_+(x_0, h) = f'_+(x_0, th) \le f(x_0 + th) - f(x_0) = q(th).$$
The subadditivity
$$0 = f'_+(x_0, th - th) \le f'_+(x_0, th) + f'_+(x_0, -th)$$
implies
$$-t f'_+(x_0, h) = -f'_+(x_0, th) \le f'_+(x_0, -th) \le f(x_0 - th) - f(x_0) = q(-th).$$
The function $q: X \to \overline{\mathbb{R}}$ is convex and $0 \in \mathrm{Int}(\mathrm{Dom}\, q)$. Due to the theorem of Hahn–Banach l has a continuous linear extension u such that for all $x \in X$
$$\langle u, x \rangle \le q(x) = f(x_0 + x) - f(x_0)$$
holds. Thus $u \in \partial f(x_0)$, and for h we obtain
$$\langle u, h \rangle = l(h) = f'_+(x_0, h),$$
hence ($*$). The relation
$$f'_-(x_0, h) = -f'_+(x_0, -h) = -\max\{\langle u, -h \rangle \mid u \in \partial f(x_0)\} = \min\{\langle u, h \rangle \mid u \in \partial f(x_0)\}$$
yields ($**$).
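Both formulas can be illustrated in the simplest non-smooth case. A short sketch, assuming NumPy; the function $f = |\cdot|$ at $x_0 = 0$ with $\partial f(0) = [-1, 1]$ is a standard example and not taken from the text:

```python
import numpy as np

f, x0 = (lambda x: abs(x)), 0.0
subgradients = np.linspace(-1.0, 1.0, 2001)   # discretized subdifferential [-1, 1]

for h in (1.0, -0.5, 2.0):
    t = 1e-8
    f_plus  = (f(x0 + t * h) - f(x0)) / t     # right-sided derivative
    f_minus = (f(x0 - t * h) - f(x0)) / (-t)  # left-sided derivative
    # (*) and (**): max resp. min of <u, h> over the subdifferential
    assert np.isclose(f_plus,  np.max(subgradients * h))
    assert np.isclose(f_minus, np.min(subgradients * h))
print("(*) and (**) verified for f = |.| at x0 = 0")
```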
Minimal solutions of a convex function on a subspace can be characterized in the following way:

Theorem 3.10.5. Let X be a normed space, $f: X \to \mathbb{R}$ a continuous convex function and U a subspace of X. Then $u_0 \in M(f, U)$, if and only if there is an $x_0^* \in \partial f(u_0) \subset X^*$ such that
$$\langle x_0^*, u - u_0 \rangle = 0 \quad \text{for all } u \in U.$$

Proof. Let $x_0^* \in \partial f(u_0)$ with $\langle x_0^*, u - u_0 \rangle = 0$ for all $u \in U$; then
$$0 = \langle x_0^*, u - u_0 \rangle \le f(u) - f(u_0) \quad \text{for all } u \in U.$$
Conversely let $u_0 \in M(f, U)$; then for the zero functional $\ell$ on U
$$\langle \ell, u - u_0 \rangle = 0 \le f(u) - f(u_0) \quad \text{for all } u \in U.$$
Due to the theorem of Hahn–Banach $\ell$ can be extended to all of X as a continuous linear functional $x_0^*$ in such a way that
$$\langle x_0^*, x - u_0 \rangle \le f(x) - f(u_0) \quad \text{for all } x \in X.$$
By definition then $x_0^* \in \partial f(u_0)$.
3.11 Conjugate Functions

Definition 3.11.1. Let X be a normed space, $f: X \to \overline{\mathbb{R}}$ with $\mathrm{Dom} f \ne \emptyset$. The (convex) conjugate function $f^*: X^* \to \overline{\mathbb{R}}$ of f is defined by
$$f^*(y) := \sup\{\langle y, x \rangle - f(x) \mid x \in X\}.$$
We will also call f* the dual of f. Being the supremum of convex (affine) functions, f* is also convex.
For all $x \in X$ and all $y \in X^*$ we have by definition
$$\langle x, y \rangle \le f(x) + f^*(y). \tag{3.15}$$
This relation we will denote as Young's inequality.

The connection between subgradients and conjugate functions is described by the following

Theorem 3.11.2. Let X be a normed space, $f: X \to \overline{\mathbb{R}}$ and $\partial f(x) \ne \emptyset$. For a $y \in X^*$ Young's equality
$$f(x) + f^*(y) = \langle x, y \rangle$$
holds, if and only if $y \in \partial f(x)$.

Proof. Let $f(x) + f^*(y) = \langle x, y \rangle$. For all $u \in X$ we have $\langle u, y \rangle \le f(u) + f^*(y)$ and hence $f(u) - f(x) \ge \langle u - x, y \rangle$, i.e. $y \in \partial f(x)$.
Conversely let $y \in \partial f(x)$, i.e. for all $z \in X$ we have: $f(z) - f(x) \ge \langle z - x, y \rangle$, hence $\langle x, y \rangle - f(x) \ge \langle z, y \rangle - f(z)$ and therefore
$$\langle x, y \rangle - f(x) = \sup\{\langle z, y \rangle - f(z) \mid z \in X\} = f^*(y).$$
In the sequel we will make use of the following statement:

Lemma 3.11.3. Let X be a normed space. Then every lower semi-continuous convex function $f: X \to \overline{\mathbb{R}}$ has an affine minorant, more precisely: for each $x \in \mathrm{Dom} f$ and $d > 0$ there is $z \in X^*$ such that for all $y \in X$
$$f(y) > f(x) + \langle y - x, z \rangle - d.$$

Proof. Let w.l.o.g. f not be identically $\infty$. Since f is lower semi-continuous and convex, $\mathrm{Epi} f$ is closed, convex and non-empty. Apparently $(x, f(x) - d) \notin \mathrm{Epi} f$. By the Strict Separation Theorem 3.9.17 there is a $(z, \alpha) \in X^* \times \mathbb{R}$ with
$$\langle x, z \rangle + \alpha(f(x) - d) > \sup\{\langle y, z \rangle + \alpha r \mid (y, r) \in \mathrm{Epi} f\}.$$
Since $(x, f(x)) \in \mathrm{Epi} f$, it follows that $-\alpha d > 0$, i.e. $\alpha < 0$. We can w.l.o.g. assume $\alpha = -1$. For all $y \in \mathrm{Dom} f$ we have $(y, f(y)) \in \mathrm{Epi} f$, hence
$$\langle x, z \rangle - (f(x) - d) > \langle y, z \rangle - f(y),$$
and thus the assertion holds.

Definition 3.11.4. A function $f: X \to (-\infty, \infty]$, not identically equal to $\infty$, is called proper.

Lemma 3.11.5. A convex lower semi-continuous function $f: X \to \overline{\mathbb{R}}$ is proper, if and only if f* is proper.

Proof. If f is proper, then there is $x_0 \in X$ with $f(x_0) < \infty$ and hence $f^*(y) > -\infty$ for all $y \in X^*$. Due to the previous lemma there is a $z \in X^*$ with
$$-f(x_0) + \langle x_0, z \rangle + d > \sup\{-f(y) + \langle y, z \rangle \mid y \in X\} = f^*(z),$$
hence f* is proper. Conversely let $f^*(z_0) < \infty$ for a $z_0 \in X^*$. From Young's inequality it follows that $f(x) > -\infty$ for all $x \in X$, and from $f^*(z_0) > -\infty$ follows the existence of an $x_0 \in X$ with $f(x_0) < \infty$.

Definition 3.11.6. Let X be a normed space and $f^*: X^* \to \overline{\mathbb{R}}$ the conjugate function of $f: X \to \overline{\mathbb{R}}$. Then we call the function
$$f^{**}: X \to \overline{\mathbb{R}}, \quad x \mapsto f^{**}(x) := \sup\{\langle x, x^* \rangle - f^*(x^*) \mid x^* \in X^*\}$$
the biconjugate function of f.
Theorem 3.11.7 (Fenchel–Moreau). Let X be a normed space and $f: X \to \overline{\mathbb{R}}$ a proper convex function. Then the following statements are equivalent:
(a) f is lower semi-continuous
(b) $f = f^{**}$.

Proof. (a) $\Rightarrow$ (b): We show at first $\mathrm{Dom} f^{**} \subset \overline{\mathrm{Dom} f}$. Let $x \notin \overline{\mathrm{Dom} f}$. Then according to the Strict Separation Theorem 3.9.17 there exists a $u \in X^*$ with
$$\langle x, u \rangle > \sup\{\langle y, u \rangle \mid y \in \mathrm{Dom} f\}.$$
By the above lemma f* is proper. Hence there exists a $v \in X^*$ with $f^*(v) < \infty$. For all $t > 0$ we then have
$$f^*(v + tu) = \sup\{\langle y, v + tu \rangle - f(y) \mid y \in X\} \le f^*(v) + t \sup\{\langle y, u \rangle \mid y \in \mathrm{Dom} f\}.$$
Thus we obtain
$$f^{**}(x) \ge \langle x, v + tu \rangle - f^*(v + tu) \ge \langle x, v \rangle - f^*(v) + t\big(\langle x, u \rangle - \sup\{\langle y, u \rangle \mid y \in \mathrm{Dom} f\}\big) \xrightarrow[t \to \infty]{} \infty,$$
i.e. $x \notin \mathrm{Dom} f^{**}$.

Let $y \in X$. From the definition of f* it follows for all $x^*$ that $f(y) \ge \langle y, x^* \rangle - f^*(x^*)$ and hence $f^{**}(y) \le f(y)$ for arbitrary $y \in X$.

For $y \notin \mathrm{Dom} f^{**}$ we apparently have $f^{**}(y) \ge f(y)$. Suppose for an $x \in \mathrm{Dom} f^{**}$ we have $f(x) > f^{**}(x)$; then $(x, f^{**}(x)) \notin \mathrm{Epi} f$. Since f is lower semi-continuous, $\mathrm{Epi} f$ is closed. Due to the Strict Separation Theorem 3.9.17 there exists a $(x^*, a) \in X^* \times \mathbb{R}$ with
$$\langle x, x^* \rangle + a f^{**}(x) > \sup\{\langle y, x^* \rangle + a r \mid (y, r) \in \mathrm{Epi} f\}.$$
Here $a < 0$ must hold, because $a = 0$ implies
$$\langle x, x^* \rangle > \sup\{\langle y, x^* \rangle \mid y \in \mathrm{Dom} f\} =: \beta,$$
i.e. $x \notin \{y \in X \mid \langle y, x^* \rangle \le \beta\} \supset \overline{\mathrm{Dom} f} \supset \mathrm{Dom} f^{**}$, contradicting $x \in \mathrm{Dom} f^{**}$.
The assumption $a > 0$ would imply $\sup\{\langle y, x^* \rangle + ar \mid (y, r) \in \mathrm{Epi} f\} = \infty$, again contradicting the above strict inequality. Let w.l.o.g. $a = -1$, i.e.
$$\langle x, x^* \rangle - f^{**}(x) > \sup\{\langle y, x^* \rangle - r \mid (y, r) \in \mathrm{Epi} f\} \ge \sup\{\langle y, x^* \rangle - f(y) \mid y \in \mathrm{Dom} f\} = f^*(x^*),$$
contradicting the definition of $f^{**}$.

(b) $\Rightarrow$ (a): Being the supremum of the family $\{\langle \cdot, x^* \rangle - f^*(x^*) \mid x^* \in \mathrm{Dom} f^*\}$ of continuous functions, $f^{**}$ is lower semi-continuous.

Remark 3.11.8. If $f: X \to \overline{\mathbb{R}}$ is a proper convex and lower semi-continuous function, then by the theorem of Fenchel–Moreau we have $f(x) = \sup\{\langle x, x^* \rangle - f^*(x^*) \mid x^* \in X^*\}$. But each functional $\langle \cdot, x^* \rangle - f^*(x^*)$ is by definition weakly lower semi-continuous and hence f is also weakly lower semi-continuous.
As a consequence of Theorem 3.11.7 we obtain an extension of Theorem 3.11.2 and thereby a further interpretation of Young's equality:

Theorem 3.11.9. Let X be a normed space and $f: X \to \overline{\mathbb{R}}$ a proper convex and lower semi-continuous function. Then the following statements are equivalent:
(a) $x^* \in \partial f(x)$
(b) $x \in \partial f^*(x^*)$
(c) $f(x) + f^*(x^*) = \langle x, x^* \rangle$.

Proof. Let $f(x) + f^*(x^*) = \langle x, x^* \rangle$. Apparently $f^*(x^*) < \infty$. Using Young's inequality we have: $f(x) + f^*(u) \ge \langle x, u \rangle$ for all $u \in X^*$. Therefore
$$f^*(u) - f^*(x^*) \ge \langle x, u - x^* \rangle,$$
i.e. $x \in \partial f^*(x^*)$.
Let now $x \in \partial f^*(x^*)$; then for all $u \in X^*$: $f^*(u) - f^*(x^*) \ge \langle x, u - x^* \rangle$, hence
$$\langle x, x^* \rangle - f^*(x^*) \ge \langle x, u \rangle - f^*(u),$$
and using the theorem of Fenchel–Moreau we obtain
$$\langle x, x^* \rangle - f^*(x^*) = \sup\{\langle x, u \rangle - f^*(u) \mid u \in X^*\} = f^{**}(x) = f(x).$$
The equivalence of (a) and (c) is the content of Theorem 3.11.2.
We will take up the subsequent example for a conjugate function in Chapter 8:

Example 3.11.10. Let $\Phi: \mathbb{R} \to \mathbb{R}$ be a Young function and let $(X, \|\cdot\|)$ be a normed space. We consider the convex function $f: X \to \mathbb{R}$, defined by $f(x) := \Phi(\|x\|)$. For the conjugate $f^*: (X^*, \|\cdot\|_d) \to \mathbb{R}$ we have
$$f^*(y) = \Phi^*(\|y\|_d).$$

Proof. Using the definition of the norm $\|\cdot\|_d$ and Young's inequality it follows that
$$\langle x, y \rangle - \Phi(\|x\|) \le \|x\|\,\|y\|_d - \Phi(\|x\|) \le \Phi^*(\|y\|_d).$$
Due to the definition of $\Phi^*$ and $\|\cdot\|_d$ there is a sequence $(s_n)$ in $\mathbb{R}$ and a sequence $(x_n)$ in X with
(a) $s_n \|y\|_d - \Phi(s_n) \xrightarrow[n \to \infty]{} \Phi^*(\|y\|_d)$
(b) $\|x_n\| = 1$ and $|\langle x_n, y \rangle - \|y\|_d| < \frac{1}{n(1 + |s_n|)}$.
Then we obtain
$$\langle s_n x_n, y \rangle - \Phi(\|s_n x_n\|) = s_n \|y\|_d - \Phi(s_n) + s_n(\langle x_n, y \rangle - \|y\|_d) \xrightarrow[n \to \infty]{} \Phi^*(\|y\|_d).$$
In particular we have
$$\Big(\frac{\|\cdot\|^2}{2}\Big)^* = \frac{\|\cdot\|_d^2}{2}.$$
3.12 Theorem of Fenchel

Let X be a normed space.

Definition 3.12.1. For a concave function $g: X \to [-\infty, \infty)$ with $\mathrm{Dom}\, g := \{x \in X \mid g(x) > -\infty\} \ne \emptyset$ the concave conjugate function $g^+: X^* \to [-\infty, \infty)$ of g is defined by
$$g^+(y) := \inf\{\langle y, x \rangle - g(x) \mid x \in \mathrm{Dom}\, g\}.$$
By definition we have the subsequent relation between the concave and the convex conjugate function
$$g^+(y) = -(-g)^*(-y). \tag{3.16}$$
We now consider the optimization problem $\inf(f - g)(X)$, where $f, -g: X \to \overline{\mathbb{R}}$ are assumed to be convex. Directly from the definitions it follows that for all $x \in X$ and $y \in X^*$ the inequality
$$f(x) + f^*(y) \ge \langle y, x \rangle \ge g(x) + g^+(y)$$
holds, i.e.
$$\inf\{(f - g)(x) \mid x \in X\} \ge \sup\{(g^+ - f^*)(y) \mid y \in X^*\}. \tag{3.17}$$
As a direct application of the Separation Theorem of Eidelheit 3.9.14 we obtain the

Theorem 3.12.2 (Theorem of Fenchel). Let $f, -g: X \to \overline{\mathbb{R}}$ be convex functions and let an $x_0 \in \mathrm{Dom} f \cap \mathrm{Dom}\, g$ exist such that f or g is continuous at $x_0$. Then we obtain
$$\inf\{(f - g)(x) \mid x \in X\} = \sup\{(g^+ - f^*)(y) \mid y \in X^*\}.$$
If in addition $\inf\{(f - g)(x) \mid x \in X\}$ is finite, then on the right-hand side the supremum is attained at a $y_0 \in X^*$.

Proof. W.l.o.g. let f be continuous at $x_0$. Then we have $x_0 \in \mathrm{Dom} f$ and
$$f(x_0) - g(x_0) \ge \inf\{(f - g)(x) \mid x \in X\} =: \mu.$$
The assertion of the theorem is for $\mu = -\infty$ a direct consequence of Inequality (3.17). Under the assumption $-\infty < \mu < \infty$ we consider the sets
$$A := \{(x, t) \in X \times \mathbb{R} \mid x \in \mathrm{Dom} f,\ t > f(x)\}$$
$$B := \{(x, t) \in X \times \mathbb{R} \mid t \le g(x) + \mu\}.$$
The above sets are convex and disjoint. From the continuity of f at $x_0$ it follows that $\mathrm{Int}(A) \ne \emptyset$. Due to the separation theorem of Eidelheit there is a hyperplane H separating A and B, i.e. there is a $(0, 0) \ne (y, \beta) \in X^* \times \mathbb{R}$ and $r \in \mathbb{R}$, such that for all $(x_1, t_1) \in A$ and all $(x_2, t_2) \in B$ we have
$$\langle y, x_1 \rangle + \beta t_1 \le r \le \langle y, x_2 \rangle + \beta t_2. \tag{3.18}$$
It turns out that $\beta \ne 0$, because otherwise for all $x_1 \in \mathrm{Dom} f$ and all $x_2 \in \mathrm{Dom}\, g$
$$\langle y, x_1 \rangle \le r \le \langle y, x_2 \rangle.$$
Since $x_0 \in \mathrm{Dom} f \cap \mathrm{Dom}\, g$ and $x_0 \in \mathrm{Int}(\mathrm{Dom} f)$ it follows on a neighborhood U of the origin
$$\langle y, x_0 + U \rangle \le r = \langle y, x_0 \rangle,$$
hence $\langle y, U \rangle \le 0$, contradicting $(0, 0) \ne (y, \beta)$.
Moreover $\beta < 0$, because otherwise $\sup\{t \mid (x_0, t) \in A\} = \infty$ in contradiction to Inequality (3.18).
Let now $\varepsilon > 0$. For $\rho := \frac{r}{-\beta}$ and $w := \frac{y}{-\beta}$ and all $x \in \mathrm{Dom} f$ it follows from $(x, f(x) + \varepsilon) \in A$ that
$$\langle w, x \rangle - (f(x) + \varepsilon) \le \rho.$$
Since $\varepsilon$ is arbitrary, we have $f^*(w) \le \rho$. Let $z \in \mathrm{Dom}\, g$. Apparently $(z, g(z) + \mu) \in B$ and hence by Inequality (3.18)
$$\langle w, z \rangle - (g(z) + \mu) \ge \rho,$$
yielding $g^+(w) \ge \rho + \mu$. Together with Inequality (3.17) it follows that
$$\mu \le g^+(w) - \rho \le g^+(w) - f^*(w) \le \sup\{g^+(y) - f^*(y) \mid y \in X^*\} \le \inf\{f(x) - g(x) \mid x \in X\} = \mu,$$
and thereby the assertion.
As a consequence of the theorem of Fenchel we obtain the

Theorem 3.12.3. Let X be a normed space, $K \subset X$ convex, $f: X \to \overline{\mathbb{R}}$ convex and continuous at a point $\bar{k} \in K$. Then $k_0 \in M(f, K)$, if and only if for a $u \in X^*$
(a) $f^*(u) + f(k_0) = \langle u, k_0 \rangle$
(b) $\langle u, k_0 \rangle = \min\langle u, K \rangle$
holds.

Proof. Let $k_0 \in M(f, K)$ and let at first $k_0 \notin M(f, X)$. Let
$$g(x) = \begin{cases} 0 & \text{for } x \in K \\ -\infty & \text{otherwise.} \end{cases}$$
For $y \in X^*$ we have $g^+(y) = \inf\{\langle y, k \rangle \mid k \in K\}$. By the theorem of Fenchel there exists a $u \in X^*$ with
$$\inf\{\langle u, k \rangle \mid k \in K\} - f^*(u) = f(k_0).$$
It turns out that $u \ne 0$, because otherwise, due to $-f^*(0) = f(k_0)$, we have by Theorem 3.11.2 $0 \in \partial f(k_0)$, hence apparently $k_0 \in M(f, X)$, a contradiction to our assumption. By Young's inequality we then obtain
$$\inf\{\langle u, k \rangle \mid k \in K\} = f^*(u) + f(k_0) \ge \langle u, k_0 \rangle \ge \inf\{\langle u, k \rangle \mid k \in K\},$$
hence (a) and (b).
If $k_0 \in M(f, X)$, then for $u = 0$ we have (a) by definition of f* and apparently also (b).
Conversely (a) implies, using Theorem 3.11.2, that $u \in \partial f(k_0)$. Hence $\langle u, k - k_0 \rangle \le f(k) - f(k_0)$ for all $k \in K$. By (b) we then have $f(k) \ge f(k_0)$ for all $k \in K$.
As another application of the theorem of Fenchel we obtain the duality theorem of linear approximation theory in normed spaces:

Theorem 3.12.4. Let X be a normed space, V a subspace of X and $z \in X$; then
$$\inf\{\|z - v\| \mid v \in V\} = \max\{\langle u, z \rangle \mid \|u\| \le 1 \text{ and } u \in V^\perp\}.$$

Proof. Let $f(x) := \|x\|$ and
$$g(x) = \begin{cases} 0 & \text{for } x \in z - V \\ -\infty & \text{otherwise.} \end{cases}$$
Then we have
$$f^*(u) = \begin{cases} 0 & \text{for } \|u\| \le 1 \\ \infty & \text{otherwise} \end{cases}$$
and
$$g^+(u) = \inf\{\langle u, x \rangle \mid x \in z - V\} = \begin{cases} \langle u, z \rangle & \text{for } u \in V^\perp \\ -\infty & \text{otherwise.} \end{cases}$$

Application 3.12.5 (Formula of Ascoli). We treat the following problem: in a normed space X the distance of a point $z \in X$ to a given hyperplane has to be computed. If for a $u \in X^*$ and an $x_0 \in X$ the hyperplane is represented by $H = \{x \in X \mid \langle u, x \rangle = 0\} + x_0$, then one obtains the distance via the formula of Ascoli
$$\frac{|\langle u, z \rangle - \langle u, x_0 \rangle|}{\|u\|}, \tag{3.19}$$
because due to the previous theorem we have for $V := \{x \in X \mid \langle u, x \rangle = 0\}$:
$$\inf\{\|z - x_0 - v\| \mid v \in V\} = \max\{\langle \tilde{u}, z - x_0 \rangle \mid \|\tilde{u}\| \le 1 \text{ and } \langle \tilde{u}, v \rangle = 0 \ \forall v \in V\}.$$
As the restriction set of the maximum problem only consists of $\pm\frac{u}{\|u\|}$, the assertion follows.
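In Euclidean space the formula of Ascoli can be compared with the orthogonal projection onto the hyperplane. A minimal NumPy sketch with randomly chosen (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
u, x0, z = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

# Formula of Ascoli (3.19) for H = {x : <u, x> = 0} + x0:
dist_ascoli = abs(u @ z - u @ x0) / np.linalg.norm(u)

# Comparison: orthogonal projection p of z onto H (valid in the Euclidean case).
p = z - ((u @ (z - x0)) / (u @ u)) * u
assert np.isclose(u @ p, u @ x0)               # p lies on H
print(dist_ascoli, np.linalg.norm(z - p))      # both distances coincide
```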
Theorem 3.12.3 leads to the following characterization of a best linear approximation:

Theorem 3.12.6 (Singer). Let X be a normed space, V a subspace of X and $x \in X$. The point $v_0 \in V$ is a best approximation of x w.r.t. V, if and only if for a $u \in X^*$ we have
(a) $\|u\| = 1$
(b) $\langle u, v \rangle = 0$ for all $v \in V$
(c) $\langle u, x - v_0 \rangle = \|x - v_0\|$.

Proof. Let $h \mapsto f(h) := \|h\|$, $K := x - V$; then
$$f^*(u) = \begin{cases} 0 & \text{for } \|u\| \le 1 \\ \infty & \text{otherwise.} \end{cases}$$
From (a) in Theorem 3.12.3 both (a) and (c) follow, from (b) in Theorem 3.12.3 finally (b).
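In a Euclidean space the functional u of the theorem of Singer can be written down explicitly as the normalized residual. The following sketch (random data, hypothetical one-dimensional subspace) checks (a), (b) and (c) for the orthogonal projection:

```python
import numpy as np

rng = np.random.default_rng(3)
v1 = rng.normal(size=5)
V = v1[:, None]                     # basis of a one-dimensional subspace of R^5
x = rng.normal(size=5)

a, *_ = np.linalg.lstsq(V, x, rcond=None)
v0 = V @ a                          # best Euclidean approximation of x in V
dist = np.linalg.norm(x - v0)

u = (x - v0) / dist                 # candidate functional (via the inner product)
print(np.linalg.norm(u))            # (a): ||u|| = 1
print(u @ v1)                       # (b): u vanishes on V (approximately 0)
print(u @ (x - v0), dist)           # (c): <u, x - v0> = ||x - v0||
```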
3.13 Existence of Minimal Solutions for Convex Optimization

Definition 3.13.1. The problem "maximize $(g^+ - f^*)$ on $X^*$" we call the Fenchel-dual problem to "minimize $(f - g)$ on X".

Using the theorem of Fenchel we obtain the following principle for existence proofs:
If an optimization problem with finite values is the Fenchel dual of another problem, then it always has a minimal solution (resp. maximal solution).

A natural procedure for establishing existence for a given optimization problem is to form the dual twice, hoping in this way to return to the original problem. In doing so we encounter a formal difficulty that, by twice forming the dual, a problem in $(X^*)^*$ results, and not in X. One can always view X as a subset of $(X^*)^*$, by considering the following mapping $E: X \to (X^*)^*$, where for an $x \in X$ the functional $E(x): X^* \to \mathbb{R}$ is defined by
$$x^* \mapsto E(x)(x^*) = \langle E(x), x^* \rangle := \langle x^*, x \rangle.$$
Of particular interest are, due to what has just been said, those normed spaces X for which all elements in $(X^*)^*$ can be obtained in this way, because then $(X^*)^*$ can be identified with X. This leads to the

Definition 3.13.2. A normed space X is called reflexive, if $E(X) = (X^*)^*$ holds.
Remark 3.13.3. Let X be a reflexive Banach space and $f: X \to \overline{\mathbb{R}}$. For the second conjugate $(f^*)^*: X^{**} \to \overline{\mathbb{R}}$ and the biconjugate $f^{**}: X \to \overline{\mathbb{R}}$ we have for all $x \in X$
$$(f^*)^*(E(x)) = \sup\{\langle E(x), x^* \rangle - f^*(x^*) \mid x^* \in X^*\} = \sup\{\langle x^*, x \rangle - f^*(x^*) \mid x^* \in X^*\} = f^{**}(x).$$

Lemma 3.13.4. Let K be a convex subset of a normed space X and let $f: K \to \mathbb{R}$ be a continuous convex function. Then f is bounded from below on every bounded subset B of K.

Proof. Let $a > 0$ and $x_0 \in K$. As f is continuous, there is an open ball $K(x_0, r)$ with radius $1 > r > 0$ such that
$$f(x) > f(x_0) - a$$
holds for all $x \in K(x_0, r)$. Let $M > 1$ be chosen such that $K(x_0, M) \supset B$. Let $y \in B$ be arbitrary and $z := (1 - \frac{r}{M}) x_0 + \frac{r}{M} y$. Then it follows that
$$\|z - x_0\| = \frac{r}{M}\|y - x_0\| < r,$$
i.e. $z \in K(x_0, r)$. Since f is convex, we have
$$f(z) \le \Big(1 - \frac{r}{M}\Big) f(x_0) + \frac{r}{M} f(y),$$
and therefore
$$f(y) \ge -\frac{M}{r}\Big(1 - \frac{r}{M}\Big) f(x_0) + \frac{M}{r} f(z) \ge \Big(1 - \frac{M}{r}\Big) f(x_0) + \frac{M}{r}\big(f(x_0) - a\big) =: c.$$
Theorem 3.13.5 (Mazur–Schauder). Let X be a reflexive Banach space, K a non-empty, closed, bounded and convex subset of X. Then every continuous convex function $h: K \to \mathbb{R}$ has a minimal solution.

Proof. Let $f: X \to (-\infty, \infty]$ and $g: X \to [-\infty, \infty)$ be defined by
$$f(x) = \begin{cases} h(x) & \text{for } x \in K \\ \infty & \text{otherwise,} \end{cases} \qquad g(x) = \begin{cases} 0 & \text{for } x \in K \\ -\infty & \text{otherwise.} \end{cases}$$
Then for all $y \in X^*$ it follows that
$$g^+(y) = \inf\{\langle y, x \rangle - g(x) \mid x \in X\} = \inf\{\langle y, x \rangle \mid x \in K\}.$$
Since K is bounded, there is an $r > 0$, such that we have for all $x \in K$: $\|x\| \le r$ and hence $|\langle y, x \rangle| \le \|y\|\|x\| \le r\|y\|$. Therefore $g^+(y) \ge -r\|y\| > -\infty$. Being the infimum of the continuous linear functions $\{\langle \cdot, x \rangle \mid x \in K\}$, $g^+$ is also continuous on the Banach space $X^*$ (see Theorem 5.3.13). Due to Lemma 3.11.5 f* is a proper convex function, i.e. $\mathrm{Dom} f^* \ne \emptyset$. Therefore the preconditions for treating the problem
$$\inf\{f^*(y) - g^+(y) \mid y \in X^*\} =: \mu$$
are satisfied. Using the theorem of Fenchel and the reflexivity of X it follows that
$$\mu = \sup\{(g^+)^+(E(x)) - (f^*)^*(E(x)) \mid x \in X\}. \tag{3.20}$$
For all $x \in X$ we have by Remark 3.13.3 $(f^*)^*(E(x)) = f^{**}(x)$. The continuity of h on K implies the lower semi-continuity of f on X and by the theorem of Fenchel–Moreau 3.11.7 we obtain $f^{**}(x) = f(x)$. Putting $z = E(x)$ we obtain (see Equation (3.16))
$$(g^+)^+(z) = -(-g)^{**}(z),$$
and due to the lower semi-continuity of $-g$ we obtain, again using the theorem of Fenchel–Moreau,
$$(g^+)^+(E(x)) = -(-g)^{**}(E(x)) = -(-g)(x) = g(x).$$
By Equation (3.20) we obtain
$$\mu = \sup\{g(x) - f(x) \mid x \in X\} = \sup\{-f(x) \mid x \in K\} = -\inf\{f(x) \mid x \in K\}.$$
By Lemma 3.13.4 the number $\mu$ is finite. The theorem of Fenchel then implies the existence of an $\bar{x}$ where the supremum in (3.20) is attained, i.e. $f(\bar{x}) = -\mu = \inf\{f(x) \mid x \in K\}$.
3.14 Lagrange Multipliers

In this section we consider convex optimization problems, for which the restrictions are given as convex inequalities (see [79]).

Definition 3.14.1. Let P be a convex cone in a vector space X. For $x, y \in X$ we write $x \ge y$, if $x - y \in P$. The cone P defining this relation is called the positive cone in X.
In a normed space X we define the convex cone dual to P by
$$P^* := \{x^* \in X^* \mid \langle x, x^* \rangle \ge 0 \text{ for all } x \in P\}.$$

Definition 3.14.2. Let $\Omega$ be a convex subset of a vector space X and (Z, P) an ordered vector space. Then $G: \Omega \to (Z, P)$ is called convex, if
$$G(\lambda x_1 + (1 - \lambda) x_2) \le \lambda G(x_1) + (1 - \lambda) G(x_2)$$
for all $x_1, x_2 \in \Omega$ and all $\lambda \in [0, 1]$.

We now consider the following type of problem: let $\Omega$ be a convex subset of the vector space X, $f: \Omega \to \mathbb{R}$ a convex function, $(Z, \ge)$ an ordered vector space and $G: \Omega \to (Z, \ge)$ convex. Then minimize f(x) under the restriction
$$x \in \Omega, \quad G(x) \le 0.$$
In the sequel we want to develop the concept of Lagrange multipliers. For this purpose we need a number of preparations.

Let $\Gamma := \{z \mid \exists\, x \in \Omega: G(x) \le z\}$. It is easily seen that $\Gamma$ is convex. On $\Gamma$ we define a function $w: \Gamma \to \overline{\mathbb{R}}$ by $z \mapsto \inf\{f(x) \mid x \in \Omega,\ G(x) \le z\}$. Then w(0) is the minimal value of f on $\{x \in \Omega \mid G(x) \le 0\}$.

Theorem 3.14.3. The function w is convex and monotonically decreasing, i.e. for all $z_1, z_2 \in \Gamma$ we have
$$z_1 \le z_2 \ \Rightarrow\ w(z_1) \ge w(z_2).$$

Proof. Let $\lambda \in [0, 1]$ and let $z_1, z_2 \in \Gamma$; then
$$w(\lambda z_1 + (1 - \lambda) z_2) = \inf\{f(x) \mid x \in \Omega,\ G(x) \le \lambda z_1 + (1 - \lambda) z_2\}$$
$$\le \inf\{f(x) \mid \exists\, x_1, x_2 \in \Omega: x = \lambda x_1 + (1 - \lambda) x_2,\ G(x_1) \le z_1,\ G(x_2) \le z_2\}$$
$$= \inf\{f(\lambda x_1 + (1 - \lambda) x_2) \mid x_1, x_2 \in \Omega: G(x_1) \le z_1,\ G(x_2) \le z_2\}$$
$$\le \inf\{\lambda f(x_1) + (1 - \lambda) f(x_2) \mid x_1, x_2 \in \Omega: G(x_1) \le z_1,\ G(x_2) \le z_2\}$$
$$= \lambda \inf\{f(x_1) \mid x_1 \in \Omega: G(x_1) \le z_1\} + (1 - \lambda) \inf\{f(x_2) \mid x_2 \in \Omega: G(x_2) \le z_2\}$$
$$= \lambda w(z_1) + (1 - \lambda) w(z_2).$$
This proves the convexity of w; the monotonicity is obvious.
We start our consideration with the following assumption: there is a non-vertical supporting hyperplane of $\mathrm{Epi}(w)$ at (0, w(0)), i.e. there is a $z_0^* \in Z^*$ with
$$-\langle z_0^*, z \rangle + w(0) \le w(z)$$
for all $z \in \Gamma$. Then apparently $w(0) \le \langle z_0^*, z \rangle + w(z)$ holds. For all $x \in \Omega$ also $G(x) \in \Gamma$ holds. Therefore we have for all $x \in \Omega$
$$w(0) \le w(G(x)) + \langle z_0^*, G(x) \rangle \le f(x) + \langle z_0^*, G(x) \rangle.$$
Hence
$$w(0) \le \inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega\}.$$
If $z_0^*$ can be chosen such that $z_0^* \ge 0$, then we have for all $x \in \Omega$
$$G(x) \le 0 \ \Rightarrow\ \langle z_0^*, G(x) \rangle \le 0,$$
and thus
$$\inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega\} \le \inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega,\ G(x) \le 0\}$$
$$\le \inf\{f(x) \mid x \in \Omega,\ G(x) \le 0\} = w(0).$$
We finally obtain
$$w(0) = \inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega\},$$
and conclude that we have thus converted an optimization problem with restriction $G(x) \le 0$ into an optimization problem without restrictions.
Theorem 3.14.4. Let X be a vector space, (Z, P) an ordered normed space with $\mathrm{Int}(P) \ne \emptyset$ and let $\Omega$ be a convex subset of X. Let $f: \Omega \to \mathbb{R}$ and $G: \Omega \to Z$ be convex mappings. Let further the following regularity conditions be satisfied:
(a) there exists an $x_1 \in \Omega$ with $-G(x_1) \in \mathrm{Int}(P)$
(b) $\mu_0 := \inf\{f(x) \mid x \in \Omega,\ G(x) \le 0\}$ is finite.
Then there is a $z_0^* \in P^*$ with
$$\mu_0 = \inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega\}. \tag{3.21}$$
If the infimum in (b) is attained at an $x_0$ with $G(x_0) \le 0$, the infimum in (3.21) is also attained and the equation $\langle z_0^*, G(x_0) \rangle = 0$ holds.

Proof. In the space $W := \mathbb{R} \times Z$ we define the following sets:
$$A := \{(r, z) \mid \text{there is } x \in \Omega: r \ge f(x),\ z \ge G(x)\}$$
$$B := \{(r, z) \mid r \le \mu_0,\ z \le 0\}.$$
Since P, f and G are convex, the sets A and B are convex sets. From the definition of $\mu_0$ it follows that A contains no interior point of B. Since P does contain interior points, the interior of B is non-empty. Then by the Separation Theorem of Eidelheit 3.9.14 there is a non-zero functional $w_0^* = (r_0, z_0^*) \in W^*$ such that
$$r_0 r_1 + \langle z_1, z_0^* \rangle \ge r_0 r_2 + \langle z_2, z_0^* \rangle$$
for all $(r_1, z_1) \in A$ and all $(r_2, z_2) \in B$. The properties of B immediately imply $w_0^* \ge 0$, i.e. $r_0 \ge 0$ and $z_0^* \ge 0$. At first we show: $r_0 > 0$. Apparently we have $(\mu_0, 0) \in B$, hence
$$r_0 r + \langle z, z_0^* \rangle \ge r_0 \mu_0$$
for all $(r, z) \in A$. Suppose $r_0 = 0$; then in particular $\langle G(x_1), z_0^* \rangle \ge 0$ and $z_0^* \ne 0$. Since however $-G(x_1)$ is an interior point of P and $z_0^* \ge 0$, it then follows that $\langle G(x_1), z_0^* \rangle < 0$, a contradiction. Thus $r_0 > 0$ and we can w.l.o.g. assume $r_0 = 1$.

Let now $(x_n)$ be a minimizing sequence of f on $\{x \in \Omega \mid G(x) \le 0\}$, i.e. $f(x_n) \to \mu_0$. We then obtain, because of $(f(x), G(x)) \in A$ for all $x \in \Omega$,
$$\mu_0 \le \inf\{r + \langle z, z_0^* \rangle \mid (r, z) \in A\} \le \inf\{f(x) + \langle z_0^*, G(x) \rangle \mid x \in \Omega\} \le f(x_n) \to \mu_0,$$
and thus the first part of the assertion.
If now there is an $x_0$ with $G(x_0) \le 0$ and $f(x_0) = \mu_0$, then
$$\mu_0 \le f(x_0) + \langle G(x_0), z_0^* \rangle \le f(x_0) = \mu_0,$$
and hence $\langle G(x_0), z_0^* \rangle = 0$.
The following theorem about Lagrange duality is a consequence of the previous theorem:

Theorem 3.14.5. Let X be a vector space, (Z, P) an ordered normed space with $\mathrm{Int}(P) \ne \emptyset$ and let $\Omega$ be a convex subset of X. Let $f: \Omega \to \mathbb{R}$ and $G: \Omega \to Z$ be convex mappings. Moreover, let the following regularity conditions be satisfied:
(a) there exists an $x_1 \in \Omega$ with $-G(x_1) \in \mathrm{Int}(P)$
(b) $\mu_0 := \inf\{f(x) \mid x \in \Omega,\ G(x) \le 0\}$ is finite.
Then
$$\inf\{f(x) \mid x \in \Omega,\ G(x) \le 0\} = \max\{\varphi(z^*) \mid z^* \in Z^*,\ z^* \ge 0\},$$
where $\varphi(z^*) := \inf\{f(x) + \langle z^*, G(x) \rangle \mid x \in \Omega\}$ for $z^* \in Z^*$, $z^* \ge 0$.
If the maximum on the right-hand side is attained at $z_0^* \in Z^*$ with $z_0^* \ge 0$ and also the infimum at an $x_0 \in \Omega$, then $\langle z_0^*, G(x_0) \rangle = 0$ holds, and $x_0$ minimizes $x \mapsto f(x) + \langle z_0^*, G(x) \rangle: \Omega \to \mathbb{R}$ on $\Omega$.

Proof. Let $z^* \in Z^*$ with $z^* \ge 0$. Then we have
$$\varphi(z^*) = \inf\{f(x) + \langle z^*, G(x) \rangle \mid x \in \Omega\}$$
$$\le \inf\{f(x) + \langle z^*, G(x) \rangle \mid x \in \Omega,\ G(x) \le 0\}$$
$$\le \inf\{f(x) \mid x \in \Omega,\ G(x) \le 0\} = \mu_0.$$
Hence
$$\sup\{\varphi(z^*) \mid z^* \in Z^*,\ z^* \ge 0\} \le \mu_0.$$
The previous theorem implies the existence of an element $z_0^*$ for which equality holds. The remaining part of the assertion follows also by the previous theorem.
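A small finite-dimensional instance makes the duality scheme concrete. In the sketch below (a hypothetical example, not from the text) we minimize $f(x) = \|x\|^2$ over $\Omega = \mathbb{R}^2$ subject to $G(x) = 1 - x_1 - x_2 \le 0$; Slater's condition (a) holds, e.g. at $x = (1, 1)$, and the inner infimum defining $\varphi$ can be computed in closed form:

```python
import numpy as np

def phi(lam):
    # phi(lam) = inf_x ||x||^2 + lam * (1 - x1 - x2); the inner minimizer is
    # x = (lam/2, lam/2), which gives phi(lam) = lam - lam**2 / 2.
    return lam - lam**2 / 2

lams = np.linspace(0.0, 3.0, 3001)
lam0 = lams[np.argmax(phi(lams))]        # dual maximizer z0* (here lam0 = 1)
x0 = np.array([lam0 / 2, lam0 / 2])      # minimizer of the Lagrangian at lam0

print("dual value    :", phi(lam0))               # 0.5
print("primal value  :", x0 @ x0)                 # 0.5, so there is no duality gap
print("<z0*, G(x0)>  :", lam0 * (1 - x0.sum()))   # 0: complementary slackness
```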
Chapter 4
Numerical Treatment of Non-linear Equations and
Optimization Problems
The treatment of numerical methods in this chapter follows to a large extent that in [60]. Due to its theoretical and practical significance we will start with the Newton method, which can be considered as a prototype of all rapidly convergent methods. The Newton method is primarily a method of determining the solution of a non-linear equation. For minimization problems this equation results from setting the derivative of the function to be minimized to zero. For n-dimensional minimization problems this yields a system of n non-linear equations with n unknowns. The idea of the Newton method can be described in the following way: starting with an initial guess the successor is determined as the solution of the linearized equation at the present position. In one dimension we obtain the following figure:

[Figure: linearization of a non-linear function at the current iterate]
Let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable function, for which we want to determine an $x^* \in \mathbb{R}^n$ with $F(x^*) = 0$. Then for the linearized equation at $\bar{x}$ we obtain
$$F(\bar{x}) + F'(\bar{x})(x - \bar{x}) = 0.$$
If one wants to minimize a function $f: \mathbb{R}^n \to \mathbb{R}$, then for the corresponding equation one has
$$f'(\bar{x}) + f''(\bar{x})(x - \bar{x}) = 0.$$
For the general case one obtains the following algorithm:

4.1 Newton Method

Let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable function.
(a) Choose a starting point $x_0$, a stopping criterion, and put $k = 0$.
(b) Compute $x_{k+1}$ as a solution of the Newton equation
$$F(x_k) + F'(x_k)(x - x_k) = 0.$$
(c) Test the stopping criterion for $x_{k+1}$; if it is not satisfied, increment k by 1 and continue with (b).
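This scheme translates directly into code. The following is a minimal NumPy sketch; the test function (intersection of a circle and a parabola), its Jacobian, and the residual-based stopping criterion are illustrative choices, not prescribed by the text:

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Newton method: step (b) solves F(x_k) + F'(x_k)(x - x_k) = 0 for x_{k+1}."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = F(x)
        if np.linalg.norm(fx) < tol:            # stopping criterion of step (c)
            break
        x = x + np.linalg.solve(J(x), -fx)      # x_{k+1} = x_k - F'(x_k)^{-1} F(x_k)
    return x

F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[1] - x[0]**2])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [-2 * x[0], 1.0]])
print(newton(F, J, x0=[1.0, 1.0]))
```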
Concerning the convergence of the Newton method we mention the following: if the function F has a zero $x^*$, and if the Jacobian matrix $F'(x^*)$ is invertible, then there is a neighborhood U of $x^*$, such that for every choice of a starting point in U the Newton equation is solvable for each k, the total iteration sequence $(x_k)_{k \in \mathbb{N}_0}$ remains in U and converges to $x^*$. A characteristic property of the Newton method is its rapid convergence. Under the assumptions mentioned above the convergence is Q-superlinear, i.e. the sequence of the ratios
$$\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} \tag{4.1}$$
converges to zero [60]. For twice differentiable F (Lipschitz continuity of F′ at $x^*$) one even has quadratic convergence, i.e. for a constant $C > 0$ one has for all $k \in \mathbb{N}_0$
$$\|x_{k+1} - x^*\| \le C\|x_k - x^*\|^2.$$
The Newton method can be considered to be the prototype of a rapidly convergent method in the following sense: a convergent sequence in $\mathbb{R}^n$, for which F′ is regular at the limit, is Q-superlinearly convergent to a zero of F, if and only if it is Newton-like, i.e.
$$\frac{\|F(x_k) + F'(x_k)(x_{k+1} - x_k)\|}{\|x_{k+1} - x_k\|} \xrightarrow[k \to \infty]{} 0.$$
In other words: all rapidly convergent methods satisfy the Newton equation up to $o(\|x_{k+1} - x_k\|)$. However, Newton-like sequences can also be generated without computing the Jacobian matrix.
Disadvantages of the Newton method consist in the fact that the method is only locally convergent and at each iteration step one has to
• compute the Jacobian matrix
• solve a linear system of equations.
We will, as we proceed, encounter rapidly convergent (Newton-like) methods which do not display these disadvantages. The two latter disadvantages mean, in particular for large dimensions, a high computational effort. This is also the case if the Jacobian matrix is approximated by a suitable matrix of difference quotients, where in addition rounding errors create problems.
Application to Linear Modular Approximation
Let $\Phi : \mathbb{R} \to \mathbb{R}_{\ge 0}$ be a twice continuously differentiable Young function. Let $x \in \mathbb{R}^m$ and let $V$ be an $n$-dimensional subspace of $\mathbb{R}^m$ with a given basis $v_1, \ldots, v_n$. The approximation $y^*$ to be determined is a linear combination $\sum_{i=1}^{n} a_i v_i$ of the basis vectors, where $a_1, \ldots, a_n$ are the real coefficients to be computed. The search for a best approximation corresponds thus to a minimization problem in $\mathbb{R}^n$ with the unknown vector $a = (a_1, \ldots, a_n)$. The function to be minimized is
$$f : \mathbb{R}^n \to \mathbb{R}, \qquad a \mapsto f(a) := f^{\Phi}\Big(x - \sum_{i=1}^{n} a_i v_i\Big).$$
The minimization of this function we denote as (linear) modular approximation. Setting the first derivative of the function $f$ to zero leads to the following system of equations (compare Equation (1.15)):
$$\sum_{t \in T} v_i(t)\,\Phi'\Big(x(t) - \sum_{l=1}^{n} a_l v_l(t)\Big) = 0 \quad \text{for } i = 1, \ldots, n.$$
For the elements of the matrix of the second derivative of $f$ one obtains
$$\frac{\partial^2}{\partial a_i \partial a_j} f(a) = \sum_{t \in T} v_i(t) v_j(t)\,\Phi''\Big(x(t) - \sum_{l=1}^{n} a_l v_l(t)\Big).$$
The computational complexity of determining the Hessian matrix is apparently $n^2$ times as high as the effort for evaluating the function $f$. In particular for large $n$ it is urgent to decrease the computational effort.
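A short sketch of these formulas in Python may illustrate the point; the concrete Young function $\Phi(s) = \cosh(s) - 1$ and all names are assumptions made for this illustration only.

```python
import numpy as np

# Assumed Young function Phi(s) = cosh(s) - 1 and its first two derivatives.
phi, dphi, ddphi = (lambda s: np.cosh(s) - 1.0), np.sinh, np.cosh

def modular_pieces(a, x, V):
    """Value, gradient and Hessian of f(a) = sum_t Phi(x(t) - sum_l a_l v_l(t)).

    V is the m x n matrix whose columns are the basis vectors v_1, ..., v_n.
    """
    r = x - V @ a                            # residual x - sum_l a_l v_l
    value = phi(r).sum()
    grad = -V.T @ dphi(r)                    # inner derivative contributes -v_i(t)
    hess = V.T @ (ddphi(r)[:, None] * V)     # n^2 weighted inner products
    return value, grad, hess
```

The Hessian line makes the $n^2$ cost visible: every entry is a weighted inner product of two basis vectors over all of $T$.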
4.2 Secant Methods
The above difficulties can be avoided by employing the so-called secant methods (quasi-Newton methods). Here the Jacobian matrix at $x_k$ (for minimization the Hessian matrix) is replaced by a matrix $B_k$, which is easily determined. If $B_k$ is given, then $x_{k+1}$ is determined as the solution of the equation
$$F(x_k) + B_k(x - x_k) = 0.$$
If the inverse of $B_k$ is available we have
$$x_{k+1} = x_k - B_k^{-1} F(x_k).$$
For the successor $B_{k+1}$ one requires that the secant equation
$$B_{k+1}(x_{k+1} - x_k) = F(x_{k+1}) - F(x_k)$$
is satisfied. The solutions of the secant equation form an affine subspace of the space of $n \times n$-matrices. For a concrete algorithmic realization of the computation of $B_{k+1}$ an update formula is chosen, which determines how $B_{k+1}$ is obtained from $B_k$. The sequence of matrices $(B_k)_{k \in \mathbb{N}_0}$ is called the sequence of update matrices. A particular advantage of this scheme is that even update formulas for the inverse $B_{k+1}^{-1}$ are available and thus the solution of the linear system of equations can be replaced by a matrix-vector multiplication.
Newton-likeness and hence rapid convergence of the secant method is obtained if
$$B_{k+1} - B_k \xrightarrow[k \to \infty]{} 0$$
holds, due to the subsequent
Theorem 4.2.1. A secant method converging to $x^* \in \mathbb{R}^n$ with invertible matrix $F'(x^*)$ and
$$B_{k+1} - B_k \xrightarrow[k \to \infty]{} 0$$
converges Q-superlinearly and $F(x^*) = 0$ holds.
Before we begin with the proof we want to draw a connection between the secant and the Newton method: according to the mean value theorem in integral form we have for
$$Y_{k+1} := \int_0^1 F'\big(x_k + t(x_{k+1} - x_k)\big)\,dt$$
the equation
$$Y_{k+1}(x_{k+1} - x_k) = F(x_{k+1}) - F(x_k),$$
i.e. $Y_{k+1}$ satisfies the secant equation. One can verify directly that convergence of a sequence $(x_k)_{k \in \mathbb{N}_0}$ and the Lipschitz continuity of $F'$ implies the convergence of the sequence $(Y_k)_{k \in \mathbb{N}_0}$ to the Jacobian matrix at the limit $x^*$.
This leads to the following
Proof. Let $s_k = x_{k+1} - x_k \ne 0$ for all $k \in \mathbb{N}_0$. Due to $(B_{k+1} - Y_{k+1})s_k = 0$ and $-F(x_k) = B_k s_k$ the Newton-likeness follows:
$$\frac{|F(x_k) + F'(x_k)s_k|}{|s_k|} = \frac{|{-B_k s_k} + F'(x_k)s_k|}{|s_k|} = \frac{|(F'(x_k) - Y_{k+1} + B_{k+1} - B_k)s_k|}{|s_k|} \le |F'(x_k) - Y_{k+1}| + |B_{k+1} - B_k| \xrightarrow[k \to \infty]{} 0,$$
provided that $F'$ is Lipschitz continuous.
Update Formula of Broyden
A particularly successful update formula in practice is the following formula introduced by C. G. Broyden:
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\,s_k^T}{s_k^T s_k},$$
where $s_k := x_{k+1} - x_k$ and $y_k := F(x_{k+1}) - F(x_k)$. The success of the Broyden formula can be understood in view of the previous theorem, because it has the following minimality property: among all matrices $B$ that satisfy the secant equation
$$B s_k = y_k$$
the Broyden matrix $B_{k+1}$ has the least distance to its predecessor $B_k$ in the Frobenius norm. For a practical realization the inverse update formula
$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)\,s_k^T H_k}{s_k^T H_k y_k}$$
can be used, which allows the computation of $B_{k+1}^{-1} = H_{k+1}$ from $B_k^{-1} = H_k$. This formula is obtained from the general Sherman-Morrison-Woodbury lemma, which guarantees the existence of the inverse and the validity of the above formula for non-vanishing denominator.
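A compact sketch of the resulting method (our illustration; the numerical guard against a vanishing denominator is an assumption of the sketch):

```python
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=100):
    """Broyden's method using the inverse update H_k of B_k^{-1}."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                       # start matrix B_0^{-1} = I
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                          # step from x_{k+1} = x_k - H_k F(x_k)
        x = x + s
        Fx_new = F(x)
        y = Fx_new - Fx
        denom = s @ (H @ y)                  # denominator s_k^T H_k y_k
        if abs(denom) > 1e-14:               # non-vanishing denominator
            H += np.outer(s - H @ y, s @ H) / denom
        Fx = Fx_new
    return x
```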
Geometry of Update Matrices
In order to understand the Broyden method and the subsequent algorithms, we want to formulate a general geometric statement about successive orthogonal projections on affine subspaces.
Theorem 4.2.2. Let $(V_n)_{n \in \mathbb{N}_0}$ be a sequence of affine subspaces of a finite-dimensional vector space $X$, equipped with a scalar product and the corresponding norm. Let there exist an $H_k \in V_k$ for all $k \in \mathbb{N}$ such that
$$\sum_{k=0}^{\infty} |H_{k+1} - H_k| < \infty.$$
Then for every starting point $B_0 \in X$ the sequence defined by the recursion
$$B_{k+1} \text{ is the best approximation of } B_k \text{ w.r.t. } V_k$$
is bounded and we have $B_{k+1} - B_k \xrightarrow[k \to \infty]{} 0$.
In order to describe the geometry of the Broyden method, one chooses $X$ as the space of matrices, equipped with the Frobenius norm. As affine subspace in this context we define $V_k := \{B \mid B s_k = y_k\}$, and for $H_k$ the mean value matrix $Y_{k+1}$ is chosen for $k \in \mathbb{N}$.
4.3 Global Convergence
The class of rapidly convergent methods corresponds, as we have seen, to the class of the convergent Newton-like methods. In order to ensure global convergence, we take up the idea of the classical gradient methods. The gradient method is a method for determining the minimal solution of functions, which displays global convergence for a large class of functions. The main idea of the gradient method relies on the following descent property: starting from the present position one can in the direction of the negative gradient find a point with a lower function value. The following iteration
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k) \quad \text{for } \nabla f(x_k) \ne 0$$
with suitable step size $\alpha_k$ is based on this property. The determination of a step size is an important issue and is provided by use of a step size rule. Instead of the direction of the negative gradient $\nabla f(x_k)$ one can use a modified direction $d_k$ with descent property. In this way we will be enabled to construct a rapidly as well as globally convergent method. The pure gradient method, however, is only linearly convergent. This leads to the generalized gradient methods, which after choice of a starting point $x_0$ are defined by the following iteration
$$x_{k+1} = x_k - \alpha_k d_k,$$
as soon as step size $\alpha_k$ and search direction $d_k$ are specified. Damped versions of the Newton method, which we will treat below, will also belong to this class. At first we want to turn our attention to the determination of the step size.
Some Step Size Rules
The most straightforward way of determining a step size is to search for the point with least function value in the chosen direction. This leads to the following minimization rule (rule of optimal step size):
Choose $\alpha_k$ such that
$$f(x_k - \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k - \alpha d_k).$$
This rule has a crucial disadvantage: the step size can in general be determined only by an infinite process. Therefore this rule can only be realized on a computer in an approximate sense. Experience shows that infinite rules can be replaced by constructive finite rules. A common property of these rules is to keep the ratio of the slopes of secant and tangent
$$\left( \frac{f(x_k) - f(x_k - \alpha_k d_k)}{\alpha_k} \right) \Big/ \left( \nabla f(x_k)^T d_k \right)$$
above a fixed constant $\sigma > 0$. The simplest version is obtained by the following:
Armijo Rule (AR)
Let $\beta \in (0, 1)$ and $\sigma \in (0, \frac{1}{2})$. Choose $\alpha_k = \beta^{m_k}$, such that $m_k$ is the smallest number in $\mathbb{N}_0$ for which
$$f(x_k) - f(x_k - \beta^{m_k} d_k) \ge \sigma \beta^{m_k}\, \nabla f(x_k)^T d_k,$$
i.e. one starts with step size 1 and decreases the step size by multiplication with $\beta$ until the ratio of the slopes of secant and tangent exceeds $\sigma$.
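As a sketch, the rule is a simple backtracking loop (our illustration; the parameter values are arbitrary choices within the admissible ranges):

```python
def armijo(f, grad_fk, x_k, d_k, beta=0.5, sigma=0.25, m_max=60):
    """Armijo rule for the iteration x_{k+1} = x_k - alpha_k d_k.

    Assumes d_k is a direction of descent, i.e. grad_fk @ d_k > 0
    in this sign convention.
    """
    slope = grad_fk @ d_k                    # tangent slope
    fx = f(x_k)
    alpha = 1.0
    for _ in range(m_max):                   # m_k = smallest admissible exponent
        if fx - f(x_k - alpha * d_k) >= sigma * alpha * slope:
            return alpha
        alpha *= beta                        # shrink by multiplication with beta
    return alpha
```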
In the early phase of the iteration, the step size (at most 1) accepted by the Armijo rule can be unnecessarily small. This problem can be eliminated by the following modification:
Armijo Rule with Expansion (ARE)
Let $\beta \in (0, \frac{1}{2})$ and $\sigma \in (0, \frac{1}{2})$. Choose $\alpha_k = \beta^{m_k}$ such that $m_k$ is the smallest integer for which
$$f(x_k) - f(x_k - \beta^{m_k} d_k) \ge \sigma \beta^{m_k}\, \nabla f(x_k)^T d_k.$$
Algorithmically one proceeds in the following way: if the required inequality is already satisfied for $\alpha = 1$, then, different from the Armijo rule, the search is not discontinued. Through multiplication by $1/\beta$ the expansion is continued until the inequality is violated. The predecessor then determines the step size.
A different procedure prevents the step size from becoming too small. This can be achieved by the requirement that the ratio of the slopes of secant and tangent should not come too close to the number 1.
Goldstein Rule (G)
Let $\sigma \in (0, \frac{1}{2})$. Choose $\alpha_k$ such that
$$1 - \sigma \ge \frac{f(x_k) - f(x_k - \alpha_k d_k)}{\alpha_k\, \nabla f(x_k)^T d_k} \ge \sigma.$$
For a concrete realization of this rule one can proceed in a similar way as in the Armijo rule, using an interval partitioning method for the second inequality. In any case the acceptance of the full step size ($\alpha_k = 1$) should be tested.
Powell-Wolfe Rule (PW)
Let $\sigma \in (0, \frac{1}{2})$ and $\varrho \in (0, 1)$. Choose $\alpha_k$ such that
(a) $\dfrac{f(x_k) - f(x_k - \alpha_k d_k)}{\alpha_k\, \nabla f(x_k)^T d_k} \ge \sigma$,
(b) $\nabla f(x_k - \alpha_k d_k)^T d_k \le \varrho\, \nabla f(x_k)^T d_k$.
A concrete realization of this rule provides an expansion phase similar in spirit to ARE.
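A sketch of the two tests as a predicate (our own illustration with assumed parameter values); a full implementation would wrap this in an expansion and bisection phase:

```python
import numpy as np

def satisfies_pw(f, grad, x_k, d_k, alpha, sigma=0.25, rho=0.9):
    """Check conditions (a) and (b) of the Powell-Wolfe rule for a trial alpha."""
    slope = grad(x_k) @ d_k                                             # tangent slope
    ratio_ok = f(x_k) - f(x_k - alpha * d_k) >= sigma * alpha * slope   # (a)
    curvature_ok = grad(x_k - alpha * d_k) @ d_k <= rho * slope         # (b)
    return ratio_ok and curvature_ok
```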
These constructive rules have the additional advantage that in the subsequent globalization of the Newton method the damped method automatically will merge eventually into an undamped method. Here (and also for Newton-like directions, see [60], pp. 120, 128) for sufficiently large $k$ the step size $\alpha_k = 1$ will be accepted. One can show that the ratio of the slopes of secant and tangent converges to $1/2$.
Remark 4.3.1. For the determination of a zero of a function
$$F : \mathbb{R}^n \to \mathbb{R}^n,$$
which is not the derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ (or where $f$ is unknown), the determination of the step size can be performed by use of an alternative function such as $x \mapsto h(x) = \frac{1}{2}|F(x)|^2$.
4.3.1 Damped Newton Method
The pure Newton method is, as mentioned above, only locally, but on the other hand rapidly convergent. Damped Newton methods, whose algorithms we will discuss below, combine the advantages of rapid convergence of the Newton method and the global convergence of the gradient method. As indicated above we have to distinguish the determination of minimal solutions of a function $f : \mathbb{R}^n \to \mathbb{R}$ and the determination of a zero of a function $F : \mathbb{R}^n \to \mathbb{R}^n$ which is not necessarily the derivative of a real function on $\mathbb{R}^n$. For the first of the two cases we have the following algorithm:
Damped Newton Method for Minimization
Let
$$f : \mathbb{R}^n \to \mathbb{R}$$
be a twice continuously differentiable function.
(a) Choose a starting point $x_0$, a stop criterion and put $k \leftarrow 0$.
(b) Test the stop criterion w.r.t. $x_k$. Compute a new direction $d_k$ as solution of the linear system of equations
$$f''(x_k)\,d = f'(x_k).$$
(c) Determine $x_{k+1} := x_k - \alpha_k d_k$, where $\alpha_k$ is chosen according to one of the finite step size rules described above.
(d) Put $k \leftarrow k + 1$ and continue with step (b).
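A minimal realization in Python, reusing the armijo routine sketched earlier (the test function and all names are assumptions of the sketch):

```python
import numpy as np

def damped_newton_min(f, grad, hess, x0, tol=1e-10, max_iter=100):
    """Damped Newton method for minimization with the Armijo step size rule."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stop criterion
            break
        d = np.linalg.solve(hess(x), g)      # Newton direction: f''(x_k) d = f'(x_k)
        x = x - armijo(f, g, x, d) * d
    return x

# Example: f(x) = sum_i (cosh(x_i) - x_i), minimal where sinh(x_i) = 1
f    = lambda x: np.sum(np.cosh(x) - x)
grad = lambda x: np.sinh(x) - 1.0
hess = lambda x: np.diag(np.cosh(x))
print(damped_newton_min(f, grad, hess, np.zeros(3)))   # approx arcsinh(1) in each entry
```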
For this method the following convergence statement holds:
Theorem 4.3.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function and $x_0 \in \mathbb{R}^n$ a starting point. Let the level set belonging to the starting point
$$\{y \in \mathbb{R}^n \mid f(y) \le f(x_0)\}$$
be convex and bounded, and let the second derivative be positive definite there. Then the damped Newton method converges to the uniquely determined minimal solution of $f$, and the damped Newton method merges into the (undamped) Newton method, i.e. there is a $k_0 \in \mathbb{N}$ such that for all $k \ge k_0$ the step size $\alpha_k = 1$ is accepted by the rule chosen.
For the determination of a zero of an arbitrary function $F : \mathbb{R}^n \to \mathbb{R}^n$ we introduce the function $x \mapsto h(x) := \frac{1}{2}|F(x)|^2$ as descent function. This leads to the following algorithm:
Damped Newton Method for Equations
(a) Choose a starting point $x_0$, a stop criterion and put $k \leftarrow 0$.
(b) Test the stop criterion for $x_k$. Compute a new direction $d_k$ as solution of the linear system of equations
$$F'(x_k)\,d = F(x_k).$$
(c) Determine $x_{k+1} := x_k - \alpha_k d_k$, where $\alpha_k$ is chosen according to one of the finite step size rules described above.
(d) Put $k \leftarrow k + 1$ and continue with step (b).
Remark 4.3.3. The finite step size rules test the ratio of the slopes of secant and tangent. For the direction $d_k = F'(x_k)^{-1} F(x_k)$ the slope of the tangent can be expressed in terms of $h(x_k)$, since we have
$$\langle h'(x_k), d_k \rangle = \langle F'(x_k)^T F(x_k),\, F'(x_k)^{-1} F(x_k) \rangle = \langle F(x_k), F(x_k) \rangle = 2h(x_k).$$
For the step size $\alpha_k$ we obtain the following simple condition:
$$\frac{h(x_k - \alpha_k d_k)}{h(x_k)} \le 1 - 2\sigma\alpha_k. \tag{4.2}$$
The convergence is described by the following theorem (see [60], p. 131):
Theorem 4.3.4. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable function and let $h : \mathbb{R}^n \to \mathbb{R}$ be defined by $h(x) = \frac{1}{2}|F(x)|^2$. Let $x_0$ be a starting point with bounded level set
$$S_h(x_0) := \{x \in \mathbb{R}^n \mid h(x) \le h(x_0)\}$$
and let $F'(x)$ be invertible for all $x \in S_h(x_0)$. Let further $F'$ be Lipschitz continuous in a ball $K \supset S_h(x_0)$.
Then the damped Newton method for equations converges quadratically to a zero of $F$. Furthermore this method merges into the (proper) Newton method.
4.3.2 Globalization of Secant Methods for Equations
The central issue of the secant methods was to avoid the computation of the Jacobian matrix. Global convergence in the previous section was achieved by use of a step size rule. These rules employ the derivative of the corresponding descent function. In the case of the determination of a zero of a function $F : \mathbb{R}^n \to \mathbb{R}^n$ and the use of $h : \mathbb{R}^n \to \mathbb{R}$ with $h(x) = \frac{1}{2}|F(x)|^2$ as descent function, the derivative $h'(x) = F'(x)^T F(x)$ contains the Jacobian matrix $F'(x)$. Thus for secant methods the step size rules w.r.t. $h$ are not available. As a substitute one can use the ratio of subsequent values of $h$ (compare Equation (4.2) for the Newton direction) as a criterion. At first the full step size is tested. If not successful, one can try a few smaller step sizes. If in this way the direction of the secant is not found to be a direction of descent, an alternative direction is constructed. We will now describe a number of methods, which differ by the choice of the alternative direction.
SJN Method
Here the Newton direction is chosen as a substitute. Since here the Jacobian matrix is computed, we can use it as a new start for the update matrices. We denote the resulting method as Secant method with Jacobian matrix New start (SJN).
The SJN method
(a) Choose a regular $n \times n$-matrix $B_0$, $x_0 \in \mathbb{R}^n$, $\beta \in (0, 1)$, $\sigma \in (0, 1/4)$ and a formula for the determination of update matrices for a secant method (resp. for its inverse). Put $k = 0$ and choose a stop criterion.
(b) If the stop criterion is satisfied, then stop.
(c) Put $d_k := B_k^{-1} F(x_k)$ and search for $\alpha_k \in \mathbb{R}$, such that $h(x_k - \alpha_k d_k) \le (1 - \sigma\alpha_k)\,h(x_k)$, where at first $\alpha_k = 1$ is tested. If the search is successful, put $x_{k+1} = x_k - \alpha_k d_k$ and go to (e).
(d) New start: Compute $F'(x_k)^{-1}$, put $B_k^{-1} = F'(x_k)^{-1}$ and $d_k = B_k^{-1} F(x_k)$. Find the smallest $m \in \mathbb{N}_0$ (Armijo rule) with
$$h(x_k - \beta^m d_k) \le (1 - \sigma\beta^m)\,h(x_k) \tag{4.3}$$
and put $x_{k+1} = x_k - \beta^m d_k$.
(e) Put $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. Compute $B_{k+1}$ resp. $B_{k+1}^{-1}$ by the update formula chosen above.
(f) Put $k \leftarrow k + 1$ and go to (b).
Theorem 4.3.5. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable, $F'$ be locally Lipschitz continuous, $x_0 \in \mathbb{R}^n$, and let the level set $S_h(x_0)$ belonging to the starting point be bounded. Further let $x^*$ be the only zero of $F$ and let $F'(x)$ be regular for all $x \in S_h(x_0)$. Then the sequence $(x_k)_{k \in \mathbb{N}_0}$ generated by the SJN method is at least Q-superlinearly convergent. Moreover, from a certain index $k_0 \in \mathbb{N}$ on, the step size $\alpha_k = 1$ is accepted.
Derivative-free Computation of Zeros
The disadvantage of the choice of the alternative direction described in the previous section is that the Jacobian matrix has to be computed. On the other hand, by making the Jacobian matrix available we have achieved two goals:
(a) Necessary and sufficient for rapid convergence is the Newton-likeness of the iteration sequence (see [60], p. 71). In this sense the new start provides an improvement of the current update matrix.
(b) The Newton direction is always a direction of descent w.r.t. $h$.
These two goals we will now pursue using derivative-free methods. The geometry of the update matrices will simplify our understanding of the nature of the problem and will lead to the construction of concrete algorithms.
A possibility for exploiting the geometry can be obtained in the following way: in order to construct a new update matrix, which if possible should approximate the Jacobian matrix, we introduce as a substitute an artificial subspace and determine a new update matrix by projection of the current matrix onto this subspace. A straightforward strategy is to choose the substitute to be orthogonal to the current subspace $V_k$. Instead of the orthogonality of the subspaces, in the subsequent algorithmic realization we construct mutually orthogonal vectors.
The OS method
(a) Choose a regular $n \times n$-matrix $B_0$, $x_0 \in \mathbb{R}^n$, $C, \bar C \in (0, 1)$, $\delta > 0$ and a formula for the determination of update matrices for a secant method (resp. for its inverse). Put $k = 0$ and choose a stop criterion.
(b) Put $N = 0$. If the stop criterion is satisfied, then stop.
(c) Put $d_k := B_k^{-1} F(x_k)$ and search for $\alpha_k \in \mathbb{R}$ satisfying $h(x_k - \alpha_k d_k) \le C\,h(x_k)$, where at first $\alpha_k = 1$ is tested. If the search is successful, put
$$x_{k+1} := x_k - \alpha_k d_k, \qquad s_k := -\alpha_k d_k, \qquad y_k := F(x_{k+1}) - F(x_k)$$
and go to (e).
(d) New start: Put $x_{k+1} := x_k$ and $N \leftarrow N + 1$. Let $\alpha$ be the last step size tested in (c) and let $d := \frac{\alpha}{N^2} d_k$. If $k = 0$, we put $y_k := F(x_k - d) - F(x_k)$. Otherwise compute $t := d^T s_{k-1} / \big(s_{k-1}^T (d - s_{k-1})\big)$ and put $s_k := t\,s_{k-1} + (1 - t)\,d$, $y_k := F(x_k - s_k) - F(x_k)$.
(e) Compute $B_{k+1}$ resp. $B_{k+1}^{-1}$ by the update formula chosen above by use of $s_k$ and $y_k$.
(f) Put $k \leftarrow k + 1$. In case a new start was necessary, go to (c), otherwise to (b).
Remark. In case of a multiple new start an inner loop with loop index $N$ comes into play. The desired effect is the asymptotic convergence of the update matrices. In order to achieve this, the division by $N^2$ is performed in (d).
Further methods for the solution of equations can be found in [60].
4.3.3 Secant Method for Minimization
When looking for a solution of an unrestricted optimization problem, setting the gradient to zero leads to a non-linear equation. In contrast to the preceding section we have in this situation the opportunity to take the function to be minimized as the descent function for the step size rule. The direction of the secant is always a direction of descent, provided that the update matrix is positive definite. Those update formulas which generate positive definite matrices will therefore be of particular significance in this context. In theory as well as in practice the BFGS method plays an eminent role.
BFGS method
(a) Choose a symmetric and positive definite $n \times n$-matrix $H_0$, $x_0 \in \mathbb{R}^n$, and a step size rule (G), (ARE) or (PW). Put $k = 0$ and choose a stop criterion (e.g. test of the norm of the gradient).
(b) If the stop criterion is satisfied, then stop.
(c) Put $d_k := H_k \nabla f(x_k)$ and determine a step size $\alpha_k$ with $f$ as descent function. Put $x_{k+1} := x_k - \alpha_k d_k$. Put $s_k := x_{k+1} - x_k$ and $y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$. Compute $H_{k+1}$ according to the following formula:
$$H_{k+1} := H_k + \frac{v_k s_k^T + s_k v_k^T}{\langle y_k, s_k \rangle} - \frac{\langle v_k, y_k \rangle\, s_k s_k^T}{\langle y_k, s_k \rangle^2} \quad \text{with } v_k := s_k - H_k y_k.$$
(d) Put $k \leftarrow k + 1$ and go to (b).
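A sketch of the algorithm in Python; the Armijo routine from earlier stands in for the rules (G), (ARE), (PW) named in step (a), and the curvature guard is our own safeguard:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """BFGS method with the inverse update formula from step (c) above."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                       # symmetric positive definite start
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stop criterion: gradient norm
            break
        d = H @ g
        x_new = x - armijo(f, g, x, d) * d
        s, y = x_new - x, grad(x_new) - g
        ys = y @ s
        if ys > 1e-12:                       # curvature condition keeps H_k p.d.
            v = s - H @ y
            H += (np.outer(v, s) + np.outer(s, v)) / ys \
                 - (v @ y) * np.outer(s, s) / ys**2
        x = x_new
    return x
```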
Theorem 4.3.6. Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable and $x_0 \in \mathbb{R}^n$. Let the level set $S_f(x_0)$ be bounded and $f''$ positive definite on $S_f(x_0)$. Then the BFGS method can be performed with every symmetric positive definite start matrix and is at least Q-superlinearly convergent to the minimal solution. The matrices of the sequence $(H_k)_{k \in \mathbb{N}_0}$ are uniformly positive definite, i.e. there are positive numbers $m, M$, such that for all $z \in \mathbb{R}^n$ with $|z| = 1$
$$m \le \langle H_k z, z \rangle \le M$$
holds. Furthermore, the damped method merges into the undamped method.
Remark. If one wants to apply the BFGS method to non-convex functions, it turns out that the Powell-Wolfe rule assumes a particular significance, because it generates positive definite update matrices also in this situation.
Testing the Angle
The BFGS method preserves positive definiteness of the update matrices and guarantees that the direction of the secant is a direction of descent. Even then the angle between the current gradient and the search direction $d_k$ can become unfavorable at times. As an additional measure, one can test in the context of secant methods the cosine
$$\gamma_k := \frac{\langle \nabla f(x_k), d_k \rangle}{|\nabla f(x_k)|\,|d_k|}$$
of the angle in question. If $\gamma_k$ falls below a given threshold, a new direction of descent is constructed, e.g. a convex combination of the gradient and the direction of the secant. The BFGS method does have a self-correcting property w.r.t. the sequence $(\gamma_k)_{k \in \mathbb{N}_0}$, but also here an improvement of the convergence can be achieved by such a test for the angle. An algorithmic suggestion for the whole class of secant methods could be the following:
Secant Method with Angle Check
(a) Choose a symmetric and positive definite $n \times n$-matrix $H_0$, $x_0 \in \mathbb{R}^n$, a precision $\eta > 0$ and a step size rule (G), (AR), (ARE) or (PW). Put $k = 0$ and choose a stop criterion (e.g. test of the norm of the gradient).
(b) If the stop criterion is satisfied, then stop.
(c) Compute $p_k := H_k \nabla f(x_k)$ and $\gamma_k := \frac{\langle \nabla f(x_k), p_k \rangle}{|\nabla f(x_k)|\,|p_k|}$. If $\gamma_k < \eta$, choose a $d_k$ such that $\frac{\langle \nabla f(x_k), d_k \rangle}{|\nabla f(x_k)|\,|d_k|} \ge \eta$, otherwise put $d_k := p_k$.
(d) Determine a step size $\alpha_k$ with $f$ as descent function. Put $x_{k+1} := x_k - \alpha_k d_k$. Put $s_k := x_{k+1} - x_k$ and $y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$. Compute $H_{k+1}$ using the update formula of a secant method for the inverse matrix.
(e) Put $k \leftarrow k + 1$ and go to (b).
For the theoretical background see the theorem in Section 11 of [60], p. 185.
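Step (c) in isolation might look as follows (a hypothetical sketch; the blend with the gradient is one of several admissible fallback constructions):

```python
import numpy as np

def angle_checked_direction(g, p, eta=1e-3):
    """Return p if the cosine test passes, otherwise a safer descent direction."""
    cosine = lambda d: (g @ d) / (np.linalg.norm(g) * np.linalg.norm(d))
    if cosine(p) >= eta:
        return p
    for lam in np.linspace(0.1, 1.0, 10):    # convex combination of gradient
        d = (1.0 - lam) * p + lam * g        # and secant direction
        if cosine(d) >= eta:
            return d
    return g                                 # gradient direction as last resort
```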
4.4 A Matrix-free Newton Method
See [64]: Let $U$ be an open subset of $\mathbb{R}^n$ and $F : U \to \mathbb{R}^n$ differentiable. In order to determine $x_{k+1}$ the Newton equation
$$F'(x_k)(x - x_k) = -F(x_k)$$
has to be solved. Putting $A_k := F'(x_k)$, $b_k := -F(x_k)$, and $y := x - x_k$ the above equation amounts to $A_k y = b_k$. Its solution $y_k$ yields $x_{k+1} = x_k + y_k$. For large $n$ storing the Jacobian $F'(x_k)$ becomes prohibitive. A matrix-free variant of the Newton method is obtained by replacing $F'(x_k)(x - x_k)$ by the directional derivative
$$F'(x_k, x - x_k) = \lim_{t \to 0} \frac{F(x_k + t(x - x_k)) - F(x_k)}{t}.$$
For the solution of $A_k y = b_k$ a method is employed that does not require the knowledge of $A_k$, but only the result of the multiplication $A_k u$ for a $u \in \mathbb{R}^n$, where $u$ denotes a vector appearing in an intermediate step of the corresponding so-called MVM method (Matrix-Vector-Multiply method). Here $A_k u$ is replaced by the directional derivative $F'(x_k, u)$. If not available analytically, it is replaced by the approximation
$$\frac{F(x_k + \delta u) - F(x_k)}{\delta}$$
for small $\delta > 0$.
A suitable class of (iterative) solution methods for linear equations is represented by the Krylov space methods. Also the RA method in [60] can be used. For the important special case of symmetric and positive definite matrices the conjugate gradient and conjugate residual methods, and for non-symmetric matrices $A_k$ the CGS method can be employed.
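A sketch of the matrix-free scheme; we use GMRES, a Krylov space method available in SciPy, in place of the solvers named above (this choice and all parameter values are assumptions of the sketch):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def matrix_free_newton(F, x0, delta=1e-7, tol=1e-8, max_iter=50):
    """Newton method where A_k u is replaced by a difference quotient."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # u -> (F(x_k + delta u) - F(x_k)) / delta approximates F'(x_k) u
        A = LinearOperator((n, n), matvec=lambda u: (F(x + delta * u) - Fx) / delta)
        y, _ = gmres(A, -Fx)                 # solve A_k y = b_k without storing A_k
        x = x + y
    return x
```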
Chapter 5
Stability and Two-stage Optimization Problems
In our discussion of Polya algorithms in Chapter 2 the question referring to the closedness of the algorithm, i.e. whether every point of accumulation of the sequence of solutions of the approximating problems is a solution of the limit problem, is of central importance.
In this chapter we will treat this question in a more general framework. It turns out that lower semi-continuous convergence is responsible for the closedness of the algorithm. Monotone sequences of functions, which converge pointwise, will turn out to be lower semi-continuously convergent, and pointwise convergent sequences of convex functions even continuously convergent. The latter statement is based on an extension of the principle of uniform boundedness of Banach to families of convex functions. For the verification of pointwise convergence for sequences of convex functions, a generalized version of the theorem of Banach-Steinhaus for convex functionals turns out to be a useful tool: in the case of Orlicz spaces e.g. it suffices to consider the (dense) subspace of step functions, where pointwise convergence of the functionals is essentially reduced to that of the corresponding Young functions.
We will investigate the actual convergence of the approximate solutions beyond the closedness of the algorithm in the section on two-stage optimization. In this context we will also discuss the question of robustness of the algorithms against $\varepsilon$-solutions.
In the last section of this chapter we will see that a number of results about sequences of convex functions carry over to sequences of monotone operators.
Much of the material presented in this chapter is contained in [59].
5.1 Lower Semi-continuous Convergence and Stability
Definition 5.1.1. Let $X$ be a metric space. A sequence $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ is called lower semi-continuously convergent resp. upper semi-continuously convergent to $f : X \to \overline{\mathbb{R}}$, if $(f_n)_{n \in \mathbb{N}}$ converges pointwise to $f$ and for every convergent sequence $(x_n)_{n \in \mathbb{N}}$ in $X$ with $\lim x_n = x$
$$\liminf_{n \to \infty} f_n(x_n) \ge f(x)$$
resp.
$$\limsup_{n \to \infty} f_n(x_n) \le f(x)$$
holds.
Remark. For the practical verification of lower semi-continuous convergence the following criterion turns out to be useful: $f$ is lower semi-continuous and there exists a sequence $(\varepsilon_n)$ tending to zero, such that $f_n + \varepsilon_n \ge f$.
An immediate consequence of the definition is the
Theorem 5.1.2. Let $X$ be a metric space and let the function sequence $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ converge lower semi-continuously to $f : X \to \overline{\mathbb{R}}$; then every point of accumulation of the minimal solutions $M(f_n, X)$ is an element of $M(f, X)$.
Proof. Let $\bar x$ be such a point of accumulation, i.e. $\bar x = \lim_i x_{n_i}$ with $x_{n_i} \in M(f_{n_i}, X)$. For an arbitrary $x \in X$ we have
$$f_{n_i}(x_{n_i}) \le f_{n_i}(x).$$
The lower semi-continuous convergence then implies
$$f(\bar x) \le \liminf_i f_{n_i}(x_{n_i}) \le \liminf_i f_{n_i}(x) = f(x).$$
The principle of lower semi-continuous convergence provides a framework for further investigations: in particular the question arises how we can establish that lower semi-continuous convergence holds.
A simple criterion is provided by monotone convergence (see below).
In the case of pointwise convergence of real-valued convex functions even continuous convergence is obtained (see Theorem 5.3.6 and Theorem 5.3.8).
In the context of convex functions one obtains stability results even if the sets on which the minimization is performed are varied. The Kuratowski convergence, also referred to as topological convergence (see Definition 5.1.5), is the appropriate notion of set convergence in this situation.
A stability theorem for sequences of convex functions with values in $\overline{\mathbb{R}}$ is obtained if their epigraphs converge in the sense of Kuratowski (see Theorem 5.1.7).
5.1.1 Lower Semi-equicontinuity and Lower Semi-continuous Convergence
Definition 5.1.3. Let $X$ be a metric space. A sequence of functions $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ is called lower semi-equicontinuous at $x \in X$, if for all $\varepsilon > 0$ there is a neighborhood $U(x)$ of $x$, such that for all $n \in \mathbb{N}$ and all $y \in U(x)$ we have
$$f_n(x) - \varepsilon \le f_n(y).$$
Theorem 5.1.4. Let $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ be a sequence of functions which converges pointwise to $f$ on $X$ and let $\{f_n\}_{n \in \mathbb{N}}$ be lower semi-equicontinuous on $X$; then for all $x \in X$ we obtain: $f$ is lower semi-continuous at $x$ and the convergence is lower semi-continuous.
Proof. Let $x \in X$; then there is a neighborhood $U(x)$, such that for all $n \in \mathbb{N}$ and all $y \in U(x)$ we have: $f_n(x) - \varepsilon \le f_n(y)$. From the pointwise convergence it follows that $f(x) - \varepsilon \le f(y)$, i.e. $f$ is lower semi-continuous at $x$.
If $x_n \to x$, then $x_n \in U(x)$ for $n \ge N \in \mathbb{N}$, i.e. for $n \ge N$: $f_n(x) - \varepsilon \le f_n(x_n)$.
(a) Let $x \in \operatorname{Dom} f$; then there is $N_1 \in \mathbb{N}$, such that $f(x) - \varepsilon \le f_n(x)$ for all $n \ge N_1$, hence
$$f(x) - 2\varepsilon \le f_n(x) - \varepsilon \le f_n(x_n)$$
for all $n \ge N_1$. Therefore $\liminf f_n(x_n) \ge f(x)$.
(b) $x \notin \operatorname{Dom} f$, i.e. $f(x) = \infty$ and $f_n(x) \to f(x)$. Due to $f_n(x) - \varepsilon \le f_n(x_n)$ it follows that $f_n(x_n) \to \infty$.
5.1.2 Lower Semi-continuous Convergence and Convergence of Epigraphs
Definition 5.1.5. Let $X$ be a metric space and $(M_n)_{n \in \mathbb{N}}$ a sequence of subsets of $X$. Then we introduce the following notation:
$$\limsup_n M_n := \big\{x \in X \mid \text{there is a subsequence } (M_k)_{k \in \mathbb{N}} \text{ of } (M_n)_{n \in \mathbb{N}} \text{ and } x_k \in M_k, \text{ such that } x = \lim_k x_k\big\}$$
$$\liminf_n M_n := \big\{x \in X \mid \text{there is } n_0 \in \mathbb{N} \text{ and } x_n \in M_n \text{ for } n \ge n_0, \text{ such that } x = \lim_n x_n\big\}.$$
The sequence $(M_n)_{n \in \mathbb{N}}$ is called Kuratowski convergent to the subset $M$ of $X$, if
$$\limsup_n M_n = \liminf_n M_n = M,$$
notation: $M = \lim_n M_n$.
Theorem 5.1.6. Let the sequence $(f_n)_{n \in \mathbb{N}}$ converge to $f$ lower semi-continuously; then $\operatorname{Epi} f_n \to \operatorname{Epi} f$ converges in the sense of Kuratowski.
Proof. Let $(x_{n_k}, r_{n_k}) \in \operatorname{Epi} f_{n_k}$ be a convergent subsequence with $(x_{n_k}, r_{n_k}) \to (x, r)$; then $r = \lim r_{n_k} \ge \liminf f_{n_k}(x_{n_k}) \ge f(x)$, i.e. $(x, r) \in \operatorname{Epi} f$, hence
$$\limsup \operatorname{Epi} f_n \subset \operatorname{Epi} f.$$
Let now $x \in \operatorname{Dom} f$ and $(x, r) \in \operatorname{Epi} f$, i.e. $f(x) \le r < \infty$. Let further $\varepsilon > 0$ be given; then $f_n(x) \le f(x) + \varepsilon \le r + \varepsilon$ for $n$ sufficiently large. Then $(x, f_n(x) + r - f(x)) \in \operatorname{Epi} f_n$ and we have $(x, f_n(x) + r - f(x)) \to (x, f(x) + r - f(x)) = (x, r)$.
If $x \notin \operatorname{Dom} f$, then $(x, r) \notin \operatorname{Epi} f$ for all $r \in \mathbb{R}$.
A weak version of the converse of the previous theorem we obtain by
Theorem 5.1.7. Let the sequence $(f_n)_{n \in \mathbb{N}}$ converge to $f$ in the pointwise sense and let $\limsup \operatorname{Epi} f_n \subset \operatorname{Epi} f$. Then $(f_n)_{n \in \mathbb{N}}$ converges to $f$ lower semi-continuously.
Proof. Let $(x_n)$ converge to $x$.
(a) $x \in \operatorname{Dom} f$: let $(f_{n_k}(x_{n_k})) \to r$ be a convergent subsequence; then
$$(x_{n_k}, f_{n_k}(x_{n_k})) \to (x, r) \in \operatorname{Epi} f,$$
and hence $r \ge f(x)$. In this way we obtain: $\liminf f_n(x_n) \ge f(x)$.
(b) $x \notin \operatorname{Dom} f$: let $(x_{n_k}, f_{n_k}(x_{n_k})) \to (x, r)$; then $r = \infty$, because if $r < \infty$, then $(x, r) \in \operatorname{Epi} f$, contradicting $x \notin \operatorname{Dom} f$. Therefore $f_n(x_n) \to \infty$.
In the context of sequences of convex functions we will apply the convergence of the corresponding epigraphs for further stability results.
5.2 Stability for Monotone Convergence
A particularly simple access to lower semi-continuous convergence and hence to stability results is obtained in the case of monotone convergence, because in this case lower semi-continuous convergence follows from pointwise convergence.
Theorem 5.2.1. Let $X$ be a metric space and $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ a monotone sequence of lower semi-continuous functions, which converges pointwise to a lower semi-continuous function $f : X \to \overline{\mathbb{R}}$. Then the convergence is lower semi-continuous.
Proof. Let $x_n \to x_0$ and $(f_n)$ be monotonically decreasing; then $f_n(x_n) \ge f(x_n)$ and therefore
$$\liminf f_n(x_n) \ge \liminf f(x_n) \ge f(x_0).$$
Let now $(f_n)$ be monotonically increasing and $k \in \mathbb{N}$. Then for $n \ge k$
$$f_n(x_n) \ge f_k(x_n),$$
and hence, due to the lower semi-continuity of $f_k$,
$$\liminf_n f_n(x_n) \ge \liminf_n f_k(x_n) \ge f_k(x_0).$$
Therefore for all $k \in \mathbb{N}$
$$\liminf_n f_n(x_n) \ge f_k(x_0).$$
From the pointwise convergence of the sequence $(f_k)_{k \in \mathbb{N}}$ to $f$ the assertion follows.
Remark. If the convergence of $(f_n)$ is monotonically decreasing, apparently just the lower semi-continuity of the limit function is needed. If however the convergence is monotonically increasing, then the lower semi-continuity of the limit function $f$ follows from the lower semi-continuity of the elements of the sequence. A corresponding theorem and proof is available for upper semi-continuous functions and upper semi-continuous convergence.
The previous theorem and the preceding remark lead to the following
Theorem 5.2.2 (Theorem of Dini). Let $X$ be a metric space and $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ a monotone sequence of continuous functions, which converges pointwise to a continuous function $f : X \to \mathbb{R}$. Then the convergence is continuous.
Proof. Upper semi-continuous and lower semi-continuous convergence together imply continuous convergence according to Definition 5.3.1.
As a consequence of Theorem 5.1.2 and Theorem 5.2.1 we obtain
Theorem 5.2.3 (Stability Theorem of Monotone Convergence). Let $X$ be a metric space and $(f_n : X \to \overline{\mathbb{R}})_{n \in \mathbb{N}}$ a monotone sequence of lower semi-continuous functions, which converges pointwise to a lower semi-continuous function $f : X \to \overline{\mathbb{R}}$. Then every point of accumulation of the minimal solutions $M(f_n, X)$ is an element of $M(f, X)$.
If the function sequence is monotonically decreasing, the values of the optimization problems $(f_n, X)$ converge to the value of the optimization problem $(f, X)$, i.e.
$$\inf f_n(X) \xrightarrow[n \to \infty]{} \inf f(X). \tag{5.1}$$
If in addition $X$ is a compact set, then $\limsup_n M(f_n, X) \ne \emptyset$ and the convergence of the values is guaranteed also for monotonically increasing function sequences.
Proof. Only (5.1) remains to be shown. Let $(f_n)$ be monotonically decreasing. The sequence $r_n := \inf f_n(X)$ is also decreasing and hence convergent to an $r_0 \in \overline{\mathbb{R}}$. If $r_0 = \infty$ or $r_0 = -\infty$, (5.1) holds. Suppose $r_0 \in \mathbb{R}$ and $r_0 > \inf f(X)$. Then there is $x_0 \in X$ such that $f(x_0) < r_0$. From pointwise convergence we obtain $f_n(x_0) \to f(x_0)$ and thus the contradiction $r_0 = \lim_n \inf f_n(X) \le f(x_0) < r_0$.
Let now $(f_n)$ be increasing and $X$ compact. Let $x_n \in M(f_n, X)$ and $(x_{n_i})$ a subsequence of $(x_n)$ converging to $\bar x$. By the first part of the theorem $\bar x \in M(f, X)$. Pointwise and lower semi-continuous convergence then yield
$$f(\bar x) = \lim_i f_{n_i}(\bar x) \ge \limsup_i f_{n_i}(x_{n_i}) \ge \liminf_i f_{n_i}(x_{n_i}) \ge f(\bar x) = \inf f(X).$$
As an application of the above stability theorem we consider the directional derivative of the maximum norm:
Theorem 5.2.4. Let $T$ be a compact metric space and $C(T)$ the space of the continuous functions on $T$ with values in $\mathbb{R}$. For the function $f : C(T) \to \mathbb{R}$ with $f(x) := \max_{t \in T} |x(t)|$ and $h \in C(T)$, $x \in C(T) \setminus \{0\}$ we obtain
$$f'_+(x, h) = \max\{h(t)\operatorname{sign}(x(t)) \mid t \in T \text{ and } |x(t)| = f(x)\}.$$
Proof. From the monotonicity of the difference quotient of convex functions (see Theorem 3.3.1) it follows that the sequence $(g_{\varepsilon_n})$ with
$$g_{\varepsilon_n}(t) := -\,\frac{|x(t) + \varepsilon_n h(t)| - |x(t)| + |x(t)| - f(x)}{\varepsilon_n}$$
for $\varepsilon_n \downarrow 0$ is a monotonically increasing sequence, which converges pointwise for $|x(t)| = f(x)$ to $g(t) := -h(t)\operatorname{sign}(x(t))$, for $|x(t)| < f(x)$ however to $\infty$. Since $T$ is compact, the sequence of the values $\min_{t \in T} g_{\varepsilon_n}(t)$ converges due to the above theorem to $\min_{t \in T} g(t)$, thus
$$\lim_{\varepsilon \downarrow 0} \max_{t \in T} \frac{|x(t) + \varepsilon h(t)| - |x(t)| + |x(t)| - f(x)}{\varepsilon} = \max\Big\{ \lim_{\varepsilon \downarrow 0} \frac{|x(t) + \varepsilon h(t)| - |x(t)|}{\varepsilon} \;\Big|\; t \in T \text{ and } |x(t)| = f(x) \Big\} = \max\{h(t)\operatorname{sign}(x(t)) \mid t \in T \text{ and } |x(t)| = f(x)\}.$$
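The formula is easy to test numerically on a discretized $T$ (a sketch under our own choices of $x$, $h$ and grid):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)              # discretization of the compact set T
x = np.cos(3.0 * t)                          # x in C(T), x != 0
h = np.sin(2.0 * t) + 0.3                    # direction h in C(T)

f = lambda u: np.max(np.abs(u))
eps = 1e-8
difference_quotient = (f(x + eps * h) - f(x)) / eps

extremal = np.abs(np.abs(x) - f(x)) < 1e-12  # {t : |x(t)| = f(x)}
formula = np.max(h[extremal] * np.sign(x[extremal]))
print(difference_quotient, formula)          # both approx 0.3 in this example
```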
5.3 Continuous Convergence and Stability for Convex Functions
It will turn out that the following notion is of central significance in this context:
Definition 5.3.1. Let $X$ and $Y$ be metric spaces. A sequence of functions $(f_n)_{n \in \mathbb{N}}$ with $f_n : X \to Y$ is called continuously convergent to $f : X \to Y$, if for every sequence $(x_n)_{n \in \mathbb{N}}$ converging to an $x_0$
$$f_n(x_n) \to f(x_0)$$
holds.
The significance of continuous convergence for stability assertions is immediately apparent from the following simple principle:
Theorem 5.3.2. Let $X$ be a metric space and $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ a sequence of functions, which converges continuously to $f : X \to \mathbb{R}$. If for each $n \in \mathbb{N}$ the point $x_n$ is a minimal solution of $f_n$ on $X$, then every point of accumulation of the sequence $(x_n)_{n \in \mathbb{N}}$ is a minimal solution of $f$ on $X$.
Proof. Since $x_n$ is a minimal solution of $f_n$, we have for all $x \in X$
$$f_n(x_n) \le f_n(x). \tag{5.2}$$
Let $x_0$ be a point of accumulation of $(x_n)_{n \in \mathbb{N}}$, with a subsequence $(x_k)_{k \in \mathbb{N}}$ converging to $x_0$. Then the continuous convergence implies
$$f_k(x_k) \to f(x_0) \quad \text{and} \quad f_n(x) \to f(x).$$
Going to the limit in Inequality (5.2) preserves its validity and we obtain
$$f(x_0) \le f(x).$$
We will see that for certain classes of functions pointwise convergence already implies continuous convergence. Among them are the convex functions. This will also hold for sums, products, differences and compositions of convex functions, because continuous convergence is inherited according to the following general principle:
Theorem 5.3.3. Let $X, Y, Z$ be metric spaces and let $(F_n)_{n \in \mathbb{N}}$ with $F_n : X \to Y$ and $(G_n)_{n \in \mathbb{N}}$ with $G_n : Y \to Z$ be sequences of continuously convergent functions. Then the sequence of compositions $(G_n \circ F_n)_{n \in \mathbb{N}}$ is continuously convergent.
Proof. Let $(F_n)$ converge to $F$ and $(G_n)$ to $G$ continuously. Let $(x_n)_n$ be a sequence in $X$ convergent to $x$; then we have $F_n(x_n) \to F(x)$ in $Y$ and hence
$$G_n(F_n(x_n)) \to G(F(x)) \quad \text{in } Z.$$
In order to be responsive to the above announcement about products and differences, we will mention the following special case:
Theorem 5.3.4. Let $Y = \mathbb{R}^m$, $Z = \mathbb{R}$, let $G : \mathbb{R}^m \to \mathbb{R}$ be continuous and let $(F_n)_{n \in \mathbb{N}} = (f_n^{(1)}, \ldots, f_n^{(m)})_{n \in \mathbb{N}}$ be continuously convergent. Then the sequence $(G(f_n^{(1)}, \ldots, f_n^{(m)}))_{n \in \mathbb{N}}$ is continuously convergent.
At first we will explicate that the notion of continuous convergence is closely related to the notion of equicontinuity.
Continuous Convergence and Equicontinuity
Definition 5.3.5. Let $X, Y$ be metric spaces and $\mathcal{F}$ a family of functions $f : X \to Y$.
(a) Let $x_0 \in X$. $\mathcal{F}$ is called equicontinuous at $x_0$, if for every neighborhood $V$ of $f(x_0)$ there is a neighborhood $U$ of $x_0$, such that for all $f \in \mathcal{F}$ and all $x \in U$ we have
$$f(x) \in V.$$
(b) $\mathcal{F}$ is called equicontinuous, if $\mathcal{F}$ is equicontinuous at each $x_0 \in X$.
Theorem 5.3.6. Let $X, Y$ be metric spaces and $(f_n : X \to Y)_{n \in \mathbb{N}}$ a sequence of continuous functions, which converges pointwise to the function $f : X \to Y$. Then the following statements are equivalent:
(a) $\{f_n\}_{n \in \mathbb{N}}$ is equicontinuous,
(b) $f$ is continuous and $\{f_n\}_{n \in \mathbb{N}}$ converges continuously to $f$,
(c) $\{f_n\}_{n \in \mathbb{N}}$ converges uniformly on compact subsets to $f$.
Proof. (a) $\Rightarrow$ (b): Let $x_0 \in X$ and $\varepsilon > 0$. Then there is a $\delta > 0$, such that for all $x \in K(x_0, \delta)$ and all $n \in \mathbb{N}$ we have: $d(f_n(x), f_n(x_0)) \le \varepsilon$.
The pointwise convergence implies for all $x \in K(x_0, \delta)$: $d(f(x), f(x_0)) \le \varepsilon$ and thus the continuity of $f$.
Let $x_n \to x_0$. For $n \ge n_0$ we have $x_n \in K(x_0, \delta)$ and $d(f_n(x_0), f(x_0)) \le \varepsilon$. Therefore
$$d(f_n(x_n), f(x_0)) \le d(f_n(x_n), f_n(x_0)) + d(f_n(x_0), f(x_0)) \le 2\varepsilon.$$
(b) $\Rightarrow$ (c): Suppose there is a compact subset $K$ of $X$, on which the sequence $(f_n)$ does not converge uniformly to $f$. Then we have
$$\exists\, \varepsilon > 0\ \forall\, n \in \mathbb{N}\ \exists\, k_n \ge n,\ x_{k_n} \in K : d(f_{k_n}(x_{k_n}), f(x_{k_n})) \ge \varepsilon.$$
The sequence $(k_n)$ has a strictly monotonically increasing subsequence $(i_n)$, such that $(x_{i_n})$ converges to $\bar x \in K$. This leads to the contradiction
$$d(f_{i_n}(x_{i_n}), f(x_{i_n})) \le d(f_{i_n}(x_{i_n}), f(\bar x)) + d(f(x_{i_n}), f(\bar x)) \to 0.$$
(c) $\Rightarrow$ (a): Suppose the family $\{f_n\}$ is not equicontinuous at a point $x_0 \in X$; then
$$\exists\, \varepsilon > 0\ \forall\, n \in \mathbb{N}\ \exists\, k_n \in \mathbb{N},\ x_{k_n} \in K\Big(x_0, \frac{1}{n}\Big) : d(f_{k_n}(x_{k_n}), f_{k_n}(x_0)) \ge \varepsilon.$$
Since $x_{k_n} \to x_0$ and finitely many continuous functions are equicontinuous, the set $J = \{k_n\}$ contains infinitely many elements. Hence there exists in $J$ a strictly monotonically increasing sequence $(i_n)$. By assumption the sequence $(f_n)$ converges uniformly on the compact set $\{x_{k_n}\}_{n \in \mathbb{N}} \cup \{x_0\}$ to $f$. Hence there is an $\bar n \in \mathbb{N}$, such that for all $n \ge \bar n$:
$$d(f_{i_n}(x_{i_n}), f(x_{i_n})) < \frac{\varepsilon}{4} \quad \text{and} \quad d(f_{i_n}(x_0), f(x_0)) < \frac{\varepsilon}{4}.$$
The function $f$ is the uniform limit of continuous functions and hence continuous. Therefore
$$d(f(x_{i_n}), f(x_0)) < \frac{\varepsilon}{4},$$
provided that $\bar n$ was chosen large enough. From the triangle inequality it follows that for $n \ge \bar n$
$$d(f_{i_n}(x_{i_n}), f_{i_n}(x_0)) \le d(f_{i_n}(x_{i_n}), f(x_{i_n})) + d(f(x_{i_n}), f(x_0)) + d(f_{i_n}(x_0), f(x_0)) < \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4},$$
a contradiction.
For stability assertions of convex optimization we will show the following non-linear extension of the uniform boundedness principle of Banach about continuous linear operators to convex functions:
A pointwise bounded family of continuous convex functions is equicontinuous.
Definition 5.3.7. Let $Y$ be a metric space. A family $\mathcal{F}$ of real functions on $Y$ is called lower semi-equicontinuous (resp. upper semi-equicontinuous) at the point $y_0$, if for every $\varepsilon > 0$ there is a neighborhood $V$ of $y_0$, such that for all $y \in V$ and all $f \in \mathcal{F}$
$$f(y) \ge f(y_0) - \varepsilon \quad (\text{resp. } f(y) \le f(y_0) + \varepsilon)$$
holds.
We remark that the family $\mathcal{F}$ is equicontinuous, if $\mathcal{F}$ is lower semi-equicontinuous and upper semi-equicontinuous.
Theorem 5.3.8. Let $X$ be a normed space, $U$ an open and convex subset of $X$ and let $\mathcal{F}$ be a family of convex functions on $U$, which is pointwise bounded. Then the following statements are equivalent:
(a) $\mathcal{F}$ is equicontinuous on $U$,
(b) $\mathcal{F}$ is upper semi-equicontinuous on $U$,
(c) $\mathcal{F}$ is uniformly bounded from above on an open subset $U_0$ of $U$,
(d) $\mathcal{F}$ is equicontinuous at a point in $U$.
Proof. (a) $\Rightarrow$ (b) and (b) $\Rightarrow$ (c) follow from the definitions.
(c) $\Rightarrow$ (d): Let $x_0 \in U_0$ and $a > 0$, such that $K(x_0, a) \subset U_0$. Let $0 < \lambda < 1$ be given. If $y \in K(x_0, \lambda a)$, then there exists an $x \in K(x_0, a)$ with $y = (1 - \lambda)x_0 + \lambda x$. The convexity implies for all $f \in \mathcal{F}$
$$f(y) \le \lambda f(x) + (1 - \lambda) f(x_0) = f(x_0) + \lambda\big(f(x) - f(x_0)\big).$$
By assumption there is an $M > 0$ such that for all $x \in U_0$ and all $f \in \mathcal{F}$: $f(x) \le M$. If we put, using the pointwise boundedness, $\varrho := M - \inf\{f(x_0) \mid f \in \mathcal{F}\}$, then we obtain for all $y \in K(x_0, \lambda a)$
$$f(y) - f(x_0) \le \lambda\varrho.$$
On the other hand for each $z \in K(x_0, \lambda a)$ there is an $x \in K(x_0, a)$ with
$$z = x_0 - \lambda(x - x_0) \quad \text{resp.} \quad x_0 = \frac{1}{1 + \lambda}\,z + \frac{\lambda}{1 + \lambda}\,x.$$
The convexity implies for all $f \in \mathcal{F}$
$$f(x_0) \le \frac{1}{1 + \lambda} f(z) + \frac{\lambda}{1 + \lambda} f(x),$$
hence via multiplication of both sides by $1 + \lambda$:
$$f(z) - f(x_0) \ge \lambda\big(f(x_0) - f(x)\big) \ge \lambda\big(\inf\{f(x_0) \mid f \in \mathcal{F}\} - M\big) = -\lambda\varrho.$$
Thus we obtain the equicontinuity of $\mathcal{F}$ at $x_0$.
(d) $\Rightarrow$ (a): Due to the above considerations it suffices to show that each point $x \in U$ has a neighborhood, on which $\mathcal{F}$ is uniformly bounded from above. Let $\mathcal{F}$ be equicontinuous at $y \in U$ and let $a > 0$, such that for all $z \in K(y, a) =: V$ and all $f \in \mathcal{F}$
$$f(z) \le f(y) + 1 \le \sup\{f(y) \mid f \in \mathcal{F}\} + 1 =: r$$
holds. (Here we have, of course, reused the pointwise boundedness.) Let now $x \in U$ and $0 < \lambda < 1$ be chosen, such that $(1 + \lambda)x \in U$ and $V_\lambda := x + \frac{\lambda}{1+\lambda} V \subset U$ holds. For each $s \in V_\lambda$ there is apparently a $z \in V$ with $s = x + \frac{\lambda}{1+\lambda} z$ and it follows that
$$f(s) = f\Big(x + \frac{\lambda}{1+\lambda}\,z\Big) \le \frac{1}{1+\lambda} f\big((1+\lambda)x\big) + \frac{\lambda}{1+\lambda} f(z) \le \frac{1}{1+\lambda} \sup\{f((1+\lambda)x) \mid f \in \mathcal{F}\} + \frac{\lambda}{1+\lambda}\,r =: r' < \infty.$$
For a family consisting of a single function, we immediately obtain
Corollary 5.3.9. Let $X$ be a normed space, $U$ an open and convex subset of $X$ and let $f : U \to \mathbb{R}$ be a convex function. Then the following statements are equivalent:
(a) $f$ is continuous on $U$,
(b) $f$ is upper semi-continuous on $U$,
(c) $f$ is bounded from above on an open subset $U_0$ of $U$,
(d) $f$ is continuous at a point in $U$.
Corollary 5.3.10. Let $X$ be a Banach space, $U$ an open and convex subset of $X$. Let $(f_k : U \to \mathbb{R})_{k \in \mathbb{N}}$ be a sequence of continuous convex functions, which converges pointwise to the function $f : U \to \mathbb{R}$. Then $f$ is a continuous convex function.
Proof. From $f \le \bar f := \sup_{k \in \mathbb{N}} f_k$ continuity of $f$ follows using Theorems 5.3.13 and 5.3.9.
As a consequence from Corollary 5.3.9 we obtain the following important and beautiful assertion that every convex function on $\mathbb{R}^n$ is continuous.
Theorem 5.3.11. Let $U$ be an open convex subset of $\mathbb{R}^n$. Then every convex function $f : U \to \mathbb{R}$ is continuous.
Proof. Let w.l.o.g. $0 \in U$ and $0 < \varepsilon < 1$, such that the $\ell^1$-ball
$$V := \Big\{ x \in \mathbb{R}^n \,\Big|\, \sum_{i=1}^{n} |x_i| < \varepsilon \Big\}$$
is contained in $U$. For $x \in V$ we have
$$x = \sum_{i=1}^{n} x_i e_i = \sum_{i=1}^{n} \frac{|x_i|}{\varepsilon}\,\operatorname{sign}(x_i)\,\varepsilon e_i + \Big(1 - \sum_{i=1}^{n} \frac{|x_i|}{\varepsilon}\Big) \cdot 0.$$
Then for all $x \in V$ we obtain the following inequality:
$$f(x) \le \sum_{i=1}^{n} \frac{|x_i|}{\varepsilon}\,f\big(\operatorname{sign}(x_i)\,\varepsilon e_i\big) + \Big(1 - \sum_{i=1}^{n} \frac{|x_i|}{\varepsilon}\Big) f(0) \le \max\big\{|f(\varepsilon e_i)|,\, |f(-\varepsilon e_i)|,\, f(0) \mid i = 1, \ldots, n\big\}.$$
Hence $f$ is bounded from above on $V$ and thus, by Corollary 5.3.9, continuous.
Equicontinuity of Convex Functions in Banach Space
In Banach spaces we have the
Theorem 5.3.12. Let $U$ be an open and convex subset of a Banach space $X$ and $f : U \to \mathbb{R}$ a convex function. Then the following statements are equivalent:
(a) $f$ is continuous,
(b) $f$ is lower semi-continuous,
(c) $f$ is upper semi-continuous.
Proof. (b) $\Rightarrow$ (a): Suppose $f$ is not continuous. Then $f$ is by Theorem 5.3.8 unbounded from above on every open subset of $U$. Then for every $k \in \mathbb{N}$ the set $B_k := \{x \in U \mid f(x) > k\}$ is non-empty and also open, because $f$ is lower semi-continuous.
Now we determine iteratively a sequence of closed non-empty balls: in $B_1$ we choose a ball $U_1$ with radius $\le 1$. If the $k$-th ball $U_k$ with radius $\le \frac{1}{k}$ has been determined, we choose a non-empty closed ball $U_{k+1}$ with radius $\le \frac{1}{k+1}$ in $B_{k+1} \cap \operatorname{Int}(U_k)$.
The sequence of centers $(x_k)$ is a Cauchy sequence, since for all $k, p \in \mathbb{N}$ we have: $\|x_{k+p} - x_k\| \le \frac{1}{k}$. Since $X$ is a Banach space, the sequence $(x_k)$ converges to an $x^* \in X$. For $p \to \infty$ we then obtain
$$\|x^* - x_k\| \le \frac{1}{k},$$
i.e. for all $k \in \mathbb{N}$ we have $x^* \in U_k$. Since $U_k \subset B_k$ it follows that $x^* \in U$ and $f(x^*) = \infty$, in contradiction to $f$ being finite on $U$.
We now approach the central assertion for our stability considerations: the following theorem yields an extension of the theorem of Banach on uniform boundedness to families of convex functions.
Theorem 5.3.13 (Uniform boundedness for convex functions). Let $X$ be a Banach space, $U$ an open convex subset of $X$ and $\mathcal{F}$ a family of continuous convex functions $f : U \to \mathbb{R}$, which is pointwise bounded. Then $\mathcal{F}$ is equicontinuous. Moreover, the functions $x \mapsto \sup\{f(x) \mid f \in \mathcal{F}\}$ and $x \mapsto \inf\{f(x) \mid f \in \mathcal{F}\}$ are continuous.
Proof. The function $\bar f := \sup_{f \in \mathcal{F}} f$ being the supremum of continuous convex functions is lower semi-continuous and convex and according to Theorem 5.3.12 continuous. In particular $\bar f$ is bounded from above on a neighborhood. Due to Theorem 5.3.8 the family $\mathcal{F}$ is then equicontinuous.
Let $x_0 \in U$. Then for every $\varepsilon > 0$ there is a neighborhood $V$ of $x_0$, such that for all $f \in \mathcal{F}$ and all $x \in V$ we have
$$f(x) \ge f(x_0) - \varepsilon \ge \inf_{f \in \mathcal{F}} f(x_0) - \varepsilon,$$
hence
$$\inf_{f \in \mathcal{F}} f(x) \ge \inf_{f \in \mathcal{F}} f(x_0) - \varepsilon,$$
i.e. the function $\inf_{f \in \mathcal{F}} f$ is lower semi-continuous, and being the infimum of continuous functions it is also upper semi-continuous.
In particular every point in $U$ has a neighborhood, on which $\mathcal{F}$ is uniformly bounded.
Remark 5.3.14. We obtain the theorem of Banach on uniform boundedness, if one assigns to a linear operator $A : X \to Y$ ($X$ Banach space, $Y$ normed space) the function $f_A : X \to \mathbb{R}$ with $f_A(x) := \|Ax\|$.
For the discussion of strong solvability in Chapter 8 we need the following
Theorem 5.3.15. Let $X$ be a Banach space and $M$ a weakly bounded subset; then $M$ is bounded.
Proof. Let $x^* \in X^*$; then there is a number $\varrho_{x^*}$, such that $|\langle x, x^* \rangle| \le \varrho_{x^*}$ holds for all $x \in M$. If we consider $M$ as a subset of $X^{**}$, then $M$ is pointwise bounded on $X^*$. According to the theorem on uniform boundedness of Banach, $M$ is bounded in $X^{**}$. It is well known that the canonical mapping $\iota : X \to X^{**}$ with $x \mapsto \langle \cdot, x \rangle$ is an isometry, since
$$\|\iota(x)\| = \sup\{\langle x^*, x \rangle \mid \|x^*\| \le 1\} \le \|x\|.$$
On the other hand there is, according to the theorem of Hahn-Banach, for $x \in X$ an $x^*$ with $\|x^*\| = 1$ and $\langle x^*, x \rangle = \|x\|$. Thus $M$ is also bounded in $X$.
Remark 5.3.16. For convex functions the above Theorem 5.3.13 permits to reduce the assertion of continuous convergence to pointwise convergence. In function spaces the verification of pointwise convergence can often be performed successfully using the theorem of Banach-Steinhaus, where essentially the pointwise convergence on a dense subset is required. Due to the equicontinuity shown above this carries over to sequences of continuous convex functions.
Theorem 5.3.17 (Banach-Steinhaus for convex functions). Let $U$ be an open and convex subset of a Banach space. A sequence of continuous convex functions $(f_n : U \to \mathbb{R})_{n \in \mathbb{N}}$ converges pointwise to a continuous convex function $f$, if and only if the following two conditions are satisfied:
(a) the sequence $(f_n)_{n \in \mathbb{N}}$ is pointwise bounded,
(b) the sequence $(f_n)_{n \in \mathbb{N}}$ converges pointwise on a dense subset $D$ of $U$ to $f$.
Proof. The necessity is obvious. Let now $x \in U$ and $\varepsilon > 0$. By Theorem 5.3.13 the family $\{f_n\}_{n=1}^{\infty}$ is equicontinuous, i.e. there exists a neighborhood $V$ of $x$, such that for all $n \in \mathbb{N}$ and $y \in V$
$$|f_n(x) - f_n(y)| \le \varepsilon.$$
Since $D$ is dense in $U$, we have $D \cap V \ne \emptyset$. Let $x' \in D \cap V$; then there is an $N \in \mathbb{N}$ with $|f_n(x') - f_m(x')| \le \varepsilon$ for all $n, m \ge N$, hence for $n, m \ge N$
$$|f_n(x) - f_m(x)| \le |f_n(x) - f_n(x')| + |f_n(x') - f_m(x')| + |f_m(x') - f_m(x)| \le 3\varepsilon.$$
Therefore the limit $\lim_n f_n(x) =: f(x)$ exists. But $f$ is due to Corollary 5.3.10 convex and continuous.
5.3.1 Stability Theorems
Pointwise convergence leads according to Theorem 5.3.13 to equicontinuity and hence by Theorem 5.3.6 to continuous convergence, which in turn implies the following stability assertion:
Theorem 5.3.18. Let $X$ be a Banach space, $K$ a closed convex subset of $X$ and $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ a sequence of continuous convex functions, which converges pointwise to a function $f : X \to \mathbb{R}$. Let $x_n$ be a minimal solution of $f_n$ on $K$. Then every point of accumulation of the sequence $(x_n)_{n \in \mathbb{N}}$ is a minimal solution of $f$ on $K$.
The essential message of the above stability Theorem 5.3.18 remains valid, if one admits that in each step the optimization is done on different domains, provided that these converge to the domain of the limit problem in the sense of Kuratowski (see Definition 5.1.5). This notion for the convergence of sets displays an analogy to the pointwise convergence of functions.
The continuous convergence of functions leads together with the Kuratowski convergence of the restriction sets to the following general stability principle:
Theorem 5.3.19. Let $X$ be a metric space, $(S_n)_{n \in \mathbb{N}}$ a sequence of subsets of $X$, which converges to $S \subset X$ in the sense of Kuratowski. Let $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ be continuously convergent to $f : X \to \mathbb{R}$. Then it follows that
$$\limsup M(f_n, S_n) \subset M(f, S).$$
Proof. Let $x_{n_i} \in M(f_{n_i}, S_{n_i})$ and $x_{n_i} \to x$; since $S_n \to S$ in the sense of Kuratowski, $x \in S$. Let $y \in S$. Then there exists a sequence $(y_n \in S_n)_{n \in \mathbb{N}}$ with $y = \lim y_n$. Since $f_n \to f$ continuously, we have
$$f(x) = \lim_i f_{n_i}(x_{n_i}) \le \lim_i f_{n_i}(y_{n_i}) = f(y).$$
The following variant of the stability theorem admits a weaker version of topological convergence:
Theorem 5.3.20. Let $(f_n : X \to \mathbb{R})_{n \in \mathbb{N}}$ be a sequence of functions, which converges continuously to $f : X \to \mathbb{R}$. Let $S \subset X$ and let $(S_n)_{n \in \mathbb{N}}$ be a sequence of subsets of $X$ with
$$\liminf_n S_n \supset S. \tag{5.3}$$
Let for each $n \in \mathbb{N}$ $x^*_n$ be a minimal solution of $f_n$ on $S_n$.
Then every point of accumulation of the sequence $(x^*_n)_{n \in \mathbb{N}}$ lying in $S$ is a minimal solution of $f$ on $S$.
Proof. Let $x^*_{n_i} \in M(f_{n_i}, S_{n_i})$ and $x^*_{n_i} \to x^* \in S$.
By (5.3) for each $x \in S$ there is an $n_0 \in \mathbb{N}$ and $x_n \in S_n$ for $n \ge n_0$, such that $x_n \to x$.
By definition of a minimal solution we have
$$f_{n_i}(x^*_{n_i}) \le f_{n_i}(x_{n_i}).$$
The continuous convergence of $(f_n)_{n \in \mathbb{N}}$ yields
$$f(x^*) = \lim_i f_{n_i}(x^*_{n_i}) \le \lim_i f_{n_i}(x_{n_i}) = f(x).$$
Theorem 5.3.21 (Stability Theorem of Convex Optimization). Let $X$ be a Banach space, $U$ an open and convex subset of $X$ and $(f_n : U \to \mathbb{R})_{n \in \mathbb{N}}$ a sequence of convex continuous functions, which converges pointwise to a function $f : U \to \mathbb{R}$. Furthermore let $(M_n)_{n \in \mathbb{N}}$ be a sequence of subsets of $U$ with $U \supset M = \lim_n M_n$. Let $x_n$ be a minimal solution of $f_n$ on $M_n$. Then every point of accumulation of the sequence $(x_n)_{n \in \mathbb{N}}$ is a minimal solution of $f$ on $M$.
Corollary 5.3.22. Let in addition
(a) $M_n$ be closed for all $n \in \mathbb{N}$,
(b) there exist a subset $K$ of $U$, compact in $X$, such that $M_n \subset K$ for all $n \in \mathbb{N}$.
Then the following additional statements hold:
(a) the sets $\limsup_n M(f_n, M_n)$ and $M(f_n, M_n)$ are non-empty,
(b) from $x_n \in M(f_n, M_n)$ it follows that $f(x_n) \to \inf f(M)$,
(c) $\inf f_n(M_n) \to \inf f(M)$.
Corollary 5.3.23. If in the above stability theorem we require instead of $M = \lim_n M_n$ only $M \subset \liminf_n M_n$, then still
$$M \cap \limsup M(f_n, M_n) \subset M(f, M). \tag{5.4}$$
Proof of Corollaries 5.3.22 and 5.3.23. Let $x = \lim_i x_{n_i}$ with $x_{n_i} \in M(f_{n_i}, M_{n_i})$ and $x \in M$. Let $y \in M$ be arbitrarily chosen and $y = \lim_n y_n$ with $y_n \in M_n$. The sequence $(f_n)$ converges by Theorem 5.3.6 and Theorem 5.3.8 continuously to $f$ and hence
$$f(x) = \lim_i f_{n_i}(x_{n_i}) \le \lim_i f_{n_i}(y_{n_i}) = f(y).$$
The compactness of the sets $M_n$ and $K$ yields (a).
Let $x_n \in M(f_n, M_n)$ and let $(x_k)$ be a convergent subsequence converging to $x_0 \in M$; then the continuous convergence implies $\lim_k f_k(x_k) = f(x_0) = \inf f(M)$. Thus every subsequence of $(f_n(x_n))$ has a subsequence converging to $\inf f(M)$, and hence (c). Using the continuity of $f$ we obtain (b) in a similar manner.
The requirement $M \subset U$ cannot be omitted, as the following example shows:
Example 5.3.24. Let $U := (0, 1)$, let $f_n := |\cdot - (1 - \frac{2}{n})|$ and $f := |\cdot - 1|$. Let further $M_n := [\frac{1}{n}, 1 - \frac{1}{n}]$; then $\lim M_n = [0, 1] = M$. The sequence of minimal solutions $x_n = 1 - \frac{2}{n}$ of $f_n$ on $M_n$ converges to 1, outside of the domain of definition of $f$.
Theorem 5.3.25 (Stability Theorem of Convex Optimization in $\mathbb{R}^n$). Let $U$ be an open and convex subset of a finite dimensional normed space $X$ and let $(f_k : U \to \mathbb{R})_{k \in \mathbb{N}}$ be a sequence of convex functions, which converges pointwise to a function $f : U \to \mathbb{R}$. Let further $(S_k)_{k \in \mathbb{N}}$ be a sequence of closed convex subsets in $X$ with $S = \lim_k S_k$. For $\tilde S := U \cap S$ let the set of minimal solutions of $f$ on $\tilde S$ be non-empty and bounded. Then for $\tilde S_k := U \cap S_k$ the following statements hold:
(a) for large $k \in \mathbb{N}$ the set of minimal solutions $M(f_k, \tilde S_k)$ is non-empty,
(b) $\limsup_k M(f_k, \tilde S_k)$ is non-empty and from a $k_0 \in \mathbb{N}$ on uniformly bounded, i.e. $\bigcup_{k \ge k_0} M(f_k, \tilde S_k)$ is bounded,
(c) $\limsup_k M(f_k, \tilde S_k) \subset M(f, \tilde S)$,
(d) $\inf f_k(\tilde S_k) \to \inf f(\tilde S)$,
(e) from $x_k \in M(f_k, \tilde S_k)$ for $k \in \mathbb{N}$ it follows that $f(x_k) \to \inf f(\tilde S)$,
(f) if $M(f, \tilde S)$ consists of a single point $x_0$, then $x_k \in M(f_k, \tilde S_k)$ implies for $k \in \mathbb{N}$ the convergence $x_k \to x_0$.
Proof. Let $x \in M(f, \tilde S)$ and $r = f(x) = \inf f(\tilde S)$. Since $M(f, \tilde S)$ is compact, there is a ball $K(0, d)$ with $d > 0$ such that
$$A := M(f, \tilde S) + K(0, d) \subset U$$
holds. Furthermore there exists an $\varepsilon > 0$, such that for an $n_0 \in \mathbb{N}$ and all $n \ge n_0$
$$H_n := \{u \mid u \in \tilde S_n,\ f_n(u) \le r + \varepsilon\} \subset A$$
holds, because otherwise there is a strictly monotone sequence $(k_n)$ in $\mathbb{N}$ and $u_{k_n} \in \tilde S_{k_n}$ with $f_{k_n}(u_{k_n}) \le r + \frac{1}{k_n}$ but $u_{k_n} \notin A$. Since $(S_n)$ converges to $S$, there is a sequence $(y_n \in S_n)$ with $y_n \to x$. For large $k_n$ there exists an intersection $z_{k_n}$ of the interval $[y_{k_n}, u_{k_n}]$ with the set
$$D := \Big\{ w \in A \,\Big|\, \inf_{v \in M(f, \tilde S)} \|w - v\| = d \Big\}.$$
For $k_n \ge n_0$ we have $y_{k_n} \in A \subset U$ and hence $y_{k_n} \in \tilde S_{k_n}$. Due to the convexity of $\tilde S_{k_n}$ we obtain $z_{k_n} \in \tilde S_{k_n}$. Since $D$ is non-empty and compact, the sequence $(z_{k_n})$ contains a subsequence $(z_l)_{l \in L}$ convergent to $\bar z \in D$. As $S_n$ is convex and $\lim S_n = S$, we have $\bar z \in S$; on the other hand $\bar z \in A \subset U$, i.e. $\bar z \in \tilde S$. For each $l \in L$ there is a $\lambda_l \in [0, 1]$ with $z_l = \lambda_l u_l + (1 - \lambda_l) y_l$, and we obtain
$$f_l(z_l) \le \lambda_l f_l(u_l) + (1 - \lambda_l) f_l(y_l) \le \lambda_l \Big( r + \frac{1}{l} \Big) + (1 - \lambda_l) f_l(y_l).$$
The continuous convergence implies $f_l(y_l) \to f(x)$ and $f_l(z_l) \to f(\bar z)$, hence $f(\bar z) \le r$. Thus we have $\bar z \in M(f, \tilde S) \cap D$, contradicting the definition of $D$.
For large $n$ the sets $H_n$ are non-empty, because for every sequence $(v_n \in \tilde S_n)$ convergent to $x$ we have: $f_n(v_n) \to f(x) = r$. Therefore minimization of $f_n$ on $\tilde S_n$ can be restricted to $H_n$.
$H_n$, being a subset of $A$, is apparently bounded but also closed: for let $(h_k)$ be a sequence in $H_n$ with $h_k \to \bar h$; then $\bar h \in A \subset U$ and $\bar h \in S_n$, since $S_n$ is closed, i.e. $\bar h \in \tilde S_n$. Due to the closedness of the level sets of $f_n$ it follows that $\bar h \in H_n$. For $K := A$ we obtain with Corollary 5.3.22 to Stability Theorem 5.3.21 (a) to (e). Assertion (f) follows from the fact that $(x_n)$ is bounded and every point of accumulation of $(x_n)$ is equal to $x_0$.
Remark. The limit $S = \lim_k S_k$ is a closed subset of $X$. If, instead of the above limit, one would consider $\lim_k \tilde S_k$, this limit is, if it exists, not necessarily a subset of $U$. (Hence the careful wording in the above theorem.)
Example 5.3.26. Let $U := \{(x, y) \in \mathbb{R}^2 \mid x > 0,\ y > 0\}$, let $f : U \to \mathbb{R}$ be defined by $f(x, y) = \frac{1}{x} + \frac{1}{y}$, let further $f_k := f + \frac{1}{k}$ and $S_k := \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le (1 - \frac{1}{k})^2\}$. Then $\lim S_k = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 1\} = S$. However $\lim \tilde S_k = \tilde S$ does not hold, since the limit set is necessarily closed.
A consequence of the above theorem is the following
Theorem 5.3.27. Let $X$ be a finite dimensional normed space and let $(f_k : X \to \overline{\mathbb{R}})_{k \in \mathbb{N}}$ be a sequence of lower semi-continuous convex functions, whose epigraphs converge in the sense of Kuratowski to the epigraph of the function $f_0 : X \to \overline{\mathbb{R}}$. Let further $K$ be a closed convex subset of $X$ and let the set of minimal solutions of $f_0$ on $K$ be non-empty and bounded. Then we have:
(a) for large $k \in \mathbb{N}$ the set of minimal solutions $M(f_k, K)$ is non-empty,
(b) $\limsup_k M(f_k, K)$ is non-empty and from a $k_0 \in \mathbb{N}$ on uniformly bounded, i.e. $\bigcup_{k \ge k_0} M(f_k, K)$ is bounded,
(c) $\limsup_k M(f_k, K) \subset M(f_0, K)$,
(d) $\inf f_k(K) \to \inf f_0(K)$,
(e) from $x_k \in M(f_k, K)$ for $k \in \mathbb{N}$ it follows that $f_0(x_k) \to \inf f_0(K)$,
(f) if $M(f_0, K)$ consists of a single point $x_0$, then $x_k \in M(f_k, K)$ for $k \in \mathbb{N}$ implies the convergence $x_k \to x_0$.
Proof. We define
$$g : K \times \mathbb{R} \to \mathbb{R}, \qquad (x, t) \mapsto g(x, t) = t.$$
We consider the sequence of minimization problems
$$\inf\{g(x, t) \mid (x, t) \in \operatorname{Epi}(f_k)\}.$$
Apparently we have for all $k \in \mathbb{N}_0$
$$(x, \inf(f_k, K)) \in M(g, K \times \mathbb{R}) \iff x \in M(f_k, K).$$
In particular the boundedness of the non-empty set $M(f_0, K)$ carries over to the boundedness of the set $M(g, K \times \mathbb{R})$. The sequence of the functions to be minimized is in this context constant and equal to $g$ and hence obviously pointwise convergent. Since the $f_k$ are lower semi-continuous, the sets $\operatorname{Epi}(f_k)$ are closed and thus the conditions of the previous theorem are satisfied.
Kuratowski Convergence of Level Sets
The level sets of pointwise convergent sequences of convex functions turn out to be Kuratowski convergent, if the level set of the limit function satisfies a Slater condition:
Theorem 5.3.28. Let $U$ be an open subset of a Banach space and $(h_n)_{n \in \mathbb{N}}$ with $h_n : U \to \mathbb{R}$ be a sequence of continuous convex functions, which converges pointwise to the convex function $h : U \to \mathbb{R}$. Let there exist an $\bar x \in U$ with $h(\bar x) < 0$; then the sequence $(S_n)_{n \in \mathbb{N}}$ of the level sets $S_n := \{x \in U \mid h_n(x) \le 0\}$ converges in the sense of Kuratowski to the level set $S := \{x \in U \mid h(x) \le 0\}$.
Proof. Let $x_{n_k} \in S_{n_k}$ with $x_{n_k} \to x$; then the continuous convergence implies $h_{n_k}(x_{n_k}) \to h(x) \le 0$, hence $x \in S$. Therefore $\limsup S_n \subset S$.
Conversely let now $x \in S$. If $h(x) < 0$, then due to the pointwise convergence we have $h_n(x) < 0$ for $n > N$, i.e. $x \in S_n$ for $n > N$. We obtain
$$\operatorname{Int}(S) \subset \liminf S_n.$$
To complete the proof we show that the boundary points of $S$ are also in $\liminf S_n$: let $h(\bar x) < 0$ and $h(x_0) = 0$. Consider the sequence $(h_n(x_0))$: if $h_n(x_0) \le 0$, put $x_n := x_0$; if $h_n(x_0) > 0$, then due to the intermediate value theorem there is an $x_n \in [\bar x, x_0)$ with $h_n(x_n) = 0$ for $n > N$, where $N$ is chosen such that $h_n(\bar x) < 0$ for $n > N$. In both cases $x_n \in S_n$ for $n > N$. Suppose there is a subsequence $(x_{n_k})$, which converges to an $\tilde x \in [\bar x, x_0)$; then due to continuous convergence $h_{n_k}(x_{n_k}) \to h(\tilde x) < 0$. But by construction we then have $x_{n_k} = x_0$ for $k$ large enough, a contradiction. Therefore $x_n \to x_0$ and hence $x_0 \in \liminf S_n$.
We need the following lemma for the stability assertion of the subsequent theorem:
Lemma 5.3.29. Let U be an arbitrary set, T a compact metric space and g : TU
R with g(, z) continuous on T for all z U. Let further (T
n
)
nN
be a sequence of
compact subsets of T, converging in the sense of Kuratowski to T
0
. Let now h
n
:
U R be dened by h
n
(z) := max
tT
n
g(t, z) for all z U and correspondingly
h
0
: U R by h
0
(z) := max
tT
0
g(t, z) for all z U. Then we obtain
lim
n
h
n
(z) = h
0
(z) for all z U,
i.e. the sequence (h
n
)
nN
converges pointwise on U to h
0
.
Proof. Let
0
T
0
be chosen, such that g(
0
, z) = h
0
(z). The Kuratowski con-
vergence implies the existence of a sequence (t
n
)
nN
with t
n
T
n
for all n N,
which converges to
0
. From the continuity of g(, z) and t
n
T
n
T we obtain:
g(t
n
, z)
n
g(
0
, z) = h
0
(z) i.e.
limh
n
(z) h
0
(z).
Section 5.4 Convex Operators 147
On the other hand let
n
T
n
with g(
n
, z) = h
n
(z) and s := limg(
n
, z). Then
there is a subsequence (
n
k
)
kN
converging to a T, such that lim
k
g(
n
k
, z) =
s. The continuity of g(, z) implies lim
k
g(
n
k
, z) = g(, z). But the Kuratowski
convergence yields T
0
, i.e.
g(
n
k
, z) = h
n
k
(z)
k
s = g(, z) h
0
(z).
Theorem 5.3.30. Let U be an open subset of a Banach space, T a compact metric
space and g a real-valued mapping on T U. Let g(, z) be continuous on T for all
z U and g(t, ) convex for all t T. Let further (T
n
)
nN
be a sequence of compact
subsets of T, which converges in the sense of Kuratowski to T
0
, and let (S
n
)
nN
0
be a
sequence of subsets of U dened by
S
n
:= z U [ g(t, z) 0 for all t T
n
.
Let now (f
n
: U R)
nN
0
be a sequence of convex functions, which converges
pointwise to f
0
on all of U. Then the following stability statement holds:
lim
n
M(f
n
, S
n
) M(f
0
, S
0
).
The above theorem can be applied to semi-innite optimization (see corresponding
section in Chapter 2).
5.4 Convex Operators
Denition 5.4.1. Let X be a real vector space. A subset P of X is called a convex
cone in X, if P has the following properties:
(a) 0 P
(b) Rx P : 0 x P
(c) x
1
, x
2
P : x
1
+x
2
P.
A relation is called an order on X, if has the following properties:
(a) is reexive, i.e. x X : x x
(b) is transitive, i.e. x, y, z X : x y and y z x z
(c) is compatible with vector addition, i.e. x, y, z X : x y x+z y+z
(d) is compatible with scalar multiplication, i.e. Rx, y X : 0
and x y x y.
If P is a convex cone in X resp. an order on X, then we denote the pair (X, P)
resp. (X, ) as an ordered vector space.
148 Chapter 5 Stability and Two-stage Optimization Problems
Remark 5.4.2. Directly from the denition follows:
(a) If P is a convex cone in X, then the relation
P
, dened by
x, y X : x
P
y : y x P
is an order on X.
(b) If is an order on X, then the set
P := x X[ 0 x
is a convex cone.
Hence there is a one-to-one correspondence between an order on X and a convex cone
in X.
Example 5.4.3. Let (T, , ) be a measure space and L

() the corresponding Orlicz


space, then the cone
P := x L

() [ x(t) 0 -almost everywhere


is called the natural cone.
In problems, where not only the order but also topological properties play a role, the
notion of normal cones becomes important. The natural cones of the function spaces,
that are particulary relevant for applications, are normal but do not have an interior.
Denition 5.4.4. Let A be a subset of a vector space Y ordered by a convex cone C.
By the full hull [A]
C
of A we denote
[A]
C
:= z Y [ x
C
z
C
y for x A, y A.
Hence [A]
C
= (A+C) (AC). A is called full, if A = [A]
C
.
Aconvex cone C is called normal, if the full hull [B]
C
of the unit ball B is bounded.
A family F of convex cones is called uniformly normal, if the union
_
CF
[B]
C
is bounded.
A criterion for this is
Theorem 5.4.5. Let R := |z| [ C F, y B such that 0
C
z
C
y. If R is
bounded, then F is uniformly normal.
Section 5.4 Convex Operators 149
Proof. Let x

CF
[B]
C
. Then there are y
1
, y
2
B and a C F, such that
y
1

C
x
C
y
2
or 0
C
x y
1

C
y
2
y
1
. Let r be an upper bound of R. From
y
2
y
1
2B it follows that
|x y
1
|
2
r and hence |x| 2r + 1.
Examples for normal cones C in normed spaces are the natural cones in
(a) R
n
(b) C(T), where T is a compact metric space
(c) L

().
Proof. Ad (a): For the closed unit ball B w.r.t. the maximum norm in R
n
we even
have B = [B]
C
.
Ad (b) and (c): Let y B, then 0
C
z
C
y implies due to the monotonicity of
the norm |z| |y| 1. The previous theorem yields the assertion.
Convex Mappings
Let X and Y be vector spaces and C a cone in Y . The mapping A : X Y is called
C-convex, if for all 0 1 and all u, v X
A(u + (1 )v)
C
A(u) + (1 )A(v).
Example 5.4.6. Y = R
m
, C the natural cone in R
m
and for i = 1, . . . , m f
i
: X
R convex. Then A = (f
1
, . . . , f
m
) is a C-convex mapping from X to R
m
.
Uniform Boundedness
The following theorem is a generalization of the theorem of Banach on uniform
boundedness to families of convex operators.
Theorem 5.4.7. Let Q be a convex and open subset of a Banach space X and Y
a normed space. Let further C
i

iI
a family of uniformly normal cones in Y and
A
i
: Q Y a C
i
-convex continuous mapping for all i I. If the family A
i

iI
is pointwise norm-bounded, then A
i

iI
locally uniformly Lipschitz continuous, i.e.
for each x Q there is a neighborhood U of x and a number L > 0, such that for all
u, v U and all i I we have
|A
i
(u) A
i
(v)| L|u v|.
Proof. The family A
i

iI
is pointwise norm-bounded, i.e.
s(x) := sup
iI
|A
i
(x)| < for all x Q.
150 Chapter 5 Stability and Two-stage Optimization Problems
At rst we show: the function s : Q R is norm-bounded on an open ball Q
1
:
otherwise for each k N the set
D
k
:= x Q[ s(x) > k
would be dense in Q. Being the supremum of continuous functions s is lower semi-
continuous and hence D
k
open for all k N. Every Banach space is of second Baire
category (see [113], p. 27), and hence

k=1
D
k
,= .
But y
0

k=1
D
k
contradicts s(y
0
) < .
In the next step we show that every point x Q has a neighborhood, on which s is
bounded. Let w.l.o.g. 0 be the center of Q
1
. Since Qis open, there exists a 0 < < 1,
such that (1 + )x Q and U :=

1+
Q
1
+ x Q. Let x
t
U, i.e. x
t
= x +

1+
z
with z Q
1
. Then we obtain
A
i
(x
t
) = A
i
_
1 +
1 +
x +

1 +
z
_

C
i
1
1 +
A
i
((1 +)x) +

1 +
A
i
(z) =:
i
(z).
On the other hand
A
i
(x
t
) = (1 +)
_
1
1 +
A
i
(x
t
) +

1 +
A
i
_

z
1 +
__
A
i
_

z
1 +
_

C
i
(1 +)
_
A
i
_
x
t
1 +


(1 +)
2
z
__
A
i
_

z
1 +
_
= (1 +)A
i
_
x
1 +
_
A
i
_

z
1 +
_
=:
i
(z).
Since A
i
is on Q pointwise norm-bounded and on Q
1
uniformly norm-bounded,
there exists a number r > 0, such that for all z Q
1
and all i I we have

i
(z),
i
(z) K(0, r).
The family C
i

iI
is uniformly normal. Therefore there is a ball K(0, R) with
[K(0, r)]
C
i
K(0, R) and hence also
A
i
(x
t
) [
i
(z),
i
(z)]
C
i
K(0, R),
i.e. |A
i
(x
t
)| R for all i I and all x
t
U.
In the nal step we turn our attention to the uniform Lipschitz continuity.
Section 5.4 Convex Operators 151
Let B be the unit ball in X and let > 0 be chosen such that s is bounded on
x +B +B Q, i.e. there is a l > 0 with
s(x +B +B) [0, l]. (5.5)
For y
1
, y
2
x +B and y
1
,= y
2
we have
z := y
1
+
(y
1
y
2
)
|y
1
y
2
|
x +B +B.
Let :=
|y
1
y
2
|
+|y
1
y
2
|
, then, because of y
1
= (1 )y
2
+z
A
i
(y
1
)
C
i
(1 )A
i
(y
2
) +A
i
(z) = A
i
(y
2
) +(A
i
(z) A
i
(y
2
)),
i.e.
A
i
(y
1
) A
i
(y
2
)
C
i
(A
i
(z) A
i
(y
2
)).
Correspondingly for v := y
2
+
(y
1
y
2
)
|y
1
y
2
|
x +B +B we obtain
A
i
(y
2
) A
i
(y
1
)
C
i
(A
i
(v) A
i
(y
1
)),
i.e.
A
i
(y
1
) A
i
(y
2
) [A
i
(y
1
) A
i
(v), A
i
(z) A
i
(y
2
)]
C
i
.
By (5.5) we have for all y
1
, y
2
x +B and all i I both
A
i
(y
1
) A
i
(v) and A
i
(z) A
i
(y
2
) K(0, 2l).
Since C
i
is uniformly normal, there exists a ball K(0, l
1
), such that [K(0, 2l)]
C
i

K(0, l
1
) for all i I. Thus
A
i
(y
2
) A
i
(y
1
) K(0, l
1
),
i.e.
|A
i
(y
2
) A
i
(y
1
)| l
1

|y
1
y
2
|

l
1
= L|y
1
y
2
|
with L =
l
1

.
Corollary 5.4.8. Let A
i

iI
as in the above theorem, then A
i

iI
is equicontinu-
ous.
152 Chapter 5 Stability and Two-stage Optimization Problems
Component-wise Convex Mappings
The theorem proved in 5.4.7 can be extended to component-wise convex mappings.
Theorem 5.4.9. Let for each j 1, . . . , n X
j
be a Banach space and U
j
an open
and convex subset of X
j
. Let Y be a normed space, containing the uniformly normal
family C
ij
[ i I, j = 1, . . . , n of convex cones.
Furthermore let a family of mappings F = A
i
: U
1
U
n
Y
iI
be given,
which is pointwise bounded and has the property that for all i I and j 1, . . . , n
the component A
ij
: U
j
Y is continuous and C
ij
-convex.
Then every point in U
1
U
n
has a neighborhood U, on which F is uniformly
bounded and uniformly Lipschitz continuous, i.e. there is a L > 0 such that for all
u, v U and all i I
|A
i
(u) A
i
(v)| L|u v|
holds.
Proof. The proof is performed by complete induction over n. Theorem 5.4.7 yields
the start of the induction for n = 1.
We assume, the assertion holds for n 1.
For the uniform boundedness it apparently sufces to show the following property:
for all sequences x
k
= (x
k,1
, . . . , x
k,n
), which converge to a ( x
1
, . . . , x
n
), and all
sequences (A
k
)
kN
in F the sequence (|A
k
(x
k
)|)
kN
is bounded.
Let the norm in X
1
X
n
be given by | |
X
1
+ + | |
X
n
. For all
z

U = U
1
U
n1
the sequence A
k
(z, ) [ k N is pointwise bounded and
by Theorem 5.4.7 equicontinuous at x
n
. Since x
k,n

k
x
n
we have for all z

U
A
k
(z, x
k,n
)
kN
is bounded, (5.6)
i.e. the family A
k
(, x
k,n
)
kN
is pointwise bounded and by induction hypothesis
equicontinuous at ( x
1
, . . . , x
n1
).
Since (x
k,1
, . . . , x
k,n1
) ( x
1
, . . . , x
n1
) we have
A
k
(x
k,1
, . . . , x
k,n
) A
k
( x
1
, . . . , x
n1
, x
k,n
) 0.
Using Statement (5.6) the uniform boundedness follows.
Therefore there exist open neighborhoods V
j
in U
j
, j 1, . . . , n and R,
such that for all v
j
V
j
and all A F
|A(v
1
, . . . , v
n
)| .
Let Q := V
1
V
n1
. We consider the family A(, v) : Q Y [ A F, v
V
n
. It is pointwise bounded. Let x
0
Q and x
n
V
n
. By the induction hypothesis
Section 5.5 Quantitative Stability Considerations in R
n
153
there exists a neighborhood W Q of x
0
and a L
1
> 0, such that for all u
1
, u
2
W
and all v V
n
|A(u
1
, v) A(u
2
, v)| L
1
|u
1
u
2
|
holds. In analogy to the above the family A(w, ) : V
n
Y [ A F, w V
1

V
n1
is pointwise bounded, and by Theorem 5.4.7 there exists a neighborhood

V
n
of
x
n
with

V V
n
and L
2
> 0, such that for all v
1
, v
2


V and u Q
|A(u, v
1
) A(u, v
2
)| L
2
|v
1
v
2
|.
For all (x, y), (u, v) W

V and all A F we then obtain with L := maxL
1
, L
2
:
|A(u, v) A(x, y)| |A(u, v) A(x, v)| +|A(x, v) A(x, y)|
L(|u x| +|v y|) = L|(u, v) (x, y)|.
5.5 Quantitative Stability Considerations in R
n
In order to better appreciate the subsequent theorem, we will draw a connection to the
numerical realization of the Polya algorithm: when solving the approximating prob-
lems, one can use the norm of the derivative of the function to be minimized as a
stop criterion. It will turn out that stability is essentially preserved if the approximat-
ing problems are only solved approximatively by a x
k
, if the sequence (|f
k
( x
k
)|)
tends to 0.
Theorem 5.5.1. Let (f
k
: R
n
R)
kN
be a sequence of differentiable convex func-
tions, which converge pointwise to a function f : R
n
R. Let further the set of
minimal solutions of f on R
n
be non-empty and bounded and let ( x
k
)
kN
be a se-
quence in R
n
with the property
lim
k
|f
k
( x
k
)| = 0.
Then
(a) The set of points of accumulation of the sequence ( x
k
) is non-empty and con-
tained in M(f, R
n
).
(b) f
k
( x
k
) inf f(R
n
).
(c) f( x
k
) inf f(R
n
).
(d) Let Q be an open bounded superset of M(f, R
n
) and

k
:= sup
xQ
[f(x) f
k
(x)[,
then for k sufciently large x
k
Q and
[inf f(R
n
) f( x
k
)[ = O(
k
) +O(|f
k
( x
k
)|).
154 Chapter 5 Stability and Two-stage Optimization Problems
Proof. Let > 0 be given, C := x R
n
[ d(x, M(f, R
n
)) and S := x
R
n
[ d(x, M(f, R
n
)) = . Furthermore let x
0
be a minimal solution of f. Suppose
there is a subsequence ( x
i
) of ( x
k
), which is contained in C. Then there are numbers

i
(0, 1] with
i
x
i
+ (1
i
)x
0
S. Since the derivative of a convex function is a
monotone mapping, we have
f
i
( x
i
) f
i
(
i
x
i
+ (1
i
)x
0
), (1
i
)( x
i
x
0
) 0.
Therefore
_
f
i
( x
i
),
x
i
x
0
| x
i
x
0
|
_

_
f
i
(
i
x
i
+ (1
i
)x
0
),

i
( x
i
x
0
)
|
i
( x
i
x
0
)|
_

f
i
(
i
x
i
+ (1
i
)x
0
) f
i
(x
0
)
|
i
( x
i
x
0
)|
,
where the last inequality follows from the subgradient inequality. Let now be d :=
1
3
(inf f(S) inf f(R
n
)). As the sequence (f
i
) converges uniformly on S to f (see
Theorem 5.3.6) we have for large i
f
i
(
i
x
i
+ (1
i
)x
0
) inf f(S) d,
and
f
i
(x
0
) inf f(R
n
) +d.
Let now be the diameter of M(f, R
n
), then apparently
|
i
( x
i
x
0
)| +.
Hence we obtain
_
f
i
( x
i
),
x
i
x
0
| x
i
x
0
|
_

d
+
> 0,
i.e. |f
i
( x
i
)|
d
+
> 0 in contradiction to our assumption.
We now prove the remaining parts of the assertion. By Theorem 5.3.25 there is a
K N, such that M(f
i
, R
n
) ,= for i K. Let now x
i
M(f
i
, R
n
) for i K,
then
f
i
( x
i
) f
i
(x
i
) f
i
( x
i
), x
i
x
i
|f
i
( x
i
)|| x
i
x
i
|.
According to what has been proved above there is N N such that x
i
, x
i
Q for
i N. Then using the subsequent theorem
f( x
i
) inf f(R
n
) f
i
( x
i
) +O(
i
) inf f(R
n
)
(f
i
( x
i
) f
i
(x
i
)) + (f
i
(x
i
) f(x
i
)) +O(
i
)
+ (f(x
i
) inf f(R
n
))
|f
i
( x
i
)|| x
i
x
i
| +O(
i
).
Due to the boundedness of the sequences ( x
i
) and (x
i
) the assertion follows.
Section 5.5 Quantitative Stability Considerations in R
n
155
Remark. If the limit function f has a unique minimal solution, then we obtain for
the estimates in (d)
[inf f(R
n
) f( x
k
)[ = O(
k
) +o(|f
k
( x
k
)|).
In the previous theorem we have derived quantitative assertions about the speed
of convergence of approximative values. For his purpose we have made use of the
following theorem, which extends investigations of Peetre [90] on norms to sequences
of convex functions.
Theorem 5.5.2. Let (f
k
: R
n
R)
kN
be a sequence of convex functions with
bounded level sets, which converges pointwise to the function f : R
n
R. Let
further K be a closed convex subset of R
n
and x
k
M(f
k
, K).
(a) Let Q be an open bounded superset of M(f, K) and

k
:= sup
xQ
[f(x) f
k
(x)[,
then
i. f(x
k
) inf f(K) = O(
k
)
ii. [f
k
(x
k
) inf f(K)[ = O(
k
).
(b) If there is a sequence (
k
)
kN
tending to zero, such that for all x R
n
and
k N the inequality [f(x) f
k
(x)[
k
[f(x)[ holds, then
i. f(x
k
) inf f(K) 2
k
[f(x
k
)[
2
k
1
k
[f
k
(x
k
)[
ii. [f
k
(x
k
) inf f(K)[ 3
k
[f(x
k
)[
3
k
1
k
[f
k
(x
k
)[.
Proof. By Theorem 5.3.25 there is a N N, such that x
k
Q for k N. Due to
the uniform convergence of f
k
to f on Q the sequence (
k
) tends to zero. Thus for
k N and x
0
M(f, K)
f(x
k
) f(x
0
) (f(x
k
) f
k
(x
k
)) + (f
k
(x
0
) f(x
0
)) 2
k
,
and
[f
k
(x
k
) f(x
0
)[ [f
k
(x
k
) f(x
k
)[ +[f(x
k
) f(x
0
)[ 3
k
.
Ad (b): In a similar way as above one obtains for k N
f(x
k
) f(x
0
)
k
(f(x
k
) +f(x
0
)) 2
k
f(x
k
),
and
[f
k
(x
k
) f(x
0
)[ 3
k
f(x
k
).
Due to f
k
(x
k
) (1
k
)f(x
k
) the remaining part of the assertion follows.
156 Chapter 5 Stability and Two-stage Optimization Problems
The following theorem gives a framework for obtaining estimates for the sequence
of distances (d(x
k
, M(f, R
n
))), from the convergence of the sequence of function
values (f(x
k
)).
Theorem 5.5.3. Let (f : R
n
R) be a convex function with bounded level sets.
Then there is a strictly monotonically increasing function : R
+
R with (0) = 0,
such that for all x R
n
f(x) inf f(R
n
) +(d(x, M(f, R
n
)))
holds.
Proof. Let S
r
:= x R
n
[ d(x, M(f, R
n
)) = r for r > 0 and
K
r
:= x R
n
[ d(x, M(f, R
n
)) r,
K
r
is apparently closed. Let c
r
:= inf f(K
r
) and let (y
n
) be a sequence in K
r
with
f(y
n
) c
r
. Let > 0 then for n large enough we have y
n
S
f
(c
r
+ ). Since
the level sets of f are bounded, there is a convergent subsequence y
n
k
x
r
K
r
yielding f(x
r
) = c
r
. Hence f attains its minimum x
r
on K
r
. Apparently x
r
S
r
,
because otherwise x
r
would be an interior point of K
r
and hence a local, thus a global
minimum of f on R
n
. Let c := inf f(R
n
), then we dene
(r) := f(x
r
) c = inf f(K
r
) c > 0.
Apparently (r
1
) (r
2
) for r
1
> r
2
. Let now x
r
1
M(f, R
n
) such that |x
r
1

x
r
1
| = r
1
. Then there is a
0
(0, 1) with x

0
:=
0
x
r
1
+ (1
0
)x
r
1
S
r
2
.
Let h() := f( x
r
1
+ (1 )x
r
1
) for [0, 1]. Apparently h is convex on
[0, 1] and we have: h(
0
) (1
0
)h(0) +
0
h(1). Since h(1) = c and h(
0
) =
f(x

0
) (r
2
) + c, as well as h(0) = f(x
r
1
) = (r
1
) + c it follows that h(
0
)
(1
0
)(r
1
) +c. Putting everything together we obtain (r
2
) < (r
1
).
Corollary. Let f : R
n
Ra convex function with bounded level sets and let (x
k
)
kN
be a minimizing sequence f(x
k
) inf f(R
n
). Let : R
+
R be the strictly
monotonically increasing function of the previous theorem, then
d(x
k
, M(f, R
n
))
1
([f(x
k
) inf f(R
n
)[).
5.6 Two-stage Optimization
For a suitable choice of a parametric family f
k
of functions, convergence to a par-
ticular solution of the original problem can occur even in the case of non-uniqueness
of the original problem. This limit can often be characterized as the solution of a
Section 5.6 Two-stage Optimization 157
two-stage optimization problem. The corresponding framework will be developed in
the sequel.
In the treatment of optimization problems the original problem is frequently re-
placed by a sequence of approximating problems. If the approximating sequence is
determined, only certain solutions of the original problem will be available as limits.
Frequently they will turn out to be solutions of a second stage, where the second stage
is described by a function, which implicitly depends on the choice of the sequence of
the approximating problems.
Denition 5.6.1. Let C be a set and g
1
, g
2
be two functions on C with values in
R. The following problem is called the two-stage minimization problem (g
1
, g
2
, C).
Among the minimal solutions of g
1
on C those are selected, which are minimal
w.r.t. g
2
.
The solutions, i.e. the minimal solutions of g
2
on M(g
1
, C), we call two-stage
solutions of the problem (g
1
, g
2
, C). The set of solutions is denoted by M(g
1
, g
2
, C).
In the following theorem we assume the unique solvability of the approximating
problems. Furthermore we assume that every point of accumulation of the sequence
of these minimal solutions is an element of the set of solutions of the original problem
(closedness of the algorithm).
In this context we are interested in a characterization of these points of accumu-
lation. Under favorable circumstances this characterization will enable us to enforce
convergence of the sequence and to describe the limit as a solution of a two-stage
optimization problem.
Theorem 5.6.2. Let X be a metric space, f : X R, and let for the sequence
of functions (f
n
: X R)
nN
hold: for every n N the minimization problem
(f
n
, X) has a unique solution x
n
X and let every point of accumulation of the
sequence (x
n
)
nN
be in M(f, X). Let now (a
n
)
nN
be a sequence of non-negative
numbers such that the sequence of the functions (a
n
(f
n
f))
nN
converges lower
semi-continuously to a function g : X R. Then every point of accumulation of the
sequence (x
n
)
nN
is in M(f, g, X).
Proof. Let y M(f, X) and x = limx
n
i
with x
n
i
M(f
n
i
, X). We have
(f
n
i
(x
n
i
) f
n
i
(y)) + (f(y) f(x
n
i
)) 0,
hence
a
n
i
(f
n
i
(x
n
i
) f(x
n
i
)) a
n
i
(f
n
i
(y) f(y)).
The lower semi-continuous convergence implies: g( x) g(y).
158 Chapter 5 Stability and Two-stage Optimization Problems
Remark 5.6.3. The sequence (a
n
(f
n
f))
nN
converges lower semi-continuously
to g, if it converges pointwise and the following condition is satised: there exists a
sequence (
n
)
nN
tending to zero, such that a
n
(f
n
f) +
n
g and g is lower
semi-continuous.
This means uniform convergence from below on the whole space.
We will now discuss a special case of this approach: the regularization method of
Tikhonov:
Here f
n
= f +
n
g where (
n
) is a sequence of positive numbers tending to zero
and g an explicitly given lower semi-continuous function. Then for a
n
=
1

n
for all
n N we have a
n
(f
n
f) = g.
As an example we consider the function sequence for the approximate treatment
of best approximation in the mean, which was already mentioned in the introduction
to Chapter 1. Let the set M be a convex and closed subset of R
m
. Let x R
m
and z M, then put f
n
(z) =

m
i=1

n
(x
i
z
i
) where
n
(s) = [s[ +
1
n
s
2
and
f(z) =

m
i=1
[x
i
z
i
[. In order to specify the second stage, choose a
n
= n and
obtain
g(z) =
m

i=1
(x
i
z
i
)
2
.
These specications have a regularizing effect in the following sense: every approxi-
mating problem is uniquely solvable and the sequence of the corresponding solutions
converges to the particular best approximation in the mean, which has minimal Eu-
clidean distance to x. Here we have also applied the stability theorem in R
m
(see
Theorem 5.3.25).
In the section on applications in Chapter 8 we will revisit the regularization method
of Tikhonov under the aspect of local uniform and level uniform convexity in connec-
tion to strong solvability in reexive spaces.
For the Tikhonov regularization the function g is explicitly chosen and in this way
the second stage of the optimization is determined. Frequently a second stage is im-
plicitly contained in the choice of the approximating function sequence (f
n
).
We will prove the above theorem in a slightly generalized form, in order to simplify
the determination of the second stage.
Theorem 5.6.4. Let X be a metric space, f : X R, and for the sequence of
functions (f
n
: X R)
nN
we assume: for every n N the problem (f
n
, X) has a
unique solution x
n
X and every point of accumulation of the sequence (x
n
)
nN
is
in M(f, X). Let now (
n
)
nN
be a sequence of non-decreasing functions, such that
the sequence of functions (
n
(f
n
f))
nN
converges lower semi-continuously to a
function g : X R. Then every point of accumulation of the sequence (x
n
)
nN
is in
M(f, g, X).
Section 5.6 Two-stage Optimization 159
Proof. Let y M(f, X) and x = limx
n
i
with x
n
i
minimal solution of f
n
i
. Then
f
n
i
(x
n
i
) f
n
i
(y) +f(y) f(x
n
i
) 0
f
n
i
(x
n
i
) f(x
n
i
) f
n
i
(y) f(y)

n
i
(f
n
i
(x
n
i
) f(x
n
i
))
n
i
(f
n
i
(y) f(y)).
The lower semi-continuous convergence implies g( x) g(y).
Corollary. If one replaces in the above theorem non-decreasing by non-increasing
and lower semi-continuous by upper semi-continuous, then one obtains: every point
of accumulation of the sequence is a maximal solution of g on M(f, X).
The condition of lower semi-continuous convergence is in many cases naturally
satised, because according to Theorem 5.2.1 a monotone sequence of lower semi-
continuous functions, which converges pointwise to a lower semi-continuous func-
tion, is already lower semi-continuously convergent.
Example 5.6.5. As an example we consider the treatment of best approximation in
the mean in R
m
, where the approximating problems are given by the modulars for the
Young functions
n
(s) = [s[
1
n
log(1 +n[s[). If now
f
n
(x) =
m

i=1
_
[x
i
[
1
n
log(1 +n[x
i
[)
_
,
and f(x) =

m
i=1
[x
i
[, then one can deduce g(x) :=

m
i=1
([x
i
[) as a second stage,
because
f
n
(x) f(x) =
m

i=1
_
[x
i
[
1
n
log(1 +n[x
i
[)
_

i=1
[x
i
[
=
m

i=1
1
n
log(1 +n[x
i
[) =
1
n
log
m

i=1
(1 +n[x
i
[)
=
1
n
log
_
n
m
m

i=1
_
1
n
+[x
i
[
__
.
Let now
n
be the monotonically decreasing function s
n
(s) =
1
n
m
exp(ns).
Then

n
(f
n
(x) f(x)) =
m

i=1
_
1
n
+[x
i
[
_
,
and we obtain

n
(f
n
(x) f(x))
n

i=1
([x
i
[),
160 Chapter 5 Stability and Two-stage Optimization Problems
and the convergence is lower semi-continuous, since the sequence is monotonically
dencreasing and pointwise convergent.
Second-stage by Differentiation w.r.t. a Parameter
When determining a second stage of an optimization problem one can frequently use
differential calculus. We assume a real parameter > 0 and put F(x, ) := f

(x)
as well as F(x, 0) = f(x). For many functions F the partial derivative w.r.t. is
a candidate for the second stage. Only the right-sided derivative is needed in this
context, which exists for a broad class functions of a real variable.
g(x) :=

+
F(x, 0) = lim
0
F(x, ) F(x, 0)

.
If for xed x the mapping F(x, ) is convex, then due to the monotonicity
of the difference quotient lower semi-continuous convergence is already guaranteed
by pointwise convergence w.r.t. x.
Theorem 5.6.6. Let X be a metric space and let the function F : X [0, a] R
satisfy the following conditions:
(a) F(x, ) : [0, a] R is for all x X twice continuously differentiable
(b)

F(, 0) is lower semi-continuous. There exists a R, such that



2

2
F(, )
for all x X and all [0, a].
Then the rst stage is given by the function f := F(, 0) and the second stage by
g :=

F(, 0). Let (


n
)
nN
be a sequence tending to zero, f
n
:= F(,
n
) and
x
n
M(f
n
, X), then every point of accumulation of the sequence (x
n
)
nN
is in
M(f, g, X).
Proof. By the expansion theorem of Taylor there is a (0,
n
), such that
F(x,
n
) = F(x, 0) +
n

F(x, 0) +

2
n
2

2
F(x, ).
Therefore
F(x,
n
) F(x, 0)

F(x, 0) +

n
2
.
Section 5.6 Two-stage Optimization 161
Due to Remark 5.6.3 the convergence is lower semi-continuous and by Theorem 5.6.2
the assertion follows.
Testing of the boundedness of the second partial derivative can be omitted, if F is
a convex function of the parameter for xed x. We then have
Theorem 5.6.7. Let X be a metric space and let the function F : X [0, a] R
satisfy the following conditions:
(a) F(x, ) : [0, a] R is convex for all x X.
(b) The right-sided partial derivative w.r.t. the parameter

+
F(, 0) : X R is
lower semi-continuous.
Then the second stage is given by the function g :=

+
F(, 0). Let (
n
)
nN
be
a decreasing sequence of positive numbers tending to zero, f
n
:= F(,
n
) and
x
n
M(f
n
, X), then every point of accumulation of the sequence (x
n
)
nN
is in
M(f, g, X), where f := F(, 0).
Proof. The monotonicity of the difference quotient of convex functions yields:
F(x,
n
) F(x, 0)


+
F(x, 0)
monotonically decreasing. Therefore the convergence is lower semi-continuous.
Remark. If in the above theorem the function g is strictly convex, then one has even
convergence of the sequence (x
n
) to the corresponding two-stage solution.
As a rst application we describe the limits one obtains if the problem of best
approximation in the mean (L
1
-approximation) is replaced by a sequence of modular
approximation problems. A particularly interesting special case is that of best L
p
-
approximations (p > 1) for p 1. In order to compute the second stage we put
F(x, ) :=
_
T
[x(t)[
1+
d.
For each xed t T the function
e
(1+) log [x(t)[
is convex. Due to the monotonicity of the integral the corresponding pointwise con-
vexity inequality remains valid. Hence also
F(x, )
162 Chapter 5 Stability and Two-stage Optimization Problems
is convex for every xed x. Due to the monotonicity of the difference-quotient one
can interchange differentiation w.r.t. the parameter and integration (see theorem on
monotone convergence of integration theory) and one obtains
g(x) :=

F(x, 0) =
_
T
[x(t)[ log [x(t)[d. (5.7)
The function g is called entropy function.
Let now (T, , ) be a nite measure space, V a nite-dimensional subspace of
L
p
0
() for p
0
> 1 (resp. a closed convex subset thereof) and [0, p
0
1]. Then
for these all the functions F(, ) are nite on V . Also the function g is nite on V ,
because of the growth properties of the logarithm (and because of the boundedness
of u ulog u from below for u 0). Apparently g is strictly convex and hence
in particular continuous on V . Using the stability theorem we obtain the following
assertion.
Theorem 5.6.8. Let (T, , ) be a nite measure space, V a nite-dimensional sub-
space of L
p
0
() for p
0
> 1 (resp. a closed convex subset thereof) and (p
n
)
nN
a
sequence in [1, p
0
] with p
n
1. Then the sequence of best L
p
n
-approximations of
an element x L
p
0
() w.r.t. V converges to the best L
1
()-approximation of largest
entropy.
The following theorem provides a topological version of Theorem 5.6.6, which will
come into play in the context of level uniform convexity and the weak topology.
Theorem 5.6.9. Let C be a compact topological space and let the function F : C
[0, a] R satisfy the following conditions:
(a) F(x, ) : [0, a] R is twice continuously differentiable for all x C.
(b)

F(, 0) and F(, 0) are lower semi-continuous.


(c) There exists a R, such that

2

2
F(, ) for all x C and all [0, a].
Then the rst stage is given by the function f := F(, 0) and the second stage by g :=

F(, 0). Let (


n
)
nN
be a positive sequence tending to zero, dene f
n
:= F(,
n
),
then
(a) M(f
n
, C) and S := M(f, g, C)) are non-empty.
(b) Let x
n
M(f
n
, C), then the set of points of accumulation of this sequence is
non-empty and contained in M(f, g, X).
(c) Let x
0
M(f, g, X) then
i. f(x
n
) f(x
0
)
ii. g(x
n
) g(x
0
)
iii.
f(x
n
)f(x
0
)

n
0.
Section 5.6 Two-stage Optimization 163
Proof. Being lower semi-continuous on a compact set, g is bounded from below by
a number
1
. The function

2

2
F(x
0
, ) is bounded from above by a number
2
. Let
x C and [0, a]. By the expansion theorem of Taylor there are a
t
,
tt
(0, ),
such that
F(x, ) F(x
0
, ) = F(x, 0) F(x
0
, 0) +
_

F(x, 0)

F(x
0
, 0)
_
+

2
2
_

2

2
F(x,
t
)

2

2
F(x
0
,
tt
)
_
f(x) f(x
0
) +(
1
g(x
0
)) +

2
2
(
2
)
for
1
and
2
independent of x and . Let now x = x
n
and =
n
then
0 f
n
(x
n
) f
n
(x
0
) f(x
n
) f(x
0
)
. .
0
+
n
(
1
g(x
0
)) +

2
n
2
(
2
),
and hence f(x
n
) f(x
0
).
Let further (x
n
i
) be a convergent subsequence such that x
n
i
x. Since
limf(x
n
i
) = f(x
0
) = inf f(C), the lower semi-continuity of f yields f(x)
f(x
n
i
) + for large n
i
, hence
f(x) f(x
0
),
and therefore x M(f, C).
Suppose ii. does not hold, then there is r > 0 and a convergent subsequence (x
n
j
)
of (x
n
) with x
n
j
x
1
and [g(x
n
j
) g(x
0
)[ > r.
This will lead to a contradiction, since for large n
j
and g(x
n
j
) g(x
0
) > r
0 f
n
j
(x
n
j
) f
n
j
(x
0
) f(x
n
j
) f(x
0
) +
n
j
_
r +

n
j
2
(
2
)
_
> 0.
Suppose now g(x
0
) g(x
n
j
) > r then due to lower semi-continuity of g we obtain
g(x
0
) g(x
1
) + r contradicting x
1
M(f, C) and x
0
M(f, g, C). Using the
lower semi-continuity of g we obtain (b) from ii.
g(x
0
) = limg(x
n
j
) g(x
1
).
Since x
1
M(f, C) it follows that x
1
M(f, g, C). It remains to be shown that iii.
holds
0 f(x
n
) f(x
0
) +
n
(g(x
n
) g(x
0
)) +

2
n
2
(
2
).
Division by
n
together with ii. yields iii.
164 Chapter 5 Stability and Two-stage Optimization Problems
5.6.1 Second Stages and Stability for Epsilon-solutions
Theorem 5.6.10. Let (f
k
: R
n
R)
kN
be a sequence of differentiable convex
functions, which converges pointwise to a function f : R
n
R. Let further the set
of minimal solutions of f on R
n
be non-empty and bounded. Let now (
k
)
kN
be a
sequence of non-negative numbers tending to zero such that the sequence of functions
(
f
k
f

k
)
kN
converges lower semi-continuously to a function g : Q R on an open
bounded superset Q of M(f, R
n
).
Let further (
k
)
kN
be a sequence of non-negative numbers tending to zero with
lim
k

k
= 0 and ( x
k
)
kN
a sequence in R
n
with the property |f
k
( x
k
)|
k
.
Then the set of points of accumulation of the sequence ( x
k
) is non-empty and con-
tained in M(f, g, R
n
).
Proof. Let x
k
M(f
k
, R
n
) for k N. By Theorem 5.5.1 for k > K
0
the sequences
x
k
and x
k
are contained in Q. If y M(f, R
n
), then one has
f
k
(x
k
) f
k
(y) +f(y) f( x
k
) 0.
Adding to both sides of the inequality the expression f
k
( x
k
) f
k
(x
k
), one obtains
f
k
( x
k
) f
k
(y) +f(y) f( x
k
) f
k
( x
k
) f
k
(x
k
) |f
k
( x
k
)|| x
k
x
k
|

k
| x
k
x
k
|.
Division of both sides by
k
yields
f
k
( x
k
) f( x
k
)

f
k
(y) f(y)

k
+

k

k
| x
k
x
k
|.
The sequences (x
k
) and ( x
k
) are bounded and hence also the last factor on the right-
hand side. Let x be a point of accumulation of ( x
k
), then we obtain by the lower
semi-continuous convergence
g( x) g(y).
Remark. If in the above theorem the function g is strictly convex, then the sequence
( x
k
) converges to the corresponding two-stage solution.
Example 5.6.11. We consider once more the best L
p
-approximations (p > 1) for
p 1. Let (
k
) be a sequence of positive numbers tending to zero. If we put
f
k
(x) :=
_
T
[x(t)[
1+
k
d,
and
f(x) :=
_
T
[x(t)[d,
Section 5.6 Two-stage Optimization 165
then the sequence (
f
k
f

k
) converges, as seen above, lower semi-continuously to the
strictly convex function
g(x) =
_
T
[x(t)[ log [x(t)[d.
If one determines the best L
p
k
-solutions with p
k
= 1 +
k
only in an approximate
sense, but with increasing accuracy, such that the stop criteria (for the norms of the
gradients) converges more rapidly to zero than the sequence (
k
), then the sequence of
the thus determined approximate solutions converges to the best L
1
()-approximation
of largest entropy.
As a consequence of the above Theorem 5.6.10 we obtain
Theorem 5.6.12. Let (f
k
: R
n
R)
kN
be a sequence of differentiable convex
functions, which converge pointwise to a function f : R
n
R. Let further the set of
minimal solutions of f on R
n
be non-empty and bounded. Let now Q be an open and
bounded superset of M(f, R
n
) and

k
:= sup
xQ
[f(x) f
k
(x)[.
Let further (
k
)
kN
be a sequence of non-negative numbers tending to zero, such that
lim
k

k
= 0 holds.
Let now the function g : R
n
R be differentiable and convex, and let the sequence
of functions (h
k
: R
n
R)
kN
be dened by
h
k
:=
k
g +f
k
.
Let further (
k
)
kN
be a sequence of positive numbers tending to zero, such that
lim
k

k
= 0 holds, and let ( x
k
)
kN
be a sequence in R
n
with the property
|h
k
( x
k
)|
k
.
Then every point of accumulation of the sequence ( x
k
) is a solution of the two-stage
optimization problem (f, g, R
n
).
If in addition g is strictly convex, then the sequence ( x
k
) converges to the uniquely
determined second-stage solution.
Proof. Apparently the sequence (h
k
) converges pointwise to the limit function f. Due
to the previous theorem it sufces to show the lower semi-continuous convergence of
the sequence (h
k
f)/
k
to g on Q. But this follows using Remark 5.6.3 from
h
k
f

k
= g +
f
k
f

k
g

k

k
.
166 Chapter 5 Stability and Two-stage Optimization Problems
Together with the convergence estimates in Chapter 2 we are now in the position
to present a robust regularization of the Polya algorithm for the Chebyshev approx-
imation in the spirit of Tikhonov, and hence obtain convergence to the second-stage
solution w.r.t. an arbitrarily chosen strictly convex function g.
Theorem 5.6.13 (Regularization of Polya Algorithms). Let T be a compact metric
space and V a nite dimensional subspace of C(T) (resp. a closed convex subset
thereof) and let x C(T).
Let (
k
)
kN
be a sequence of differentiable Young functions converging pointwise
to

.
Let further (
k
)
kN
be a sequence tending to zero such that
[|y|
(
k
)
|y|

[
k
|y|

for all y x +V .
We now choose a sequence (
k
)
kN
of non-negative numbers tending to zero such
that lim
k

k
= 0 holds, and a strictly convex function g : x + V R. As an
outer regularization of the Luxemburg norms we then dene the function sequence
(h
k
: x +V R)
kN
with h
k
:=
k
g +| |
(
k
)
.
If the minimal solutions x
k
of h
k
are determined approximately but with growing
accuracy such that the norms of the gradients converge to zero more rapidly than the
sequence (
k
), then the sequence ( x
k
) converges to the best Chebyshev approxima-
tion, which is minimal w.r.t. g.
5.7 Stability for Families of Non-linear Equations
For sequences of convex functions, equicontinuity and hence continuous convergence
already follows frompointwise convergence (see Theorem5.3.13 and Theorem5.3.8).
Subsequently we will show that a similar statement holds for sequences of monotone
operators. Continuous convergence in turn implies stability of solutions under certain
conditions on the limit problem.
Both, solutions of equations and optimization problems, can be treated in the frame-
work of variational inequalities. In fact, sequences of variational inequalities show a
similar stability behavior, where again continuous convergence is of central signi-
cance (see [59], Satz 1, p. 245). We employ this scheme in the context of two-stage
solutions.
We have treated stability questions for minimal solutions of pointwise convergent
sequences of convex functions in Section 5.3. It turns out that stability can be guar-
anteed if the set of minimal solutions of the limit problem is bounded (see Theo-
rem 5.3.25, see also [59]). The question arises, whether a corresponding statement
holds on the equation level for certain classes of mappings that are not necessarily po-
tential operators. Questions of this type arise e.g. in the context of smooth projection
methods for semi-innite optimization (see [62]).
Section 5.7 Stability for Families of Non-linear Equations 167
A general framework for treating stability questions involving non-linear equa-
tions for sequences of continuous operators on R
n
is given by the following scheme
(see [62]):
Theorem 5.7.1. Let U R
n
and (A
k
: U R
n
)
kN
be a sequence of continuous
operators that converges continuously on U to a continuous operator A : U R
n
with the property: there exists a ball K(x
0
, r) U, r > 0 such that
Ax, x x
0
> 0 (5.8)
for all x in the sphere S(x
0
, r).
Then there exists k
0
N such that for all k k
0
each equation A
k
x = 0 has a
solution x
k
in K(x
0
, r).
Furthermore every point of accumulation of (x
k
)
kN
is a solution of Ax = 0.
The above theorem is a consequence of the following well-known
Lemma 5.7.2. Let A : U R
n
be continuous. If there is r > 0 and a ball
K(x
0
, r) U such that Ax, x x
0
0 for all x S(x
0
, r), then the non-linear
equation Ax = 0 has a solution in K(x
0
, r).
Proof. Otherwise Brouwers xed point theorem applied to the mapping
x g(x) := r
_
Ax
|Ax|
_
+x
0
would lead to a contradiction, because then |Ax| ,= 0 on K(x
0
, r), hence g continu-
ous there and thus has a xed point x ,= x
0
on K(x
0
, r). Hence
_
Ax, r
x x
0
|x x
0
|
_
= |x x
0
||Ax| < 0.
5.7.1 Stability for Monotone Operators
A large class of operators can be treated using the above stability principle, where
pointwise convergence already implies uniform convergence on compact subsets.
Among them are the monotone operators according to
Denition 5.7.3. Let X be a normed space and let U be a subset of X. A mapping
A : U X

is called monotone on U if for all x, y U the following inequality


holds:
Ax Ay, x y 0.
168 Chapter 5 Stability and Two-stage Optimization Problems
Lemma 5.7.4 (see [110]). Let U be an open subset of R
n
and let (A
k
: U R
n
)
kN
be a sequence of continuous monotone operators that converges pointwise on U to an
operator A : U R
n
. Then for every sequence (x
k
) U that converges in U it
follows that the sequence (A
k
x
k
)
kN
is bounded.
Proof. Assume that there is a sequence (x
k
) in U with limx
k
= x
0
U such
that (A
k
x
k
) is unbounded. Then there is a subsequence (A
k
i
x
k
i
) with the property
|A
k
i
x
k
i
| i for all i N. As A
k
i
is monotone we obtain for all z U
A
k
i
x
k
i
A
k
i
z, x
k
i
z 0.
Let now y
k
i
:=
A
k
i
x
k
i
|A
k
i
x
k
i
|
then we can w.l.o.g. assume that the sequence (y
k
i
) con-
verges to some y in the unit sphere. If the above inequality is divided by |A
k
i
x
k
i
| we
obtain for all z U
_
y
k
i

A
k
i
z
|A
k
i
x
k
i
|
, x
k
i
z
_
0.
pointwise convergence implies A
k
i
z Az and hence
lim
i
_
y
k
i

A
k
i
z
|A
k
i
x
k
i
|
, x
k
i
z
_
= y, x
0
z 0 z U.
As U is open it follows that y = 0, a contradiction.
The following theorem states that pointwise convergence of continuous monotone
operators already implies continuous convergence:
Theorem 5.7.5 (see [110]). Let U be an open subset of R
n
and let (A
k
: U
R
n
)
kN
be a sequence of continuous monotone operators that converges pointwise on
U to a continuous operator A : U R
n
then (A
k
) is equicontinuous on U.
Proof. According to Theorem 5.3.6 it is sufcient to show the following: convergence
of a sequence (x
k
) in U to an element x
0
U implies limA
k
x
k
= Ax
0
.
Assume that there is a sequence (x
k
) in U convergent to x
0
U such that (A
k
x
k
)
does not converge to Ax
0
, i.e. there is > 0 and a subsequence (A
k
i
x
k
i
) with the
property
|A
k
i
x
k
i
Ax
0
|
for all i N. By Lemma 5.7.4 (A
k
i
x
k
i
) is bounded and w.l.o.g. we can assume that it
converges to some g R
n
. Because of the previous inequality we have |g Ax
0
|.
On the other hand we obtain by the monotonicity of A
k
i
for all u U
A
k
i
x
k
i
A
k
i
u, x
k
i
u 0,
and hence using pointwise convergence
g Au, x
0
u 0
for all u U. By Theorem 5.7.6 below it follows that g = Ax
0
, a contradiction.
Section 5.7 Stability for Families of Non-linear Equations 169
Theorem 5.7.6 (Browder and Minty (see [109])). Let E be a Banach space and U
an open subset of E. Let A : U E

be a hemi-continuous operator. If for a pair


u
0
U and v
0
E

and for all u U the inequality


Au v
0
, u u
0
0
holds, then v
0
= Au
0
.
An immediate consequence of the theorem of Browder and Minty is the following
characterization theorem for solutions of the equation Ax = 0, if A is a monotone
operator:
Theorem 5.7.7 (see [109]). Let E be a Banach space and U an open subset of E. Let
A : U E

a continuous and monotone operator. Then the following characteriza-


tion holds: Au
0
= 0 for u
0
U if and only if for all u U the following inequality
holds:
Au, u u
0
0.
Proof. The if-part follows from Theorem 5.7.6 for v
0
= 0. Let now Au
0
= 0 then,
from the monotonicity of A, we obtain
0 Au Au
0
, u u
0
= Au, u u
0
.
Remark 5.7.8. If U is convex then the above theorem directly implies that the set
S
A
:= x U [ Ax = 0
is convex.
For monotone operators we obtain the following existence theorem which is in a
way a stronger version of Lemma 5.7.2:
Theorem 5.7.9. Let U R
n
be convex and A : U R
n
be a continuous monotone
operator. If there exists a ball K(x
0
, r) U and r > 0 such that Ax, xx
0
> 0 for
all x S(x
0
, r). Then for the set of solutions S
A
of the non-linear equation Ax = 0
the following statement holds:
,= S
A
K(x
0
, r).
Proof. The rst part follows from Lemma 5.7.2. For the second part let , Rwith
> and let x S(x
0
, r). Then monotonicity of A yields
A((x x
0
) +x
0
) A((x x
0
) +x
0
), ( )(x x
0
) 0.
Let I be the intersection of U with the straight line passing through x and x
0
. Fromthe
above inequality it follows that g : I R with g() := A((x x
0
) +x
0
), x x
0

is an increasing function. In particular g(1) = Ax, x x


0
> 0. Suppose there is a
1 <

I such that A(

(x x
0
) +x
0
) = 0 then g(

) = 0, a contradiction.
170 Chapter 5 Stability and Two-stage Optimization Problems
We are now in the position to present a stronger version of Theorem 5.7.1 for se-
quences of monotone operators:
Theorem 5.7.10. Let U R
n
be open and convex and (A
k
: U R
n
)
kN
be
a sequence of continuous monotone operators that converges pointwise on U to a
continuous operator A : U R
n
with the Property (5.8): there exists a ball
K(x
0
, r) U, r > 0 such that Ax, x x
0
> 0 for all x on the sphere S(x
0
, r).
Then there exists k
0
N such that the set of the solutions of the equation A
k
x = 0
is non-empty for all k k
0
and contained in K(x
0
, r).
Furthermore, let x
k
x U [ A
k
x = 0, k k
0
then every point of accumula-
tion of (x
k
)
kN
is a solution of Ax = 0.
Property (5.8) is satised by various classes of operators, among them are deriva-
tives of convex functions.
Lemma 5.7.11. If a monotone operator A dened on R
n
has a convex potential f
with a bounded set of minimal solutions M(f, R
n
) (which, of course, coincides with
the set of solutions of Ax = 0), then A satises Property (5.8).
Proof. Apparently for each x
0
M(f, R
n
) there is a sphere S(x
0
, r) such that f(x)
f(x
0
) > 0 for all x S(x
0
, r). As A = f
t
the subgradient inequality yields
0 < f(x) f(x
0
) Ax, x x
0

on that sphere.
For monotone operators, in general, such a statement is not available, i.e. Prop-
erty (5.8) does not follow from the boundedness of the solutions of Ax = 0, as the
following example shows:
Example 5.7.12. Let A : R
2
R
2
be a linear operator that represents a

2
-rotation.
A is monotone as
Ax Ay, x y = A(x y), x y = 0,
but on any sphere around the origin we have Ax, x = 0. Obviously, x[ Ax = 0 =
0.
An important class of operators in this context are the Fejr contractions according
to
Denition 5.7.13. An operator P : R
n
R
n
is called Fejr contraction w.r.t. x
0
(see [11]) or strictly quasi-non-expansive (see [26]) if x
0
is a xed point of P and
there is an r > 0 such that |P(x) x
0
| < |x x
0
| for all x / K(x
0
, r).
Section 5.7 Stability for Families of Non-linear Equations 171
Remark 5.7.14. The above denition of a Fejr contraction differs somewhat from
that given in [11].
Remark 5.7.15. It follows immediately from the denition that the set of xed points
of a Fejr contraction w.r.t. x
0
is bounded.
If P is a Fejr contraction w.r.t. x
0
then the operator A := I P has Property (5.8)
as the following lemma shows:
Lemma 5.7.16. Let P : R
n
R
n
be a Fejr contraction w.r.t. x
0
then for A := IP
we obtain
Ax, x x
0
> 0
for all x / K(x
0
, r).
Proof. Let x / K(x
0
, r), then we obtain
Ax, x x
0
= x x
0
(P(x) x
0
), x x
0

= |x x
0
|
2
P(x) x
0
, x x
0

|x x
0
|
2
|P(x) x
0
||x x
0
| > 0.
Remark 5.7.17. If P is also non-expansive on R
n
then A = I P is apparently
monotone and continuous.
It is easily seen that a projection P onto a bounded convex set is a non-expansive
Fejr contraction. It can be shown that the same is true for certain compositions of
projections (see [62]).
5.7.2 Stability for Wider Classes of Operators
A large class of operators can be treated using the above stability principle, where
pointwise convergence already implies continuous convergence. To illustrate this,
consider a continuous operator A : R
n
R
n
satisfying Property (5.8). Then it is
easily seen that in the following situations continuous convergence, and hence stability
of the solutions follows from Theorem 5.7.1:
(a) Let (
k
) be a sequence in R
+
tending to 0, let P : R
n
R
n
be continuous,
and A
k
:=
k
P +A.
Proof. Let x
k
x
0
then the sequence (P(x
k
)) is bounded, hence
k
P(x
k
)
0 and A
k
(x
k
) A(x
0
) because of the continuity of A.
(b) Let A
k
: R
n
R
n
be continuous and let A
k
A component-wise mono-
tone, i.e. for A
k
(x) = (f
(i)
k
(x))
n
i=1
and A(x) = (f
(i)
(x))
n
i=1
one has pointwise
monotone convergence of f
(i)
k
f
(i)
for i = 1, . . . , n on R
n
.
172 Chapter 5 Stability and Two-stage Optimization Problems
Proof. This follows from the theorem of Dini (see 5.2.2) applied to the compo-
nents of A
k
and A respectively.
(c) Let A
k
: R
n
R
n
be continuous and A
k
Apointwise on R
n
and let A
k
A
be monotone for all k N.
Proof. We have A
k
A 0 pointwise on R
n
and from Theorem 5.7.10 con-
tinuous convergence follows.
(d) Compositions of continuously convergent sequences of functions preserves con-
tinuous convergence, i.e. let g
k
: R
n
R
n
be continuously convergent to g :
R
n
R
n
and let f
k
: R
n
R
n
be continuously convergent to f : R
n
R
n
then f
k
g
k
converges continuously to f g.
A special case is obtained if either (f
k
) or (g
k
) is a constant sequence of a
continuous function, e.g. let B : R
n
R
n
linear and let A
k
: R
n
R
n
be continuous and monotone, and let A
k
A pointwise on R
n
then B A
k
converges continuously to B A.
5.7.3 Two-stage Solutions
Two-stage solutions we have studied in Section 5.6 (see also [59], [65], [66]), in
particular for sequences of convex functions. The following theorem (see [59], p. 246)
gives a framework for sequences of non-linear equations, where the second stage is
described in terms of a variational inequality:
Theorem 5.7.18. Let X, Y be a normed spaces, (A : X Y ) continuous and for
the sequence of continuous operators (A
k
: X Y )
kN
let L = lim
k
x[ A
k
x =
0 x[ Ax = 0 =: S
A
. Let further (a
k
)
kN
be a sequence of positive numbers
and B : X X

a continuous mapping with B(0) = 0 such that


(a) B A : X X

is monotone
(b) a
k
(B A
k
B A) converges continuously to a mapping D : X X

.
Let x L then for all x S
A
the inequality D x, x x 0 holds.
Proof. Let x
k
x[ A
k
x = 0 such that (x
k
) converges to an x X, i.e. x L.
Let x S
A
, i.e. Ax = 0. Since B A monotone and continuous it follows that
a
k
(B A
k
B A)x
k
, x x
k
= a
k
B Ax
k
, x
k
x 0.
Since a
k
(BA
k
BA) converges continuously to D it follows that a
k
(BA
k
B
A)x
k
converges to D x in the norm, and hence inequality D x, x x 0 follows.
Section 5.7 Stability for Families of Non-linear Equations 173
Example 5.7.19. If A
k
:=
k
P + A (compare class (a) of previous section) where
A is monotone, P a positive denite linear operator, and (
k
) a sequence of positive
numbers tending to 0 then, choosing a
k
:=
1

k
, D = P and the inequality P x, x
x 0 for all x S
A
is the characterization of a minimal solution of the strictly
convex functional x Px, x on the convex set S
A
(compare Remark 5.7.8). In
this case, convergence of (x
k
) to x follows.
Example 5.7.20. Let A, C : R
n
R
m
be linear operators, b R
m
and let S
A
:=
x R
n
[ Ax = b be non-empty. Let further (
k
) be a sequence in R
+
tending to
zero and let A
k
:=
k
C +A. Then for a
k
:=
1

k
and B := A
T
it follows that
a
k
(BA
k
BA) =
1

k
(
k
A
T
C +A
T
AA
T
A) = A
T
C =: D,
and hence
A
T
C x, x x 0
for all x in the afne subspace S
A
, in other words: A
T
C x is orthogonal to kernel
of A.
Example 5.7.21 (see [62]). LP-problem: let c, x R
n
, A L(R
n
, R
m
), b R
m
,
then we consider the following problem:
minc, x [ Ax = b, x 0
with bounded and non-empty set of solutions. We choose the mapping P = P
m

P
m1
P
1
of the successive projections P
i
onto the hyperplanes
H
i
= s R
n
[ a
i
, s = b
i
for i 1, . . . , m,
where a
i
denotes the i-th row of A, and in addition to that the projection P
K
onto
the positive cone R
n
0
, given by
P
K
(x) := ((x
1
)
+
, . . . , (x
n
)
+
).
As P
K
is non-differentiable a smoothing of the projection P
K
is obtained by replacing
the function s (s)
+
by a smooth function

: R R ( > 0) that approximates


the ()
+
-function.
The projection P
K
is then replaced by P

= (

(x
1
), . . . ,

(x
n
)). By use of the
Newton method the non-linear equation
F

(x) := x P

P(x) +c = 0.
can be solved very efciently.
174 Chapter 5 Stability and Two-stage Optimization Problems
It can be shown that P
K
P is a non-expansive Fejr contraction w.r.t. any x
S := x R
n
[ Ax = b, x 0 and that P

P is non-expansive. Let (
k
) be a
positive sequence tending to 0. Stability then follows from Lemma 5.7.16 together
with Theorem 5.7.10 for the sequence of monotone operators A
k
= F

k
converging
pointwise to the monotone operator A = I P
K
P satisfying Property (5.8).
Application of Theorem 5.7.18 yields a condition for

k
that enforces continuous
convergence. We have for a
k
=
1

k
:
a
k
(A
k
A) = a
k
(P

k
P +
k
c +P
K
P) =
1

k
(P
K
P P

k
P) +c.
It follows that, if
1

k
(

k
()
+
) 0 uniformly on compact subsets of R, then
a
k
(A
k
A) converges continuously to c. If x is any limit point of the sequence of
solutions of the equations (A
k
x = 0) then for all x S we obtain c, x x 0.
Remark 5.7.22. Convex optimization problems with linear constraints can be treated
in the same manner: let f : R
n
R
n
be convex and differentiable, then consider
minf(x) [ Ax = b, x 0.
The (monotone) operator F

becomes
F

(x) := x P

P(x) +f
t
(x),
and one obtains f
t
( x), x x 0 for every point of accumulation x and all x S,
i.e. x is a minimal solution of f on S, according to the Characterization Theorem of
Convex Optimization 3.4.3.
Chapter 6
Orlicz Spaces
In this chapter we will develop the theory of Orlicz spaces equipped with the Lux-
emburg norm for arbitrary measure spaces and arbitrary Young functions. In the next
chapter we will then discuss dual spaces and the Orlicz norm that arises naturally in
this context.
At rst we will investigate the properties of (one-dimensional) Young functions and
their conjugates.
6.1 Young Functions
Denition 6.1.1. Let : R R
0
a lower semi-continuous, symmetric, convex
function with (0) = 0, where 0 is an interior point of Dom(). Then is called a
Young function.
Remark 6.1.2. A Young function is monotone on R
0
, because let 0 s
1
< s
2
, then
we have
(s
1
)
s
2
s
1
s
2
(0) +
s
1
s
2
(s
2
) (s
2
).
Let be a nite Young function and let s
0
:= sups R[ (s) = 0, then is
strictly monotonically increasing on [s
0
, ) Dom(), because let s
0
s
1
< s
2
,
then s
1
= s
2
+ (1 )s
0
with 0 < 1 and we obtain
(s
1
) (s
2
) + (1 )(s
0
) < (s
2
).
One of the goals of our consideration is, to anticipate many phenomena of the (typi-
cally innite dimensional) function spaces from the properties of the one-dimensional
Young functions. In particular we will see that the duality relations for norms and
modulars are connected to the duality (conjugate) of the corresponding Young func-
tions.
Subdifferential and One-sided Derivatives
Essentially the subdifferential of a Young function consists in the interval between
left-sided and right-sided derivative. Without any restriction this holds for the interior
points of the domain where is nite, denoted by Dom(). At the boundary of
Dom() the subdifferential can be empty, even though

and
+
exist in R.
176 Chapter 6 Orlicz Spaces
Example 6.1.3. Let
(s) =
_
1

1 s
2
for [s[ 1
otherwise
then: 1 Dom(),
t

(1) =
t
+
(1) = and (1) = holds.
Example 6.1.4. Let
(s) =
_
[s[ for [s[ 1
otherwise
then: 1 Dom(),
t

(1) = 1,
t
+
(1) = and (1) = [1, ) holds.
Example 6.1.5. Let
(s) =
_
tan [s[ for [s[ <

2
otherwise
then:

2
/ Dom() holds.
In the next section on the conjugate of a Young function we will prove the funda-
mental relation between the generalized inverses in connection with Youngs equality.
The complications mentioned above require a careful treatment in order to substan-
tiate the simple geometrical view described below.
The following theorem yields an extension of the theorem of MoreauPschenitsch-
nii 3.10.4 for Young functions:
Theorem 6.1.6. Let s
0
0 and let s
0
Dom(), then
t

(s
0
) and
t
+
(s
0
) exist in
R and we have
t

(s
0
)
t
+
(s
0
). Moreover:
(a) If beyond that (s
0
) ,= , then
t

(s
0
) is nite and the following inclusion
holds:
[
t

(s
0
),
t
+
(s
0
)) (s
0
) [
t

(s
0
),
t
+
(s
0
)].
(b) If, however, (s
0
) = , then
t

(s
0
) =
t
+
(s
0
) = .
(c) If, in particular s
0
Int(Dom()), then
t
+
(s
0
) is nite and we obtain for the
subdifferential at s
0
[
t

(s
0
),
t
+
(s
0
)] = (s
0
).
Proof. Let t 0 and 1, 1, then we dene
h

(t) := (s
0
+t) (s
0
).
Apparently h

is convex and for 0 < s t


h

(s) = h

_
s
t
t +
t s
t
0
_

s
t
h

(t) +
t s
t
h

(0) =
s
t
h

(t)
Section 6.1 Young Functions 177
holds. Therefore
(s
0
+s) (s
0
)
s

(s
0
+t) (s
0
)
t
.
In particular the difference quotients
(s
0
+t)(s
0
)
t
resp.
(s
0
t)(s
0
)
t
are monotoni-
cally decreasing resp. increasing for t 0.
Hence there exist
t

(s
0
)=lim
t0
(s
0
t)(s
0
)
t
and
t
+
(s
0
)=lim
t0
(s
0
+t)(s
0
)
t
in
R.
Due to
(s
0
) =
_
1
2
(s
0
t) +
1
2
(s
0
+t)
_

1
2
(s
0
t) +
1
2
(s
0
+t),
we have for t > 0
(s
0
t) (s
0
)
t

(s
0
+t) (s
0
)
t
,
and thus
t

(s
0
)
t
+
(s
0
).
(a) Let now (s
0
) ,= and (s
0
), then
(t) (s
0
t) (s
0
),
thus for t > 0 and t small enough (0 is an interior point of Dom())

(s
0
t) (s
0
)
t
R.
As the above difference quotient for t 0 is monotonically increasing, it follows that

t

(s
0
) R.
Furthermore
t (s
0
+t) (s
0
),
and hence

(s
0
+t) (s
0
)
t
,
thus
t
+
(s
0
).
Conversely let R and
t

(s
0
)
t
+
(s
0
), i.e. because of the monotonicity
of the difference quotient for t > 0
(s
0
t) (s
0
)
t

(s
0
+t) (s
0
)
t
,
hence t (s
0
+ t) (s
0
) and (t) (s
0
t) (s
0
), and therefore
(s
0
).
178 Chapter 6 Orlicz Spaces
(b) If (s
0
) = , then
t

(s
0
) = must hold, because suppose
t

(s
0
) < ,
then, due to the result obtained above
t

(s
0
) (s
0
).
(c) Let now s
0
Int(Dom()), then by Theorem 3.10.2 (s
0
) ,= . In particular
by (a).
t

(s
0
) nite. Furthermore
(s
0
+t)(s
0
)
t
R for t > 0 and sufciently small,
and hence due to the monotonicity of the difference quotient
t
+
(s
0
) R. We obtain
(s
0
t) (s
0
)
t

t

(s
0
)
t
+
(s
0
)
(s
0
+t) (s
0
)
t
,
and thus
t
+
(s
0
) (s
0
).
Remark 6.1.7. For s
0
< 0 analogous assertions hold. If in particular s
0

Int(Dom()), then also
t
+
(s
0
) is nite and
[
t

(s
0
),
t
+
(s
0
)] = (s
0
)
holds.
Theorem 6.1.8. Let 0 u < v Dom(), then:
t
+
(u)
t

(v) holds.
Proof. We have u Int(Dom()) and hence
t
+
(u) is nite by the previous theorem.
If
t

(v) is nite, then we also have by the previous theorem


t

(v) (v). Using


the monotonicity of the subdifferential (see Theorem 3.10.3) we then obtain
(

(v)
t
+
(u)) (v u) 0,
and since v u > 0 the assertion. If
t

(v) = , then there is nothing left to


show.
The Conjugate of a Young Function
For the conjugate of a Young function one obtains according to Denition 3.11.1
(s) = sups t (t) [ t R.
Apparently is again a Young function, because is, being the supremum of afne
functions (see Remark 3.5.2) lower semi-continuous. Fromthe denition immediately
(as in the general case) Youngs inequality
(t) +(s) t s
follows.
From the theorem of FenchelMoreau 3.11.7 we obtain for =

= .
Thus and are mutual conjugates. For Youngs equality we obtain the following
equivalences (see Theorem 3.11.9):
Section 6.1 Young Functions 179
Theorem 6.1.9. The following assertions are equivalent:
(a) s (t)
(b) t (s)
(c) (t) +(s) = s t.
As a consequence we obtain the following
Lemma 6.1.10. Let be a nite Young function. Then: (
t
+
(t)) < holds for all
t R. In particular (
t
+
(0)) = 0 holds. A corresponding statement holds for the
left-sided derivative
t

.
Proof. The fact that is nite implies by Theorem 6.1.6 and Remark 6.1.7 that

t
+
(t) (t) and hence
(t) +(
t
+
(t)) =
t
+
(t) t.
The same proof holds for the left-sided derivative.
We will now demonstrate a geometrical interpretation of the right-sided derivative
of as a generalized inverse of the right-sided derivative of using the relationship
of the subdifferential with the one-sided derivatives shown in the previous section
(compare Krasnosielskii [72]) for N-functions).
Lemma 6.1.11. Let t (
t

(s),
t
+
(s)), then (t) consists only of a single point.
Proof. By Theorem 6.1.9 s (t). Suppose there is s
1
(t) with s
1
,= s, then,
again by Theorem 6.1.9 t (s). Therefore according to Theorem 6.1.8
s
1
> s
t

(s
1
)
t
+
(s) > t
s
1
< s
t
+
(s
1
)
t

(s) < t
i.e. t / (s
1
) a contradiction.
Notation 6.1.12.
(t) :=
_

t
+
(t) for t Dom()
otherwise
(s) :=
_

t
+
(s) for s Dom()
otherwise.
Lemma 6.1.13. is increasing and right-sided continuous on D := R
+
Dom.
180 Chapter 6 Orlicz Spaces
Proof. Let 0 t
1
< t
2
Dom. Then
(t
1
) =
t
+
(t
1
)
t

(t
2
)
t
+
(t
2
) = (t
2
).
Let t
0
D and (t
n
) a sequence in D such that t
n
t
0
, then
t
+
(t
n
) = (t
n
) s.
Furthermore
t
+
(t
n
)
t
+
(t
0
) for all n N and hence s
t
+
(t
0
). On the other
hand we have according to Theorem 6.1.9
(t
n
) +((t
n
)) = t
n
(t
n
)
and hence
(t
0
) +( s)) = t
0
s.
Again by Theorem 6.1.9 we have s (t
0
) and hence due to Theorem 6.1.6 s

t
+
(t
0
).
Remark 6.1.14. A corresponding statement holds for the left-sided derivative
t

is increasing and left-sided continuous on D := R


+
Dom
t

.
The duality relation between a Young function and its conjugate can now be de-
scribed in the following way: the right-sided derivative of the conjugate is the gener-
alized inverse of the right-sided derivative of the Young function (for N-functions see
Krasnosielskii, [72]). This is substantiated in the following theorem:
Theorem 6.1.15. Let s 0, then
(s) = sup [ () s
holds.
Proof. Case 1: Let the interior of (s) non-empty and let t be in the interior, then
by Lemma 6.1.11: s =
t
(t). At rst we show the following inclusion:
(
t

(s),
t
+
(s)) [
t
() = s (s).
Let u [
t
() = s, then apparently s (u) and by Theorem 6.1.9 u
(s). Let on the other hand u (
t

(s),
t
+
(s)), then s =
t
(u) by Lemma 6.1.11
and hence u [
t
() = s. According to Theorem 6.1.6

t
+
(s) = sup(s) = sup [
t
() = s
holds.
Case 2: Let (s) consist only of a single point and let t =
t
(s) then s (t)
by Theorem 6.1.9. Furthermore t 0, because if s = 0, then t =
t
(s) = 0, if
s > 0, then t =
t
(s)
(s)
s
0.
If (t) also consists of a single point, then t =
t
(
t
(t)).
Section 6.1 Young Functions 181
If the interior of (t) is non-empty let s (
t

(t),
t
+
(t)), then we obtain for
u > t by Theorem 6.1.8:
t

( u)
t
+
(t) provided that ( u) ,= , i.e. u /
u[
t
+
(u) s. For u < t we obtain

t
+
(t) > s >
t

(t)
t
+
(u),
i.e. u [
t
+
() s for all u < t. Therefore

t
(s) = t = supu[
t
+
(u) s.
Let now s =
t
+
(t), then
t
+
(t) (t), i.e.
t
(
t
+
(t)) = t by Theorem 6.1.9.
Let nally s =
t

(t), then as above


t

(t) (t),

t
(s) =
t
(
t

(t)) = t = supu[
t
+
(u) s,
since s =
t

(t)
t
+
(u) for all u < t by Theorem 6.1.8.
Case 3: (s) = : i.e. s > 0. Then
t
+
(u) s for all u R
0
. For suppose there
is u
1
R
0
with
t
+
(u
1
) := s
1
> s then u
1
(s
1
) and u
1

t

(s
1
)
t
+
(s)
hence
t
+
(s) nite, a contradiction. Therefore
supu[
t
+
(u) s = .
Corollary 6.1.16. Let be continuous, strictly monotone, and let lim
s
(s) = ,
then for s 0
(s) =
1
(s)
holds.
Proof. According to our assumption there is a unique 0 with () = s, hence by
Theorem 6.1.15 (s) = =
1
(s).
Theorem 6.1.17. If lim
t
(t) = , then is nite. If beyond that is nite, then
lim
s
(s) = holds.
Proof. Let s 0 be arbitrary, then there is t
s
R with (t
s
) > s and hence by
denition: (s) < t
s
.
Let now be nite and t
n
, hence (t
n
) . Let (s
n
) be chosen, such that
s
n
(t
n
), then by denition t
n
(s
n
) .
The relationship between the right-sided derivatives of and established above
we will now employ to substantiate the duality of differentiability and strict convexity.
Denition 6.1.18. is called continuous in the extended sense, if is continuous
for t < t
0
= supr [ (r) < and lim
tt
0
(t) = .
is called (continuously) differentiable in the extended sense, if continuous in
the extended sense, i.e. if is continuous for t < t
0
and lim
tt
0
(t) = .
182 Chapter 6 Orlicz Spaces
Remark 6.1.19. For every Young function we have lim
t
(t) = , thus all
continuous (i.e. nite) Young functions are also continuous in the extended sense. The
continuity of in the extended sense is, however, a stronger requirement, in particular
is continuous in the extended sense, if R (R).
Theorem 6.1.20. Let be nite. is strictly monotone, if and only if is continuous
in the extended sense.
Proof. Let at rst be strictly monotone and for s 0
(s) = supt R[ (t) s.
Let s (R), then there is a unique t
s
R
+
with (t
s
) = s. Then apparently
(s) = t
s
.
Let now U := u R[ (u) s and V := v R[ (v) > s.
(a) V ,= : apparently: (s) = sup U = inf V =: t
s
holds. Let (v
n
) be a sequence
in V with v
n
t
s
, then due to the right-sided continuity of (see Lemma 6.1.13)
r
n
:= (v
n
) (t
s
) =: s s,
hence in particular ( s) = t
s
and furthermore
v
n
= (r
n
) t
s
= ( s).
Let on the other hand (u
n
) be a sequence in U with u
n
t
s
, then we obtain
s
n
:= (u
n
) (t
s
) =: s s,
and therefore
u
n
= (s
n
) t
s
= ( s) (s).
The monotonicity of implies
( s) (s) (s) ( s).
If now (t
s
) = (t
s
), i.e. s = s, then is continuous at s, if (t
s
) < (t
s
), then
is constant on [s, s] and due to the results of our previous discussion continuous
at the boundary points of the interval.
(b) V = : apparently then is bounded. Let s

:= sup(t) [ t R and let (u


n
)
be a strictly monotone sequence with s
n
:= (u
n
) s

. Suppose (u
n
) is bounded,
then u
n
u
0
R. Since (u
n
) < (u
0
) we then obtain s

(u
0
) s

,
hence constant on [u
0
, ], a contradiction to the strict monotonicity of . Therefore
(s
n
) = u
n
.
If is unbounded, then Theorem 6.1.17 implies lim
s
(s) = .
Section 6.1 Young Functions 183
Conversely let be continuous in the extended sense. Let t 0, then s [ (s) =
t ,= and bounded. We have (t) = sups [ (s) t and hence (t) = s
t
=
sups [ (s) = t. Due to the continuity of we then have (s
t
) = t. Let now
0 t
1
< t
2
, then (s
t
1
) = t
1
< t
2
= (s
t
2
). Since is monotonically increasing,
we obtain (t
1
) = s
t
1
< s
t
2
= (t
2
).
Theorem 6.1.21. Let s Int(Dom()) and s 0, then
(s) =
_
s
0

t
+
(t)dt =
_
s
0

(t)dt
holds.
Proof. By Theorem 6.1.13 and 6.1.8 the right-sided derivative
t
+
is monotonically
increasing and bounded on [0, s] and hence Riemann integrable. Let now 0 = t
0
<
t
1
< < t
n
= s be a partition of the interval [0, s], then by Theorem 6.1.6 and the
subgradient inequality

(t
k1
)
t
+
(t
k1
)
(t
k
) (t
k1
)
t
k
t
k1

t

(t
k
)
t
+
(t
k
).
Furthermore
(s) (0) =
n

k=1
((t
k
) (t
k1
)),
and hence
n

k=1

t
+
(t
k1
)(t
k
t
k1
) (s) (0)
n

k=1

t
+
(t
k
)(t
k
t
k1
).
Since the two Riemann sums converge to the integral, the assertion follows with
(0) = 0 (and in an analogous manner for the second integral).
If we admit (if necessary) the value , then we obtain the representation of a Young
function and its conjugate as integrals of the generalized inverses
(s) =
_
s
0
(t)dt
resp.
(s) =
_
s
0
(t)dt.
We are now in the position to state the duality relation announced earlier:
Theorem 6.1.22. Let and be nite, then is strictly convex, if and only if is
(continuously) differentiable.
184 Chapter 6 Orlicz Spaces
Theorem 6.1.23. is nite, if and only if
(t)
t
.
Proof. Let s 0.
If
(t)
t
M for all t, then (s) = supt (s
(t)
t
) = for s > M holds.
Conversely let
(t)
t
, then there is a t
s
, such that s
(t)
t
is negative for t > t
s
,
i.e.
(s) = supts (t) [ 0 t t
s
s t
s
.
Theorem 6.1.24. is nite, if and only if (t) .
Proof. Due to the subgradient inequality we have (t)(0 t) (0) (t), we
obtain for t > 0:
(t)
t
(t). If nite, then by Theorem 6.1.23 (t) .
Conversely let (t) , then by Theorem 6.1.17 is nite and using the above
integral representation the assertion follows.
Remark 6.1.25. Let t
0
:= supt [ (t) = 0, then is strictly monotonically in-
creasing on [t
0
, ), because let t
0
t
1
< t
2
, then there is a (0, 1] with
t
1
= t
0
+ (1 )t
2
and hence (t
1
) (t
0
) + (1 )(t
2
) < (t
2
).
Lemma 6.1.26. Let be nite, then the function s s
1
(
1
s
) is monotonically
increasing on (0, ). If in addition is nite, then:

1
(n)
n

n
0.
Proof. The rst part can be seen as follows: for t > t
0
0 let (t) > 0 and
(t
0
) = 0. Apparently, for each s > 0 there is a unique t > 0 such that s =
1
(t)
. Put
s(t) :=
1
(t)
and let t
0
< t
1
t
2
, then s(t
1
) s(t
2
) and, due to the monotonicity of
the difference quotient
(t)
t
at 0
s(t
1
)
1
_
1
s(t
1
)
_
=
t
1
(t
1
)

t
2
(t
2
)
= s(t
2
)
1
_
1
s(t
2
)
_
.
We now discuss part 2: let s
n
:=
1
(n), then due to the niteness of we have
s
n
(s
n
)
0.
Stability of the Conjugates of Young Functions
Theorem 6.1.27. Let (
n
)
nN
be a sequence of Young functions, converging point-
wise and monotonically to the Young function . In the case of a monotonically in-
creasing sequence let in addition be nite. Then the sequence of the Young functions

n
converges pointwise and monotonically to , increasing if (
n
) is decreasing, de-
creasing, if (
n
) is increasing.
Section 6.2 Modular and Luxemburg Norm 185
Proof. By denition we have for the conjugates

n
(t) = supst
n
(s) [ s 0 = inf
n
(s) st [ s 0.
Let t 0 be chosen. By Theorem 5.2.1 the sequence (
n
) with
n
(s) :=
n
(s) st
converges lower semi-continuously to with (s) := (s)st. If the sequence (
n
)
is decreasing, then also the sequence (
n
) and we obtain according to the Stability
Theorem of Monotone Convergence 5.2.3

n
(t) = inf
n
(s) [ s 0 inf(s) [ s 0 = (t).
If the sequence (
n
) is monotonically increasing, then from lower semi-continuous
convergence (see Theorem 5.1.6) the Kuratowski convergence of the epigraphs
Epi(
n
) Epi() follows. If now is nite, then for t > 0 we obtain by Youngs
equality M(, R) = (t). By Theorem 6.1.6 we have (t) = [
t

(t),
t
+
(t)]. By
Theorem 6.1.6 we have
t
+
(t) < , also
t

(t) 0 holds, since


t

(t)
(t)
t
0.
If t = 0, then M(, R) = s R[ (s) = 0. Thus in any case M(, R) is non-
empty and bounded. From the Stability Theorem on the Epigraph Convergence 5.3.27
we obtain by denition of the conjugates

n
(t) = inf
n
(s) [ s 0 inf(s) [ s 0 = (t).
6.2 Modular and Luxemburg Norm
Let (T, , ) be an arbitrary measure space. On the vector space of -measurable real-
valued functions on T we introduce the following relation: two functions are called
equivalent if they differ only on a set of -measure zero. Let E be the vector space of
the equivalence classes (quotient space). Let further : R R be a Young function.
The set
L

() :=
_
x E

there exists > 0 with


_
T
(x)d <
_
is called Orlicz space. Apparently L

() is a subspace of E, because due to the


convexity of the arithmetic mean of two elements is still in L

(). This is also


true for an arbitrary multiple of an element.
Frequently we will consider the special case of the sequence spaces: let T = N,
the power set of N and (t
i
) = 1 for t
i
T arbitrary, then we denote L

() by

.
Theorem 6.2.1. The Minkowski functional
x inf
_
c > 0

_
T

_
x
c
_
d 1
_
denes a norm on L

(). It is called Luxemburg norm, denoted by | |


()
.
186 Chapter 6 Orlicz Spaces
Proof. Let
K :=
_
x L

()

_
T
(x)d 1
_
.
According to Theorem 3.2.7 we have to show that K is a convex, symmetric lin-
early bounded set with the origin as algebraically interior point. It is straightfor-
ward to establish that the set K is convex and symmetric. Furthermore K has 0 as
an algebraically interior point. In order to see this, let x L

and > 0 such


that
_
T
(x)d < , then we dene the function h : [0, 1] R by h() :=
_
T
(x)d. Since h is convex and h(0) = 0 we obtain
h() h(1) < .
Hence there exists a
0
> 0 with h() 1 for
0
.
We nowdiscuss the linear boundedness: let y L

0, then there is M , such


that (M) > 0 and > 0 with [y(t)[ for all t M. Since lim
s
(s) = ,
we obtain
_
T
(y)d (M)()

.
Remark 6.2.2. The Luxemburg norm is apparently monotone, i.e. for x, y L

with
0 x y it follows that |x|
()
|y|
()
, because for > 0 we have
1
_
T

_
y
|y|
()
+
_
d
_
T

_
x
|y|
()
+
_
d,
hence |x|
()
|y|
()
+.
In order to establish the completeness of Orlicz spaces the following lemma turns
out to be helpful.
Lemma 6.2.3. Let 0 x
n
x almost everywhere and x
n
L

() for all n N.
Then either x L

() and |x
n
|
()
|x|
()
or |x
n
|
()
.
Proof. Let > 0 and := sup|x
n
|
()
N < . Then for all n N we have
_
T

_
x
n
+
_
d 1.
Apparently the sequence (
x
n
+
) converges pointwise a.e. to (
x
+
). This is obvious,
if is continuous in the extended sense (see Denition 6.1.18 and Remark 6.1.19).
If (s) is nite for [s[ a and innite for [s[ > a and if (
x(t)
+
) = then
x
n
(t)
+

x(t)
+
> a.
Section 6.2 Modular and Luxemburg Norm 187
By the theorem of monotone convergence of integration theory we obtain
1 sup
n
_
T

_
x
n
+
_
d =
_
T

_
x
+
_
d,
i.e. x L

() and + |x|
()
and thus |x|
()
.
On the other hand let 0 <
1
< . Then due to the monotonicity of the sequence
(|x
n
|
()
) there is n
0
N, such that for n > n
0
we have: |x
n
|
()
>
1
, i.e.
_
T

_
x
n

1
_
d > 1,
and by the monotonicity of the integral
_
T

_
x

1
_
d > 1,
i.e. |x|
()

1
.
Theorem 6.2.4. The Orlicz space L

() is a Banach space.
Proof. Let (x
n
)
n
be a Cauchy sequence in L

(), i.e. lim


m,n
|x
n
x
m
|
()
= 0.
Then there exists a subsequence (y
k
)
k
of (x
n
)
n
such that

k=1
|y
k+1
y
k
|
()
< .
Let z
n
:= [y
1
[ +

n
k=1
[y
k+1
y
k
[, then apparently z
n
L

(). By the above lemma


the sequence (z
n
)
n
converges a.e. to a function z L

(). Hence also

k=1
[y
k+1

y
k
[ is convergent a.e. and thus also

k=1
(y
k+1
y
k
). Let nowy := y
1
+

k=1
(y
k+1

y
k
), then we have
y y
n
=

k=1
(y
k+1
y
k
) +y
1
y
n
=

k=1
(y
k+1
y
k
)
n1

k=1
(y
k+1
y
k
)
=

k=n
(y
k+1
y
k
),
and thus
|y y
n
|
()

k=n
|y
k+1
y
k
|
()
n
0.
Since (x
n
)
n
is a Cauchy sequence, the whole sequence converges to y w.r.t. the Lux-
emburg norm.
188 Chapter 6 Orlicz Spaces
Closely related to the Luxemburg norm is the modular f

, which we will employ


frequently below.
Denition 6.2.5. The functional f

: L

() R with
f

(x) =
_
T
(x)d
we call modular.
6.2.1 Examples of Orlicz Spaces
In particular we obtain by the choice

: R R dened by

(s) :=
_
0 for [s[ 1
otherwise
and the corresponding Luxemburg norm
|x|

= inf
_
c > 0

_
T

_
x
c
_
d 1
_
the space L

().
For the Young functions
p
: R R with
p
(s) := [s[
p
one obtains the L
p
()-
spaces with the well-known notation | |
p
for the corresponding norms.
In the sequel we need a direct consequence of the closed graph theorem, which is
known as the two-norm theorem.
Theorem 6.2.6 (Two-norm theorem). Let X be a Banach space w.r.t. the norms | |
a
and | |
b
and let
| |
a
c | |
b
.
Then the two norms are equivalent.
Proof. We consider the identical mapping
id : (X, | |
a
) (X, | |
b
).
Let now (x
n
) be a sequence in X with x
n
x w.r.t. | |
a
and x
n
y w.r.t. | |
b
.
Then x
n
y w.r.t. | |
a
immediately follows and hence y = x = id(x). By the
closed graph theorem the identity is continuous and hence bounded on the unit ball,
i.e.
_
_
_
_
x
|x|
a
_
_
_
_
b
D,
therefore | |
b
D | |
a
.
Section 6.2 Modular and Luxemburg Norm 189
Lemma 6.2.7. Let
1
and
2
be Young functions with
2

1
, then
(a) L

1
() L

2
()
(b) | |
(
2
)
| |
(
1
)
on L

1
()
(c)
1

2
(d) L

2
() L

1
()
(e) | |
(
2
)
| |
(
1
)
on L

2
().
Proof. Let x L

1
(), then for c > |x|
(
1
)
we obtain
1
_
T

1
_
x
c
_
d
_
T

2
_
x
c
_
d.
The remaining part follows from

1
(r) = suprs
1
(s) suprs
2
(s) =
2
(r).
Denition 6.2.8. The Young function is called denite, if (s) > 0 for s > 0,
otherwise indenite.
Lemma 6.2.9. Let not be nite and let a > 0, such that is on (a, ) and
nite on [0, a). Then we have for the Young function
1
with
1
(s) = a[s[ the
relation
1
, where is the conjugate function of .
Let now
(a) be denite then there are positive numbers , s
0
and a Young function
2
with

2
(s) =
_
0 for 0 [s[ s
0
([s[ s
0
) for [s[ > s
0
such that
2
.
(b) indenite, then there is a b > 0 and a Young function
2
with
2
(s) = b[s[
and
2
.
Proof. For s 0 we have
(s) = sup
u0
su (u) = sup
0ua
su (u) sup
0ua
su = as =
1
(s).
(a) Let be denite. Then
t
+
> 0 on (0, a). Let u
0
(0, a), then 0 <
t
+
(u
0
) <
by Theorem 6.1.6. Choose s
0
=
t
+
(u
0
), then by Theorem 6.1.15

t
+
(s
0
) = supu[
t
+
(u) s
0
= u
0
.
190 Chapter 6 Orlicz Spaces
Concerning the construction of
2
: Let :=
t
+
(s
0
). Then for

2
(s) :=
_
0 for 0 [s[ s
0
([s[ s
0
) for [s[ > s
0
with the right-sided derivative

2
(s) =
_
0 for 0 s < s
0
for s s
0
and hence for s 0:
2
(s)
t
+
(s), therefore
2
(s) (s).
(b) Let be indenite, i.e. there is a s
1
> 0 with (s) = 0 for all s [0, s
1
]. Then
for s 0
(s) = sup
u
su (u) sup
0us
1
su (u) = s
1
s.
If we put b := s
1
, we obtain the assertion.
Theorem 6.2.10. Let be not nite and for innite measure let in addition be
indenite, then we have
(a) L

() = L
1
()
(b) | |
()
is equivalent to | |
1
(c) L

() = L

()
(d) | |
()
is equivalent to | |

.
Proof. Let a > 0, such that is equal to on (a, ) and nite on [0, a). By
Lemma 6.2.9 we have for the Young function
1
with
1
(s) = a[s[ the relation

1
. Apparently L

1
() = L
1
() and | |
(
1
)
= a| |
1
. By Lemma 6.2.7 we
obtain L
1
() L

() and | |
()
a| |
1
.
Let x L

() and c > 0, such that


_
T
(
x
c
)d 1. Suppose x / L

(), then
there is a number r > a and a set A with (A) > 0 and [
x(t)
c
[ r for all t A.
Then (
x(t)
c
) = on all of A, a contradiction. Hence we have at rst: L

.
Let now:
i) be indenite, i.e. there is a s
0
> 0 with (s) = 0 for all s [0, s
0
]. Let now
x L

(), then we have [


x
|x|

s
0
[ s
0
almost everywhere and hence
_
T

_
x
|x|

s
0
_
d = 0.
Therefore: |x|
()

1
s
0
|x|

and thus L

() and L

() are equal as sets. The


equivalence of the norms follows by the two-norm theorem.
Section 6.2 Modular and Luxemburg Norm 191
By Lemma 6.2.9 there is a b > 0 with b| |
1
| |
()
. Then L
1
() = L

() and
the norms are equivalent.
ii) denite and (T) < . Then by Lemma 6.2.9 there is a Young function
2
with

2
(s) =
_
0 for 0 [s[ s
0
([s[ s
0
) for [s[ > s
0
and
2
, hence L

2
. Let now x L

2
(). We have to show: x L
1
():
Let s
0
> 0 and let T
0
:= t T [
[x(t)[
|x|
(
2
)
+
s
0
. Then we have
1
_
T

2
_
x
|x|
(
2
)
+
_
d =
_
T\T
0

x
|x|
(
2
)
+

s
0
_
d
=

|x|
(
2
)
+
_
T\T
0
[x[d
s
0
|x|
(
2
)
+
(T T
0
).
Therefore
|x|
(
2
)
+

+s
0
(T T
0
)
_
T\T
0
[x[d,
put c := |x|

2
+ and due to
_
T
0
[x[d (T
0
) s
0
c we obtain
|x|
1

c

+s
0
(T)(1 +c).
Hence x L
1
(), i.e. in all L
1
() = L

(). The equivalence of the norms | |


1
and
| |
()
follows by the two-norm theorem.
Let (s) be nite for all s [0, s
1
] and let x L

(). Then [
x
|x|

[ a.e. for
arbitrary positive . has a (nite) inverse
1
on [0, (s
1
)] due to the deniteness
of . Let now either
i.
1
(T)
(s
1
), then there exists :=
1
(
1
(T)
)
ii. or
1
(T)
> (s
1
) then put := s
1
.
Thus we obtain
_
T

_
x
|x|

_
d
_
T
()d
_
T
1
(T)
d = 1.
Therefore: L

() = L

() and |x|
()

1

|x|

.
The equivalence of the norms follows again by the two-norm theorem.
Corollary 6.2.11. Let be not nite, let T
0
be a subset with nite measure, and
let (T
0
,
0
,
0
) be an induced measure space on T
0
, then L

() contains a closed
subspace, isomorphic to L

(
0
).
192 Chapter 6 Orlicz Spaces
Proof. By Theorem 6.2.10 items (c) and (d) we have L

(
0
) = L

(
0
) as sets and
the corresponding norms are equivalent.
If is denite, then the assertion of Theorem 6.2.10 for innite measure does not
hold in general, as can be seen by the following example of sequence spaces.
Example 6.2.12. Let
(s) =
_
0 for 0 [s[ 1
([s[ 1) for [s[ > 1
then for =

we obtain
(s) =
_
[s[ for 0 [s[ 1
for [s[ > 1.
First of all we nd:

and

are equal as sets: if x is bounded, then f

(
x
|x|

) =
0 < , i.e. x

. If on the other hand y unbounded,


y
c
is also unbounded for
arbitrary c > 0 and there is a subsequence of components (
y
i
k
c
), whose absolute value
tends to innity and hence
f

_
y
c
_

k=k
0
_

y
i
k
c

1
_
=
for c > 0 arbitrary, i.e. y /

. Let now x

, then f

(
x
|x|

) = 0 < 1, i.e.
|x|

|x|
()
. By the two-norm theorem both norms are equivalent.
We now consider

and show at rst:


1
and

are equal as sets: let x


1
, then
x is in particular bounded and
f

_
x
|x|

_
=

i=1

x
i
|x|

=
|x|
1
|x|

< ,
i.e. x

.
Conversely, let x

, then there is a c > 0, such that f

(
x
c
) < . Apparently x
is bounded and c |x|

must hold, hence


f

_
x
c
_
=

i=1

x
i
c

=
1
c

i=1
[x
i
[ < ,
i.e. x
1
. Let now x

. Since |x|
1
|x|

, we have
f

_
x
|x|
1
_
=

i=1

x
i
|x|
1

= 1
and hence |x|
()
|x|
1
. In order to show equality, we distinguish two cases
Section 6.2 Modular and Luxemburg Norm 193
(a) |x|
1
= |x|

: in this case only a single component is different from zero. Let


c < |x|
1
, then
f

_
x
c
_
=
_
|x|

c
_
=
and hence (due to the left-sided continuity of on R
+
): |x|
()
> c.
(b) |x|
1
> |x|

: let |x|
1
> c > |x|

, then
f

_
x
c
_
=

i=1

x
i
c

=
1
c
|x|
1
> 1
and hence c < |x|
()
.
In both cases c < |x|
1
implies c < |x|
()
, altogether |x|
1
= |x|
()
.
Example 6.2.13. Let 1 < p < and
(s) =
_
[s[
p
for 0 [s[ 1
for [s[ > 1.
Then

=
p
(unit ball identical), hence

,=

. Moreover, with
1
p
+
1
q
= 1
(s) =
_
c
p
[s[
q
for 0 [s[ p
[s[ 1 for [s[ > 1
with c
p
= (
1
p
)
q
p
(
1
p
)
q
= (
1
p
)
q
p

1
q
. Apparently

,=
1
.
6.2.2 Structure of Orlicz Spaces
Denition 6.2.14. In the sequel we denote by M

() the closed subspace of L

()
spanned by the step functions in L

() with nite support, i.e.


_
n

i=1
a
i

A
i

(A
i
) < , a
i
R, i = 1, . . . , n
_
.
The domain of nite values of the modular f

() := x L

() [ f

(x) <
is introduced as the Orlicz class. Furthermore we dene
H

() := x L

() [ f

(k x) < for all k N.


Since
f

(k(x +y)) = f

_
1
2
(2kx + 2ky)
_

1
2
f

(2kx) +
1
2
f

(2ky) <
for x, y H

() the set H

() is a subspace.
194 Chapter 6 Orlicz Spaces
For future use we need the following
Lemma 6.2.15. Let x H

() and x ,= 0, then
f

_
x
|x|
()
_
= 1
holds. If on the other hand f

(x) = 1, then |x|


()
= 1.
Proof. The mapping h : R R with f

(x) is apparently convex and hence


continuous (see Theorem 5.3.11). Let 0 < < |x|
()
and
1
:= 1/(|x|
()
+ ),
then f

(
1
x) 1. Moreover we obtain with
2
:= 1/(|x|
()
) the inequality
f

(
2
x) > 1. By the intermediate value theorem there is a [
1
,
2
], such that
f

(x) = 1. Let
0
:= sup[ h() = 0. Then h is strictly monotone on [
0
, ),
because let
0

1
<
2
, then
1
= s
0
+ (1 s)
2
for a s (0, 1) and hence
h(
1
) s h(
0
) + (1 s)h(
2
) < h(
2
).
Remark 6.2.16. (a) Apparently we have: H

() C

().
(b) The Orlicz class is a convex set, because for x
1
, x
2
C

() we obtain for 0
1
f

(x
1
+ (1 )x
2
) f

(x
1
) + (1 )f

(x
2
) < .
Theorem 6.2.17. If is nite, then M

() C

() holds.
Proof. Let x M

() and y a step function with nite support, for which


2|x y|
()
+ 1
holds ( > 0 appropriately chosen). Then y is in C

() and hence also 2y being a


step function in C

(). Due to the convexity of f

we obtain
1
2|x y|
()
+
_
T
(2(x y))d
_
T

_
2(x y)
2|x y|
()
+
_
d 1.
Therefore
_
T
(2(x y))d 2|x y|
()
+,
i.e. 2(x y) C

(). Since C

() is convex, we obtain
x =
1
2
2y +
1
2
(2(x y)) C

().
The subsequent theorem asserts that for nite measure and nite Young function
the bounded functions are always contained in the closure of the step functions.
Section 6.2 Modular and Luxemburg Norm 195
Theorem 6.2.18. Let (T) < and nite, then L

is a linear subspace of M

and L

= M

holds, where the closure is understood w.r.t. | |


()
.
Proof. Let x L

bounded, a x b a.e. for a, b R, and > 0 be given. Let


further :=
1
(
1
(T)
) and n
0
the smallest natural number, such that
ba

n
0
and
let
T
k
:= t T [ a + (k 1) x(t) < a +k
for k = 1, . . . , n
0
, then for the step function y :=

n
0
k=1
(a + (k 1))
T
k
we have
f

_
x y

_
f

_

_

_
(T) = 1,
and hence |x y|
()
. On the other hand the step functions are, of course,
contained in L

.
By a similar construction as in the above theorem one can substantiate that for nite
measure the step functions are dense in L

() w.r.t. the | |

-norm.
Theorem 6.2.19. Let (T) < , then: L

() = M

() holds.
For innite measure this statement is false in general, as the example of the se-
quence spaces shows:
Example 6.2.20. The step functions are not dense in

, as can be seen from the


sequence x = (x
i
)
iN
with x
i
= 1 for i N. The space m

consists of the sequences


tending to zero. In the literature this space is denoted by c
0
.
From convergence in the Luxemburg norm always follows convergence in the mod-
ular.
Theorem 6.2.21. From |x
n
x
0
|
()
0 always f

(x
n
x
0
) 0 follows.
Proof. Let 0 < 1, then
f

(x) = f

+ (1 )0
_
f

_
x

_
.
Let now 1 > > 0, then
_
T
(
x
n
x
0
|x
n
x
0
|
()
+
)d 1 and thus for n sufciently large
and := |x
n
x
0
|
()
+
f

(x
n
x
0
) (|x
n
x
0
|
()
+)
_
T

_
x
n
x
0
|x
n
x
0
|
()
+
_
d
(|x
n
x
0
|
()
+).
196 Chapter 6 Orlicz Spaces
If satises the
2
-condition (see Denition 6.2.25), then, as we will see, also the
converse of this theorem holds.
From the previous theorem the closedness of H

() immediately follows
Theorem 6.2.22. H

() is a closed subspace of L

().
Proof. Let x
0
L

() and let (x
n
) be a sequence in H

(), converging to x
0
. From
x
n
x
0
it follows for k Narbitrary 2kx
n
2kx
0
, and hence f

(2kx
n
2kx
0
)
0, i.e. 2kx
n
2kx
0
C

() for n sufciently large. We then obtain


f

(kx
0
) = f

_
1
2
2kx
n

1
2
(2kx
n
2kx
0
)
_

1
2
f

(2kx
n
) +
1
2
f

(2kx
n
2kx
0
) < .
Remark 6.2.23. Apparently for nite the step functions with nite support are
contained in H

(). Since H

() is closed, we conclude M

() H

(). We will
now establish that for nite and a -nite measure space already M

() = H

()
holds:
Theorem 6.2.24. Let (T, , ) be a -nite measure space and let be nite, then
M

() = H

() holds.
Proof. Let T =

j=1
B
j
with (B
j
) < . Let further B
n
:=

n
j=1
B
j
, let x
H

(), and let


x
n
(t) :=
_
x(t) for [x(t)[ n and t B
n
0 otherwise.
then x
n
L

(), because [x
n
(t)[ [x(t)[ i.e. |x
n
|
()
|x|
()
. We will now
approximate x
n
by a sequence of step functions. Let
nk
:=
n
k
for k N and let for
r Z with 0 r k
C
nkr
:= t B
n
[ r
nk
x
n
(t) < (r + 1)
nk
,
and for k r < 0:
C
nkr
:= t B
n
[ (r 1)
nk
< x
n
(t) r
nk
.
Then we dene for r = k, . . . , k
x
nk
(t) :=
_
r
nk
for t C
nkr
0 otherwise.
Section 6.2 Modular and Luxemburg Norm 197
Then we have by construction [x
nk
(t)[ [x
n
(t)[ and [x
nk
x
n
[
nk
(hence
uniform convergence on B
n
) and thus for > 0 arbitrary
_
T

_
x
nk
x
n

_
d
_
B
n

nk

_
d = (B
n
)
_

nk

_
k
0,
and therefore for n xed
lim
k
|x
nk
x
n
|
()
= 0.
In particular by Remark 6.2.23 we conclude x
n
H

().
As a next step we show: lim
n
|x x
n
| = 0: since x x
n
H

() we obtain
by Lemma 6.2.15
1 =
_
T

_
x x
n
|x x
n
|
()
_
d.
Suppose there exists a subsequence (x
n
k
) with |x x
n
k
|
()
> 0, then
1
_
T

_
x x
n
k

_
d.
On the other hand we have by construction
[xx
n
k
[


[x[

and hence (
[xx
n
k
[

)
(
[x[

). But we also have


_
T
(
[x[

)d < and (
[xx
n
k
[

) 0 almost everywhere.
Lebesgues convergence theorem then yields
_
T

_
x x
n
k

_
d 0,
a contradiction. We obtain: x M

() and together with Remark 6.2.23 the asser-


tion.
6.2.3 The
2
-condition
In this section we will see that with an appropriate growth condition on , the so-
called
2
-condition, Orlicz class, Orlicz space, and the closure of the step functions
coincide. In addition these conditions play an essential role for the description of the
dual space (see next chapter).
Denition 6.2.25. The Young function satises the
(a)
2
-condition, if there is a R, such that for all s R
(2s) (s),
(b)

2
-condition, if there is a R and a k > 0, such that for all s k
(2s) (s),
198 Chapter 6 Orlicz Spaces
(c)
0
2
-condition, if there is a R and a k > 0, such that for all 0 s k
(2s) (s).
Remark 6.2.26. (a) If satises the

2
-condition for a k > 0 and if is denite,
satises the

2
-condition for arbitrary k > 0.
(b) if is nite and satises the
0
2
-condition for a k > 0, satises the
0
2
-condition
for arbitrary k > 0. In particular is denite.
(c) If satises the
0
2
-condition and the

2
-condition, satises the
2
-condition.
Proof. The reason is found in the fact that the function s
(2s)
(s)
is continuous on
R
>0
and hence bounded on any compact subinterval of R
>0
.
Remark 6.2.27. The

2
-condition is satised, if and only if there is a > 1 and a
, such that: (s) (s) for s k > 0, because (see Krasnosielskii [72]) let
2
n
, then we obtain for s > k
(s) (2
n
s)
n
(s) = (s).
Conversely let 2
n
, then (2s) (
n
s)
n
(s).
If (s) (s) for s 0, then the
2
-condition holds for .
Theorem 6.2.28. If is not nite, then satises the

2
-condition. If in addition
is indenite, then satises even the
2
-condition.
Proof. Let a > 0, such that is innite on (a, ) and nite on [0, a). Then by
Lemma 6.2.9 (s) a s for s 0. On the other hand there is by Lemma 6.2.9 a
Young function
2
with

2
(s) :=
_
0 for 0 [s[ s
0
([s[ s
0
) for [s[ > s
0
.
Hence for s 2s
0
(2s) 2as =
2a

(s s
0
) +
2a

(2s
0
s
0
)
=
2a

(
2
(s) +
2
(2s
0
))
2a

2
2
(s)
4a

(s).
If is indenite, then by Lemma 6.2.9 there are a, b with b[ [ a[ [. We
then have
(2s) 2a[s[ = 2
a
b
b[s[ 2
a
b
(s).
Section 6.2 Modular and Luxemburg Norm 199
Theorem 6.2.29. If satises the
2
-condition, then
C

() = L

()
holds. This equality also holds, if (T) < and satises only

2
-condition.
Proof. Let x L

(), then there is an R


>0
with
_
T
(x)d < . Let
1
2
k
, then
_
T
(x)d =
_
T

_
2
k
2
k
x
_
d
k
_
T

_
1
2
k
x
_
d
k
_
T
(x)d < .
Let now (T) < , let satisfy the

2
-condition for a s
0
> 0, and let T
0
be
dened by
T
0
:= t T [ 2
k
[x(t)[ s
0
,
then
_
T
(x)d =
_
T
0
(x)d +
_
T\T
0
(x)d
_
T
0
(x)d +(T T
0
)(2
k
s
0
).
For the integral over T
0
we obtain as above
_
T
0
(x)d
k
_
T
0
(x)d < ,
and thus the assertion.
Remark. The above theorem immediately implies that the modular is nite on all of
L

(), provided that satises the


2
-condition (resp. for nite measure the

2
-
condition).
For not purely atomic measures also the converse of the above theorem holds. In
order to prove this we need the following (see [114]):
Lemma 6.2.30 (Zaanen). Let E be of nite positive measure and contain no atoms,
then for every number with 0 < < (E) there is a subset F of E with the property
(F) = .
Theorem6.2.31. Let (T, , ) be a not purely atomic measure space with (T) < ,
then the following implication holds
C

() = L

() satises the

2
-condition.
200 Chapter 6 Orlicz Spaces
Proof. Let A be the set of atoms and let := (T A):
Suppose does not satisfy the

2
-condition, then there is due to Remark 6.2.27 a
monotonically increasing sequence (t
n
) with t
n

n
, where (t
n
) > 1 for all
n N and

__
1 +
1
n
_
t
n
_
> 2
n
(t
n
) for n N.
Let further (T
n
) be a sequence of disjoint subsets of T A with
(T
n
) =

2
n
(t
n
)
,
which we construct in the following way: let T
1
T A be chosen according to
Lemma 6.2.30, such that (T
1
) =

2(t
1
)
and let T
n+1
(T A)

n
i=1
T
i
with
(T
n+1
) =

2
n+1
(t
n+1
)
. This choice is possible (again employing Lemma 6.2.30),
since

_
(T A)
n
_
i=1
T
i
_
= (T A)
n

i=1
(T
i
) =
_
1
n

i=1
1
2
i
(t
i
)
_


2
n
.
We now dene a function
x(t) :=
_
t
n
for t T
n
0 for t T

n=1
T
n
.
We conclude x C

(), because
f

(x) =
_
T
(x)d =

n=1
(t
n
)(T
n
) =

n=1
1
2
n
< .
But we will now show: x is not in C

() for arbitrary > 1: we then have >


1 +
1
n
for n > n
0
, hence
_
T
n
(x)d = (t
n
)(T
n
) >
__
1 +
1
n
_
t
n
_
(T
n
) 2
n
(t
n
)

2
n
(t
n
)
= ,
and therefore
f

(x) =
_
T
(x)d =

n=1
_
T
n
(x)d = ,
a contradiction.
The connection between
2
-condition and the equality of Orlicz class and Orlicz
space is not present for every measure space, as the example of the R
n
shows:
Section 6.2 Modular and Luxemburg Norm 201
Example. Let be nite and let T = t
1
, . . . , t
n
with (t
i
) = 1 for i = 1, . . . , n.
Then C

= L

, even if does not satisfy a


2
-condition.
Theorem 6.2.32. If satises the
0
2
-condition, then
c

.
Proof. Let x

, then there is an R
>0
with c :=f

(x) =

t
i
T
(x(t
i
)) <
, hence for all i N (x(t
i
)) c holds, i.e.
[x(t
i
)[
1

1
(c) =: s
0
.
Let now
1
2
k
. Using the
0
2
-condition for 0 s s
0
the remaining part of the
proof can be performed in analogy to that of Theorem 6.2.29.
If satises the
2
- resp. the
0
2
-condition, then, as we have seen, the Orlicz class
and Orlicz space agree. Thus the Orlicz class is a linear space, and contains all real
multiples. An immediate consequence is the
Theorem 6.2.33. If satises the
2
-condition, then
H

() = C

().
This equality also holds, if (T) < and satises only the

2
-condition.
In the same way we obtain
Theorem 6.2.34. If satises the
0
2
-condition, then
h

() = c

().
The theoremof LindenstraussTsafriri, developed in the sequel, asserts in particular
that also the converse of the previous Theorem 6.2.34 holds. As a preparation we need
the following
Lemma 6.2.35. Let be a nite Young function. Then h

is a closed subspace of

,
and the unit vectors e
i
form a basis of h

.
Proof. Let x h

and let x
n
=

n
i=1
x(t
i
)e
i
. By denition
x

holds for all


> 0, i.e.
f

_
x

_
=

i=1

_
x(t
i
)

_
< .
202 Chapter 6 Orlicz Spaces
In particular the above series converges and we obtain for n sufciently large

i=n+1

_
x(t
i
)

_
1 f

_
x x
n

_
1 |x x
n
|
()
.
Therefore h

[e
i
] and hence

h

[e
i
]. On the other hand, apparently [e
i
] h

holds and hence [e


i
]

h

. We obtain [e
i
] =

h

.
Let x

h

, then there exists a sequence (x


n
) h

, x
n
=

i=1
x
n
(t
i
)e
i
with
x
n
x.
Let x
n
=

N(n)
i=1
x
n
(t
i
)e
i
be chosen such that | x
n
x
n
|
()

1
n
. Apparently
|x x
n
|
()
|x x
n
|
()
+
1
n
,
i.e. lim
n
x
n
= x. Let now c
n
:= |x x
n
|
()
, then
1 f

_
x x
n
c
n
_
=

i=N(n)+1

_
x(t
i
)
c
n
_
+
N(n)

i=1

_
x(t
i
) x
n
(t
i
)
c
n
_
.
We obtain

i=1
(
x(t
i
)
c
n
) < for a sequence (c
n
) tending to zero and hence x h

,
because let k N be arbitrary, then c
n

1
k
for n > N and hence

i=N
(kx(t
i
))

i=N

_
x(t
i
)
c
n
_
< .
We will now establish the uniqueness of the representation:
Suppose, s
n
=

n
k=1
x
k
e
k
and s
t
n
=

n
k=1
x
t
k
e
k
converge to x h

and suppose
there is an n
0
N with x
n
0
,= x
t
n
0
, then u
n
= s
n
s
t
n
=

n
k=1
(x
k
x
t
k
)e
k
,= 0 for
n n
0
. Then due to the monotonicity of the norm for n n
0
|u
n
|
()
|(x
n
0
x
t
n
0
)e
n
0
|
()
= [x
n
0
x
t
n
0
[
1

1
(1)
> 0.
Therefore the sequence (|u
n
|
()
) cannot tend to zero, a contradiction.
Remark. In Theorem 6.2.22 we have established the closedness of h

in a different
way.
In the theorem of LindenstraussTsafriri it will be shown, that

is separable, if
satises the
0
2
-condition.
Theorem 6.2.36.

is not separable.
Section 6.2 Modular and Luxemburg Norm 203
Proof. Let U := x

[ x = (x
i
), x
i
0, 1 for all i N and let x, y U with
x ,= y, then apparently |x y|

= 1. Using Cantors method one can show that U


is uncountable. Thus no countable dense subset of

can exist.
In the subsequent consideration we need the following notion
Denition 6.2.37. A basis x
n
of a Banach space is called boundedly complete, if
for every number sequence (a
n
) with sup
n
|

n
i=1
a
i
x
i
| < the series

n=1
a
n
x
n
converges.
Theorem 6.2.38 (LindenstraussTsafriri). Let be a (denite) nite Young function.
Then the following statements are equivalent:
a) satises the
0
2
-condition.
b)

= h

.
c) The unit vectors form a boundedly complete basis of

.
d)

is separable.
e)

contains no subspace isomorphic to

.
f)

= m

.
g)

= c

.
Proof. a) b): By Theorem 6.2.32 we have

= c

. Hence c

is a linear space,
i.e. c

= h

.
b) c): Let sup
n
|

n
i=1
a
i
e
i
|
()
= q < for a sequence (a
n
), then c
n
=
|

n
i=1
a
i
q
e
i
|
()
1 holds, hence
1 f

n
i=1
a
i
q
e
i
c
n
_
f

_
n

i=1
a
i
q
e
i
_
.
Therefore 1

i=1
(
a
i
q
) and thus (a
1
, a
2
, . . . )

= h

. By Lemma 6.2.35
the series

i=1
a
i
e
i
converges in h

.
c) d): The linear combinations of the unit vectors with rational coefcients e
i

form a countable dense subset of

.
d) e): Since

is not separable,

cannot contain a subspace isomorphic to

.
e) a): Suppose does not satisfy the
0
2
-condition. Then there is a sequence
(t
n
)
nN
, such that
(2t
n
)/(t
n
) > 2
n+1
and (t
n
) 2
(n+1)
(if necessary choose a subsequence).
204 Chapter 6 Orlicz Spaces
Choose a sequence (k
n
) of natural numbers in the following way: let k
n
be the
smallest natural number, such that 2
(n+1)
< k
n
(t
n
), then apparently
2
(n+1)
< k
n
(t
n
) 2
n
,
and therefore

n=1
k
n
(t
n
)

n=1
2
n
=
1
1
1
2
1 = 1, while on the other hand
k
n
(2t
n
) 2
n+1
(t
n
) k
n
> 1.
Let a := (a
n
) be a bounded sequence of numbers and let T :

the linear
mapping dened by
T(a) = x := (a
1
t
1
, . . . , a
1
t
1
. .
k
1
times
, a
2
t
2
, . . . , a
2
t
2
. .
k
2
times
, . . . , a
n
t
n
, . . . , a
n
t
n
. .
k
n
times
, . . . ),
then for a ,= 0
f

_
x
sup
k
(a
k
)
_
=

n=1
k
n

_
a
n
t
n
sup
k
(a
k
)
_

n=1
k
n
(t
n
) 1
holds and hence |x|
()
sup
k
[a
k
[.
On the other hand let 0 < sup
k
[a
k
[
f

_
x
(sup
k
[a
k
[ ) 2
1
_
=

n=1
k
n

_
2
a
n
t
n
(sup
k
[a
k
[ )
_
k
n
0
(2t
n
0
) > 1
for a suitable n
0
, i.e. |x|
()
2
1
sup
k
[a
k
[. In other words
2
1
|a|

|T(a)|
()
|a|

.
Therefore T(

) is isomorphic to

.
a) f): follows from b) together with Theorem 6.2.24.
f) b): This follows immediately from m

.
a) g): This follows from Theorem 6.2.34.
g) b): g) implies that c

is a linear space and hence contained in h

.
In the not purely atomic case similar assertions can be made as in the case of se-
quence spaces. The following theorem can be found in [107].
Theorem 6.2.39. Let be a nite Young function and let (T, , ) be a not purely
atomic measure space. Then the following statement holds: if does not satisfy the

2
-condition, L

() contains a subspace, which is isometrically isomorphic to

.
Proof. Let not satisfy the

2
-condition, then there is a monotonically increasing
sequence (t
n
) with t
n

n
, where (t
n
) > 1 for all n N and

__
1 +
1
n
_
t
n
_
> 2
n
(t
n
) for n N.
Section 6.2 Modular and Luxemburg Norm 205
Let S non-atomic with 0 < (S) 1. Let further (S
n
) be a sequence of
disjoint subsets of S with
(S
n
) =
(S)
2
n
.
Then we construct a sequence (x
n
)
n
in L

() with supp x
n
S
n
, such that f

(x
n
)<
1
2
n
.
For the construction of x
n
we choose a sequence of measurable, pairwise disjoint
sets S
n,k
S
n
with the property (S
n,k
) =
(S
n
)
2
k
(t
k
)
. Then we put
x
n
:=

k=1
t
k

S
n,k
.
In this way we obtain
f

(x
n
) =

k=1
(S
n,k
)(t
k
) =

k=1
(S
n
)
2
k
=
(S)
2
n

1
2
n
.
Let a := (a
n
) be a bounded sequence of numbers and T :

() a linear
mapping dened by
x := T(a) =

n=1
a
n
x
n
,
then for a ,= 0
f

_
x
|a|

_
=

n=1
_
S
n

_
a
n
x
n
|a|

_
d

n=1
_
S
n
(x
n
)d

n=1
1
2
n
= 1
holds and hence |x|
()
|a|

. On the other hand let 0 < |a|

and n
0
N
be chosen, such that
[a
n
0
[
|a|

= 1 + > 1 then
f

_
x
|a|


_
=

n=1
_
S
n

_
a
n
x
n
|a|


_
d
_
S
n
0

_
a
n
0
x
n
0
|a|


_
d
=
_
S
n
0
((1 +)x
n
0
)d
=

k=1
(S
n
0
,k
)((1 +)t
k
)

k=k
0
(S
n
0
,k
)
_
t
k
_
1 +
1
k
__
>

k=k
0
(S
n
0
,k
)2
k
(t
k
) =

k=k
0
(S
n
0
) = ,
i.e. |x|
()
|a|

. In other words |T(a)|


()
= |a|

. Therefore T(

) is
isometrically isomorphic to

.
206 Chapter 6 Orlicz Spaces
For innite measure we obtain a somewhat weaker assertion:
Theorem 6.2.40. Let (T, , ) be a -nite, non-atomic measure space with (T) =
, then the following statement holds: if L

() contains no subspace isomorphic to

, then satises the


2
-condition.
Proof. The

2
-condition we obtain by Theorem 6.2.39.
In order to establish the validity of the
0
2
-condition in this situation, we construct
a subspace of L

(), which is isometrically isomorphic to

:
Let T =

k=1
B
k
, where the B
k
are pairwise disjoint and of nite measure. Then
we choose a sequence of integers k
i
with k
0
= 0, such that (

k
1
k=k
0
+1
B
k
) > 1
and (

k
n+1
k=k
n
+1
B
k
) > 1. Now choose S
n


k
n+1
k=k
n
+1
B
k
with (S
n
) = 1 (using
Lemma 6.2.30). Those x L

() that are constant on all S


n
and are identical to zero
on T

n=1
S
n
, then form a subspace of L

(), which is isometrically isomorphic


to

: let T :

() be dened by c x :=

i=1
c
i

S
i
, then
1
_
T

_
x
|x|
()
+
_
d =

i=1

_
c
i
|x|
()
+
_
(S
i
) =

i=1

_
c
i
|x|
()
+
_
,
i.e. |c|
()
|x|
()
+, and similarly one obtains |c|
()
|x|
()
. Then, accord-
ing to our assumption

cannot contain a subspace isomorphic to

. Employing the
theorem of LindenstraussTsafriri this yields the
0
2
-condition for .
Corollary 6.2.41. The above theorem also holds, if (T, , ) is a -nite, not purely
atomic measure space with (T A) = , if A denotes the set of atoms.
As was established above, the
2
-condition has the effect that Orlicz space, Orlicz
class, and the closure of the step functions coincide. This fact is recorded in the
following
Theorem 6.2.42. Let (T, , ) be a -nite measure space and let satisfy the
2
-
condition. Then
M

() = L

() = C

() = H

().
This also holds, if (T) < and satises the

2
-condition.
Proof. Since satises the
2
- resp. the

2
-condition, is in particular nite and
hence M

() = H

() by Theorem 6.2.24, due to Theorem 6.2.29 we obtain


L

() = C

(), and by Theorem 6.2.33 C

() = H

().
For not purely atomic measures and sequence spaces we even obtain that the above
equality is equivalent to the corresponding
2
-conditions.
Section 6.2 Modular and Luxemburg Norm 207
Theorem 6.2.43. Let (T, , ) be a not purely atomic measure space with (T) <
and let be nite, then the following statements are equivalent:
(a) satises the

2
-condition
(b) H

= L

(c) C

= L

(d) M

= L

.
Proof. (d) (b) follows from Theorem 6.2.24.
(b) (c) due to H

.
(c) (b) since C

is a linear space, hence C

.
(a) (c) follows from Theorem 6.2.29.
(c) (a) follows from Theorem 6.2.31.
If does not satisfy the

2
-condition, one can construct a function x C

()
such that the distance of 3x to all step functions is at least 1, as the subsequent example
shows:
Example 6.2.44. Let not satisfy the

2
-condition, then there is a monotonically
increasing sequence (t
n
) with t
n

n
, where (t
n
) 1 and
(2t
n
) > 2
n
(t
n
) for n N.
Let further (T
n
) be a sequence of disjoint subsets of T, constructed in a similar manner
as in Theorem 6.2.31, such that
(T
n
) =
1
2
n
(t
n
)

(T)
2
for n sufciently large. We now dene a function
x(t) :=
_
t
n
for t T
n
0 for t T

n=1
T
n
.
We obtain x C

(), because
f

(x) =
_
T
(x)d =

n=1
(t
n
)(T
n
) =

n=1
1
2
n
< .
Moreover: 3x L

but not in M

(): in order to see this let y be an arbitrary step


function, then y is in particular bounded, i.e. there is a number a with [y[ a and
hence for n sufciently large
f

(3x y)
_
T
n
(3x y)d (3t
n
a)(T
n
) (2t
n
)(T
n
)
> 2
n
(t
n
)
1
2
n
(t
n
)
= 1,
and therefore |3x y|
()
1.
208 Chapter 6 Orlicz Spaces
For innite and non-atomic measures we have a theorem that corresponds to Theo-
rem 6.2.43:
Theorem 6.2.45. Let (T, , ) a -nite, non-atomic measure space with (T) =
and let be nite, then the following statements are equivalent:
(a) satises the
2
-condition
(b) H

() = L

()
(c) C

() = L

()
(d) M

() = L

().
Proof. (d) (a): Since M

() = H

() by Theorem 6.2.24 and H

() C

()
the equality C

() = L

() follows and hence by Theorem 6.2.31 the

2
-condition
for . Using the same construction for

as in Theorem 6.2.40, the Orlicz space


L

() contains a subspace isomorphic to

. If does not satisfy the


0
2
-condition,

contains a subspace isomorphic to

, in which the step functions are not dense, a


contradiction.
(a) (c) follows from Theorem 6.2.29.
(c) (b) follows from Theorem 6.2.33.
(b) (d) follows from Theorem 6.2.24.
Corollary 6.2.46. The above theorem also holds, if (T, , ) is a -nite, not purely
atomic measure space with (T A) = , if A denotes the set of atoms.
6.3 Properties of the Modular
6.3.1 Convergence in Modular
We have seen above (see Theorem 6.2.21) that norm convergence always implies
convergence in the modular.
The subsequent theorem states that the converse also applies, provided that sat-
ises the
2
-condition.
Theorem 6.3.1. Let satisfy the
2
-condition and let (x
n
)
nN
be a sequence of
elements in L

(). then
f

(x
n
)
n
0 |x
n
|
()
n
0
holds. This is also true for nite measure, if is denite and only satises the

2
-
condition.
Section 6.3 Properties of the Modular 209
Proof. Let > 0 and 2
m
< for a m N, then for a > 0
_
T
(2
m
(x
n
))d
m
_
T
(x
n
)d
n
0.
Hence there is a N N, such that for all n > N
_
T

_
x
n
2
m
_
d =
_
T
(2
m
(x
n
))d 1
holds. Therefore
|x
n
|
()
2
m
< .
We now consider the case (T) < . Since is denite, the

2
-condition holds
for arbitrary k > 0 (see Remark 6.2.26). Let now s
0
> 0 be chosen, such that
(s
0
) (T) <
1
2
. Let T
n
be dened by
T
n
:= t T [ 2
m
[x
n
(t)[ s
0
,
and m chosen as above, then
_
T
(2
m
(x
n
))d =
_
T
n
(2
m
(x
n
))d +
_
T\T
n
(2
m
(x
n
))d

_
T
n
(2
m
(x
n
))d +(T T
n
)(s
0
)
holds. For the integral over T
n
we obtain with k =
s
0
2
m
and the

2
-condition for
s k as above
_
T
n
(2
m
(x
n
))d
m
_
T
n
(x
n
)d
n
0.
Altogether we obtain for n sufciently large
_
T
(2
m
(x
n
))d 1,
and therefore the assertion.
Example 6.3.2. Let (s) = e
[s[
1. Then does not satisfy the

2
-condition. Let
further T = [0, 1], > 0, and let
x

(t) =
_
ln

t
for 0 < t
2
0 for
2
< t 1.
210 Chapter 6 Orlicz Spaces
For c >
1
2
we then obtain
f

_
x

c
_
=
_

2
0
__

t
_1
c
1
_
dt =
_

2
0
(
1
c
t

1
2c
1)dt
=
_

1
c
1
1
1
2c
t
1
1
2c
t
_

2
0
=
_

1
c

2
1
c

1
1
1
2c

2
_
=
2
_
1
1
1
2c
1
_
=
2

1
2c
1
1
2c
.
For c = 1 we have f

(x

) =
2
and for c =
1
2
(
2
+ 1)
f

_
x

c
_
=
2
1

2
+1
1
1

2
+1
=

2
1+
2

2
1+
2
= 1,
i.e. |x

|
()
=
1
2
(
2
+ 1).
For 0 we obtain f

(x

) 0 but |x

|
()

1
2
, showing that convergence in
the modular does not imply norm convergence.
Moreover, for 0 < < 2 we nd
f

(x
1
) =
_
1
0
(t

2
1)dt =

2
1

2
< ,
but f

(2 x
1
) =
_
1
0
(e
2 ln
1

t
1)d =
_
1
0
(
1
t
1)dt = i.e. C

,= L

.
For sequence spaces we obtain a correspondence to the previous theorem:
Theorem 6.3.3. Let satisfy the
0
2
-condition and let (x
n
)
nN
be a sequence of
elements in

. Then
f

(x
n
)
n
0 |x
n
|
()
n
0.
Proof. We show as a rst step the uniform boundedness of the elements of the se-
quence: from f

(x
n
)
n
0 we obtain a c > 0, such that
f

(x
n
) =

t
i
T
(x
n
(t
i
)) c,
i.e. [x
n
(t
i
)[
1
(c) for all i N. Let > 0 and 2
m
< for a m N. If we put
s
0
:= 2
m

1
(c), then, using the
0
2
-condition for 0 s s
0
the remaining part of
the proof can be performed in analogy to that of Theorem 6.3.1.
Section 6.3 Properties of the Modular 211
Example 6.3.4. Let
(s) =
_

_
0 for s = 0
e

1
s
for 0 < s
1
2
(s
1
2
)
_
2
e
_
2
+e
2
for s >
1
2
.
For 0 s
1
4
we have
(2s)
(s)
=
e

1
2s
e

1
s
= e

1
2s
e
2
2s
= e
1
2s
s0
,
i.e. does not satisfy the
0
2
-condition. The convexity of for s
1
2
follows from

t
(s) =
1
s
2
e

1
s

tt
(s) =
2
s
3
e

1
s
+
1
s
4
e

1
s
=
_
2 +
1
s
_
1
s
3
e

1
s
0.
We now consider the space l

. Let n > e
2
, then we dene x
n
(t
k
) :=
1
ln n
2
for
1 k n and equal to zero for k > n, then
f

(x
n
) =
n

k=1

_
1
ln n
2
_
=
n

k=1
1
e
ln n
2
= n
1
n
2
=
1
n
n
0,
but |x
n
|
()
=
1
2
, because
f

_
x
n
1
2
_
=
n

k=1

_
1
ln n
2
1
2
_
= n
_
2
2 ln n
_
= n
_
1
ln n
_
= n
1
e
ln n
= 1,
i.e. convergence in the modular does not imply norm convergence.
Moreover, c

,=

, because let x(t


n
) :=
1
ln n
2
for n N and n 2, then
f

(x) =

n=2
1
n
2
< but
f

(2x) =

n=2

_
2
2 ln n
_
=

n=2
1
n
= .
6.3.2 Level Sets and Balls
In the sequel we will establish a relation between level sets of the modular and balls
w.r.t. the Luxemburg norm.
Notation 6.3.5. The ball about x with radius r w.r.t. the Luxemburg norm we denote
by
K
()
(x, r) := y L

() [ |y x|
()
< r.
212 Chapter 6 Orlicz Spaces
Lemma 6.3.6. Let 1 > > 0 then K
()
(0, 1 ) S
f
(1) K
()
(0, 1) holds.
Proof. Let x S
f
(1) then
_
T

_
x
1
_
d =
_
T
(x)d 1.
By denition this implies |x|
()
1. On the other hand let x K
()
(0, 1 ) and
x ,= 0, then by denition of the Luxemburg norm there is a sequence (c
n
) of positive
numbers such that c
n
> |x|
()
and lim
n
c
n
= |x|
()
1 , i.e. c
n
1

2
for n sufciently large. Due to the convexity of we then obtain
1
_
T

_
x
c
n
_
d
1
c
n
_
T
(x)d,
and thus
_
T
(x)d c
n
< 1 i.e. x S
f
(1).
Remark 6.3.7. In Theorem 7.3.5 we will show that S
f
(1) is closed.
Remark 6.3.8. From the above lemma and Theorem 7.3.5 we conclude
S
f
(1) = K
()
(0, 1).
By the characterization of the continuity of real-valued convex functions (see The-
orem 3.5.4) we know that boundedness on a ball guarantees continuity. But according
to the above lemma we have K
()
(0, 1 ) S
f
(1) and hence f

(x) 1 for
x K
()
(0, 1 ).
In particular we obtain the following
Theorem 6.3.9. Let be nite, then f

: H

() R is a continuous function.
Proof. By denition f

is nite on H

.
6.3.3 Boundedness of the Modular
A convex function on a normed space is called bounded, if it is bounded on bounded
sets.
Theorem 6.3.10. If satises the
2
-condition (resp. for T of nite measure the

2
-condition), then f

: L

() R is a bounded function.
Proof. Let at rst satisfy the
2
-condition, let M > 0 and x K
()
(0, M). Let
further 2
n
M, then we obtain
f

(x) f

_
2
n
M
x
_

n
f

_
x
M
_

n
.
Section 6.3 Properties of the Modular 213
Let now T be of nite measure and let satisfy the

2
-condition, then put
T
0
:=
_
t T

[x(t)[
M
s
0
_
,
and we obtain
f

(x) =
_
T
0
(x)d +
_
T\T
0
(x)d
_
T
0

_
2
n
M
x
_
d +(T)(Ms
0
)

n
_
T
0

_
x
M
_
d +(T)(Ms
0
)
n
+(T)(Ms
0
).
For sequence spaces we obtain
Theorem 6.3.11. Let be nite and let satisfy the
0
2
-condition, then f

R
is a bounded function.
Proof. Let satisfy the
0
2
-condition on every interval [0, a] (see Remark 6.2.26),
let M > 0 and x K
()
(0, M). Then we have:

i=1
(
x
i
M
) 1 and hence
[x
i
[
M

1
(1). Let further 2
n
M, then we obtain for a := 2
n

1
(1)
f

(x) f

_
2
n
M
x
_

n
f

_
x
M
_

n
.
Chapter 7
Orlicz Norm and Duality
7.1 The Orlicz Norm
The functional N

dened below will turn out to be a norm, whose domain of nite


values will emerge to be L

(). This norm will be called the Orlicz norm.


Denition 7.1.1. Let be a Young function and its conjugate. Let E be the space
of equivalence classes of -measurable functions, then we dene the following func-
tional:
N

: E R,
where
N

(u) := sup
_

_
T
v ud

v S
f
(1)
_
.
Remark. The functional dened above is monotone in the following sense: from
[u
1
[ [u
2
[ it follows that N

(u
1
) N

(u
2
), because

_
T
vu
1
d

_
T
sign(u
1
)v[u
1
[d

_
T
[v[ [u
1
[d
_
T
[v[ [u
2
[d
=

_
T
sign(u
2
)[v[u
2
d

(u
2
)
holds, and hence apparently N

(u
1
) N

(u
2
).
Theorem 7.1.2. Let be the conjugate function of , then
(a) N

is nite on L

()
(b) L

() L

()

(c) N

2| |
()
hold.
Proof. Let x L

() and y S
f
(1). Let further > |x|
()
, then due to Youngs
inequality

_
T
x yd


_
T

[y[d
__
T

_
x

_
d+
_
T
(y)d
_
(1 +1) = 2.
(7.1)
Section 7.2 Hlders Inequality 215
By Lemma 6.3.6 we have K
()
(0, 1 ) S
f
(1). Thus the functional y
_
T
x
yd dened by x is bounded on K
()
(0, 1 ) and hence a continuous functional on
L

(). Moreover we obtain N

(x) 2 and hence (c).


The functional N

is apparently a norm on L

(), which we will call the Orlicz


norm, and it will turn out to be equivalent to the Luxemburg norm(see Theorem7.5.4).
Denition 7.1.3. On L

() we dene the Orlicz norm by


|x|

:= sup
_

_
T
x yd

y S
f
(1)
_
.
Remark. Since
S
f
(1) = K
()
(0, 1),
(see Remark 6.3.8) it follows that
|x|

= sup
_

_
T
x yd

|y|
()
1
_
.
Thus the Orlicz norm corresponds to the canonical norm of the dual space, i.e. (L

(),
| |

) is isometrically imbedded into the dual space of L

().
7.2 Hlders Inequality
Theorem 7.2.1. The following inequalities hold:
(a) |x|

2|x|
()
for all x L

()
(b) Hlders inequality [
_
T
x yd[ |x|

|y|
()
.
Proof. (a) follows from Theorem 7.1.2.
Let further be > 0, then by Lemma 6.3.6
y
|y|
()
+
S
f
(1), i.e.

_
T
x
y
|y|
()
+
d

|x|

,
whence Hlders inequality follows.
Since the conjugate of is again , we can interchange the roles of and in
the above consideration and we obtain L

() as a subspace of L

()

.
We will investigate under what conditions these two spaces are equal in the sequel.
It will emerge that in this context the
2
-condition plays a central role.
216 Chapter 7 Orlicz Norm and Duality
7.3 Lower Semi-continuity and Duality of the Modular
Beyond the description of the dual spaces we are also interested in this chapter in the
duality relation between modulars: the fact that Young functions are mutual conju-
gates carries completely over to the corresponding modulars.
Denition 7.3.1. (f

: (L

())

R
(f

(x

) = sup
yL

()
__
T
x

yd f

(y)
_
.
Remark. (f

is thereby also dened on L

(), because by Theorem 7.1.2 we have


L

() (L

())

.
We will use the following lemma for real-valued functions f on Cartesian products
of arbitrary sets as a tool to interchange the order of least upper bounds for the values
of f:
Lemma 7.3.2. Let A, B be sets and f : AB R, then
sup
AB
f(a, b) = sup
A
sup
B
f(a, b) = sup
B
sup
A
f(a, b)
holds.
Proof. Let (a
n
, b
n
)
n
be a sequence in AB with
f(a
n
, b
n
)
n
sup
AB
f(a, b).
Let rst of all sup
AB
f(a, b) < , then for n > N and given > 0
f(a
n
, b
n
) sup
AB
f(a, b)
holds, moreover
f(a
n
, b
n
) sup
B
f(a
n
, b) sup
A
sup
B
f(a, b)
f(a
n
, b
n
) sup
A
f(a, b
n
) sup
B
sup
A
f(a, b).
Since sup
B
f(a, b) sup
AB
f(a, b) for all a A and
sup
A
f(a, b) sup
AB
f(a, b) for all b B,
we obtain
sup
AB
f(a, b)
_
sup
A
sup
B
f(a, b)
sup
B
sup
A
f(a, b)
_
sup
AB
f(a, b).
The case sup
AB
f(a, b) = can be treated in a corresponding manner.
Section 7.3 Lower Semi-continuity and Duality of the Modular 217
Lemma 7.3.3. Let be a non-nite Young function. Then there is a sequence (
n
)
nN
of nite Young functions converging pointwise monotonically increasing to . The
sequence (
n
)
nN
of the corresponding conjugates then converges pointwise mono-
tonically decreasing to , the conjugate of .
Proof. Let be nite for [s[ < a and innite for [s[ > a, and let (s
n
) be a monoton-
ically increasing sequence in (0, a) with s
n
a. Then we dene for s 0

n
(s) :=
_

_
(s) for 0 s s
n
(s
n
) +
t
+
(s
n
)(s s
n
) for s
n
s < a
(s
n
) +
t
+
(s
n
)(a s
n
) +n(s a)
2
for s a.
For s < 0 we put:
n
(s) :=
n
(s).
We now discuss the monotonicity:
(a) 0 s s
n
: here we have:
n
(s) =
n+1
(s).
(b) s
n
s s
n+1
: according to the subgradient inequality we have

n
(s) = (s
n
) +
t
+
(s
n
)(s s
n
) (s) =
n+1
(s). (7.2)
(c) s
n+1
s a: and again by the subgradient inequality

n
(s
n+1
) = (s
n
) +
t
+
(s
n
)(s
n+1
s
n
) (s
n+1
) =
n+1
(s
n+1
),
apparently

n
(s) =
n
(s
n+1
) +
t
+
(s
n
)(s s
n+1
),
and hence, using (7.2) and the monotonicity of the right-sided derivative

n
(s)
n+1
(s
n+1
) +
t
+
(s
n+1
)(s s
n+1
) =
n+1
(s),
(d) s > a: since, again due to (7.2) and the monotonicity of the right-sided deriva-
tive, we have

n
(a) =
n
(s
n
) +
t
+
(s
n
)((a s
n+1
) + (s
n+1
s
n
))
(s
n+1
) +
t
+
(s
n
)(a s
n+1
)
n+1
(a),
monotonicity also holds in this case.
As to pointwise convergence we remark: for 0 s < a we have for n sufciently
large: s < s
n
and hence
n
(s) = (s).
For s = a we nd: 0
t
+
(s
n
)(a s
n
) (a) (s
n
) and hence
(s
n
) (s
n
) +
t
+
(s
n
)(a s
n
) =
n
(a) (a).
Due to the lower semi-continuity of we have: (s
n
) (a) and hence (
n
)
converges pointwise to .
218 Chapter 7 Orlicz Norm and Duality
This also holds, if (a) is nite and
t
+
(s
n
) , but also, if (a) = , because
is due to its lower semi-continuity continuous in the extended sense.
Moreover, is by Lemma 6.2.9 nite. Thus the pointwise convergence of the
sequence (
n
) to follows from Theorem 6.1.27.
Theorem 7.3.4. Let be a Young function and its conjugate, then for arbitrary
x L

()
f

(x) = sup
yL

()
__
T
xyd f

(y)
_
= (f

(x)
holds.
Proof. At rst let be nite. By Youngs inequality we have for y L

a.e.:
(x(t)) x(t) y(t) (y(t)) and hence: f

(x)
_
T
xyd f

(y). There-
fore
f

(x) sup
yL

()
__
T
xyd f

(y)
_
.
If 0 u E is bounded with nite support, then the same is true for (
t
+
(u)), since
according to Lemma 6.1.10 (
t
+
(0)) = 0 and (
t
+
()) is nite and apparently
monotonically increasing. Hence we obtain f

(y) < for y =


t
+
(u), in particular
also y L

(). Youngs equality then implies (u) = u


t
+
(u) (
t
+
(u)) and
therefore
f

(u) =
_
T
uyd f

(y).
Let now x L

() be arbitrary, T =

j=1
B
j
, (B
j
) < for j N, B
n
:=

n
j=1
B
j
and
x
n
(t) :=
_
x(t) for [x(t)[ n, t B
n
0 otherwise
then all functions x
n
are bounded, have a support of nite measure and [x
n
[ [x[ a.e.
Employing the theorem on monotone convergence of integration theory it follows that
f

(x
n
) f

(x) and for y


n
=
t
+
(x
n
) we obtain
f

(x
n
) =
_
T
x
n
y
n
d f

(y
n
).
We now distinguish two cases:
(a) f

(x) = : let c > 0 be arbitrary, then there is a N N, such that for n > N
f

(x
n
) =
_
T
x
n
y
n
d f

(y
n
) > c.
Section 7.3 Lower Semi-continuity and Duality of the Modular 219
For n xed then x
k
y
n

k
xy
n
pointwise a.e., and by construction [x
k
[ [y
n
[
[x[ [y
n
[, and
_
T
[x[ [y
n
[d < due to Hlders inequality. Lebesgues convergence
theorem then implies that
_
T
[x
k
[ [y
n
[d
_
T
[x[ [y
n
[d,
and hence
_
T
xy
n
d f

(y
n
) > c
for n sufciently large, hence
sup
yL

()
__
T
xyd f

(y)
_
= .
(b) f

(x) < : let 0 < < f

(x), then for n sufciently large


f

(x
n
) =
_
T
x
n
y
n
d f

(y
n
) > f

(x) ,
and hence in a similar way as above
sup
yL

()
__
T
xyd f

(y)
_
f

(x).
Let now not be nite, then there is a sequence (
n
) of nite Young functions, con-
verging pointwise monotonically increasing to (see Lemma 7.3.3). By Lemma 6.2.7
we have in particular L

n
. The sequence (
n
) of the conjugate Young func-
tions converges pointwise monotonically decreasing to (see Theorem 6.1.27 and
Lemma 7.3.3).
Let now y L

, then we obtain by Hlders inequality

_
T
xyd

|x|
()
|y|

< .
According to our above consideration we conclude
f

n
(x) = sup
yL

n
__
T
xyd f

n
(y)
_
= sup
yL

__
T
xyd f

n
(y)
_
,
where the last equation can be justied as follows: by Lemma 6.2.7 we have L

, but for y L

n
we have f

n
(y) = .
The theorem on monotone convergence of integration theory yields for x L

(x) = sup
nN
f

n
(x),
220 Chapter 7 Orlicz Norm and Duality
and for y L

(y) = inf
nN
f

n
(y) = sup
nN
f

n
(y).
Altogether we conclude using the Supsup-Lemma 7.3.2
f

(x) = sup
n
f

n
(x) = sup
n
sup
yL

__
T
[xy[d f

n
(y)
_
= sup
yL

__
T
[xy[d inf
n
f

n
(y)
_
= sup
yL

__
T
[xy[d f

(y)
_
.
An important consequence of the above theorem is the following:
Theorem 7.3.5. f

: L

() R is lower semi-continuous and thus S


f
(1) is
closed.
Proof. Being the supremum of continuous functionals
_
T
()yd f

(y) on L

()
with y C

() then according to Remark 3.5.2 f

is lower semi-continuous on
L

().
7.4 Jensens Integral Inequality and the Convergence in
Measure
As an application of Hlders inequality it will turn out that norm convergence always
implies convergence in measure. This fact will become important in the next chapter.
Denition 7.4.1. We say, a sequence of measurable functions (x
n
)
nN
converges
in measure to a measurable function x
0
, if there is a sequence (Q
n
)
nN
with
(Q
n
)
n
0, such that
lim
n
sup
tT\Q
n
[x
n
(t) x
0
(t)[ = 0.
For the proof of the implication we have announced above, we can make use of the
following
Theorem 7.4.2 (Jensens Integral Inequality). Let A be a measurable set of nite
measure and a Young function and let v C

(), then

_
1
(A)
_
A
vd
_

1
(A)
_
A
(v)d
holds.
Section 7.4 Jensens Integral Inequality and the Convergence in Measure 221
Proof. Let v C

(), then, due to Hlders inequality, [v[ is integrable over A,


because
_
A
vd =
_
T
v
A
d |v|
()
|
A
|

.
Let now
v
n
(t) :=
_
v(t) for [v(t)[ n and t A
0 otherwise.
Then v
n
C

(), since [v
n
(t)[ [v(t)[ i.e. f

(v
n
) f

(v). We will now approx-


imate v
n
by a sequence of step functions: let
nk
:=
n
k
for k N and let for r Z
with 0 r k
C
nkr
:= t A[ r
nk
v
n
(t) < (r + 1)
nk

and for k r < 0


C
nkr
:= t B
n
[ (r 1)
nk
< v
n
(t) r
nk
.
Then we dene for r = k, . . . , k
v
nk
(t) :=
_
a
nkr
:= r
nk
for t C
nkr
0 otherwise.
Then [v
nk
(t)[ [v
n
(t)[ and [v
nk
(t) v
n
(t)[
nk
for all t A and hence
1
(A)
_
A
(v)d
1
(A)
_
A
(v
n
)d
1
(A)
_
A
(v
nk
)d
=
1
(A)

r
(a
nkr
)(C
nkr
)
_
1
(A)

r
a
nkr
(C
nkr
)
_
=
_
1
(A)
_
A
v
nk
d
_
,
where in particular the latter sequence is bounded. By construction we have
1
(A)

_
A
v
nk
d
_
A
v
n
d


nk
,
and the lower semi-continuity of yields
1
(A)
_
A
v
n
d Dom. Let now > 0,
then, due to the continuity of on Dom, for k N sufciently large

_
1
(A)
_
A
v
nk
d
_

_
1
(A)
_
A
v
n
d
_
.
Hence
1
(A)
_
A
(v)d
_
1
(A)
_
A
v
n
d
_
.
We conclude by using Lebesgues convergence theorem
_
A
v
n
d
_
A
vd.
222 Chapter 7 Orlicz Norm and Duality
For the Luxemburg norm of characteristic functions we obtain:
Lemma 7.4.3. Let be an arbitrary Young function and Q a measurable set with
(Q) < , then
(a) |
Q
|
()
=
1

1
(
1
(Q)
)
holds, provided that
1
(Q)
is an element of the range of ,
(b) |
Q
|
()
=
1
t

otherwise, where (t) = for t > t

and (t) < on


[0, t

).
Proof. (a) For c =
1

1
(
1
(Q)
)
we obtain
1 = (Q)
_
1
c
_
=
_
T

Q
c
_
d,
hence |
Q
|
()
=
1

1
(
1
(Q)
)
.
(b) In this case we have: infc > 0 [ (Q)(
1
c
) 1 =
1
t

, since for t [0, t

)
we have (t) <
1
(Q)
, hence for > 0: (
t

1+
) <
1
(Q)
. Therefore |
Q
|
()

1+
t

,
on the other hand: (
t

1
) = and thus |
Q
|
()

1
t

.
In order to obtain the Orlicz norm of characteristic functions we can employ
Jensens inequality.
Lemma 7.4.4. Let Q be a measurable set with (Q) < and let be nite, then
|
Q
|

= (Q)
1
(
1
(Q)
) holds.
Proof. By denition: |
Q
|

= sup
_
T

Q
vd[ f

(v) 1. Due to the previous


lemma we have: |
Q
|
()
=
1

1
(
1
(Q)
)
. If we put v :=
Q

1
(
1
(Q)
)(1 ), then
we obtain
|
Q
|

(Q)
1
_
1
(Q)
_
(1 )
for every 0 < < 1.
On the other hand we obtain using Jensens integral inequality for f

(v) 1

_
1
(Q)
_
Q
vd
_

1
(Q)
_
Q
(v)d
1
(Q)
f

(v)
1
(Q)
,
therefore
_
T

Q
vd (Q)
1
(
1
(Q)
) for all v S
f
(1), hence
|
Q
|

(Q)
1
_
1
(Q)
_
.
We are now in the position to present the connection between norm convergence
and convergence in measure.
Section 7.4 Jensens Integral Inequality and the Convergence in Measure 223
Theorem 7.4.5. Let (T, , ) be a -nite measure space. Then the following state-
ment holds: x
n
x
0
in L

() implies convergence in measure of x


n
to x
0
.
Proof. Suppose this is not the case, and let (Q
n
) be a sequence of measurable sets
with (Q
n
)
n
0. Then there is a > 0, such that for all N N there is n > N
with the property
sup
tT\Q
n
[x
n
(t) x
0
(t)[ ,
i.e. there is a subsequence P
n
k
T Q
n
k
and a > 0 with (P
n
k
) and
[x
n
k
(t) x
0
(t)[ for all t P
n
k
.
At rst we show: there are subsets R
n
k
P
n
k
of nite measure, such that 0 <
(R
n
k
)
(a) (T) < : put R
n
k
:= P
n
k
(b) (T) = : since the measure space is -nite, there is a representation T =

i=1
T
i
with pairwise disjoint T
i
of nite measure. Then in particular P
n
k
=

i=1
T
i
P
n
k
. If (P
n
k
) = , then choose k(n) such that R
n
k
=

k(n)
i=1
T
i

P
n
k
and < (R
n
k
) < . If (P
n
k
) < , then put R
n
k
:= P
n
k
.
We now distinguish two situations (at least one of the two Young functions, or
, is nite):
(a) is nite, then [x
n
k
x
0
[ [x
n
k
x
0
[
R
n
k
and due to the monotonicity of
the Luxemburg norm
|x
n
k
x
0
|
()

1

1
(
1
(R
n
k
)
)

1

1
(
1

)
> 0,
a contradiction.
(b) nite: using Hlders inequality we obtain
_
T
[x
n
k
x
0
[
R
n
k
d |x
n
k
x
0
|
()
|
R
n
k
|

2|x
n
k
x
0
|
()
|
R
n
k
|
()
.
On the other hand
_
T
[x
n
k
x
0
[
R
n
k
d (R
n
k
).
Hence we obtain
2|x
n
k
x
0
|
()
(R
n
k
)
1
_
1
(R
n
k
)
_

1
_
1

_
> 0,
since the function s s
1
(
1
s
) is according to Lemma 6.1.26 monotonically
increasing on (0, ), again a contradiction.
224 Chapter 7 Orlicz Norm and Duality
Remark. Direct use of the Orlicz norm of the characteristic function in part (b) of
the above proof would have yielded the inequality
|x
n
k
x
0
|
()
|
R
n
k
|

(R
n
k
),
therefore
|x
n
k
x
0
|
()

1

1
(
1
(R
n
k
)
)

1

1
(
1

)
> 0,
since it is easily seen that the function s
1

1
(
1
s
)
is monotonically increasing on
(0, ).
Again by use of Hlders inequality we obtain a corresponding theorem for se-
quence spaces:
Theorem 7.4.6. x
n
x
0
in

() implies convergence of x
n
to x
0
in the supremum
norm.
Proof. Suppose there is a > 0, a subsequence (x
m
) such that for every m there is a
point t
i
m
with the property: [x
m
(t
i
m
) x
0
(t
i
m
)[ > , then we obtain

i=1
[x
m
(t
i
) x
0
(t
i
)[
t
i
m
(t
i
) 2|x
m
x
0
|
()
|
t
i
m
|
()
,
where |
t
i
m
|
()
is constant according to Lemma 7.4.3. On the other hand

i=1
[x
m
(t
i
) x
0
(t
i
)[
t
i
m
(t
i
) = [x
m
(t
i
m
) x
0
(t
i
m
)[ > ,
a contradiction.
7.5 Equivalence of the Norms
In the sequel we will always assume that T is -nite.
Lemma 7.5.1. Let be nite and let
t

denote the left-sided derivative of . Let u


be measurable with N

(u) 1. Then v
0
=
t

([u[) belongs to C

() and we have
f

(v
0
) 1.
Proof. First of all we show: let v C

() then

_
T
uvd

_
N

(u) if f

(v) 1
N

(u) f

(v) for f

(v) > 1.
The rst inequality follows by denition.
Section 7.5 Equivalence of the Norms 225
Let now f

(v) > 1, then from (0) = 0 and convex we conclude

_
v
f

(v)
_

(v)
f

(v)
,
and therefore due to v C

()
f

_
v
f

(v)
_
=
_
T

_
v
f

(v)
_
d
1
f

(v)
_
T
(v)d = 1.
By denition of N

we then obtain

_
T
u
v
f

(v)
d

(u),
and hence

_
T
uvd

(u) f

(v) if f

(v) > 1. (7.3)


Let now N

(u) 1 and u ,= 0. Let T =

j=1
B
j
, (B
j
) < for j N and
B
n
:=

n
j=1
B
j
. We dene
u
n
(t) :=
_
u(t) for [u(t)[ n and t B
n
0 otherwise.
Youngs equality yields (
t

) nite on all of R and (


t

(0)) = 0 (see Lemma


6.1.10), hence
f

(
t

([u
n
[)) =
_
T
(
t

([u
n
[))d =
_
B
n
(
t

([u
n
[))d
_
B
n
(
t

(n))d
= (B
n
)(
t

(n)),
therefore
t

([u
n
[) C

(). Since u ,= 0 there is N N, such that u


n
,= 0 for
n N. We now consider two cases:
(a) denite, then (u
n
(t)) > 0 on a set of positive measure.
(b) indenite, then s
0
:= maxs [ (s) = 0 > 0. If (u
n
(t)) = 0 a.e. then
[u
n
(t)[ s
0
a.e. and hence
t

([u
n
(t)[) = 0 a.e., hence (
t

([u
n
(t)[)) = 0
a.e. So, if (
t

([u
n
(t)[)) > 0 on a set T
0
of positive measure, then [u
n
(t)[ >
s
0
on T
0
, and hence (u
n
(t)) > 0 on T
0
.
Suppose there is n
0
N with f

(
t

([u
n
0
[)) =
_
T
(
t

([u
n
0
[))d > 1.
In particular (u
n
0
(t)) > 0 on a set of positive measure. Youngs equality then
yields
(
t

([u
n
0
(t)[)) < (u
n
0
(t)) +(
t

([u
n
0
(t)[)
= [u
n
0
(t)[
t

([u
n
0
(t)[)
226 Chapter 7 Orlicz Norm and Duality
on a set of positive measure. Using Inequality (7.3) we obtain by integration of the
above inequality
f

(
t

([u
n
0
[) =
_
T
(
t

([u
n
0
[))d <
_
T
[u
n
0
(t)[
t

([u
n
0
(t)[)d(t)
N

(u
n
0
) f

(
t

([u
n
0
[)),
and thus N

(u
n
0
) > 1. But our assumption was N

(u) 1 and by construction


[u
n
0
[ [u[, contradicting the monotonicity of N

. Hence f

(
t

([u
n
[)) 1 for
all n N with n N. The lemma of Fatou then implies f

(v
0
) 1 and hence
v
0
C

().
Lemma 7.5.2. Let u be measurable and let N

(u) 1, then f

(u) N

(u) holds.
Proof. First of all let be nite and let v
0
=
t

([u[) sign(u). By Lemma 7.5.1 we


have f

(v
0
) 1. Youngs equality yields
u(t) v
0
(t) = (u(t)) +(v
0
(t)),
and hence
f

(u) =
_
T
(u)d
_
T
(u)d +
_
T
(v
0
)d
=
_
T
u v
0
d N

(u).
Let now not be nite. Then, as we will see below, we can construct a sequence of
nite Young functions (
n
), which is monotonically non-decreasing pointwise to .
Then by Lemma 7.3.3
n
, hence S
f

n
(1) S
f
(1) and hence N

n
(u)
N

(u) for every measurable function u. Let now N

(u) 1, then also N

n
(u) 1
and therefore, according to the rst part of the proof of this lemma
_
T

n
(u)d = f

n
(u) N

n
(u) N

(u).
By the theorem on monotone convergence,
n
(u) (u) implies
_
T
(u)d = f

(u) N

(u).
As to the construction of the sequence (
n
): let (
n
) be a monotone sequence of
positive numbers tending to zero and
n
(s) :=
n
s
2
+(s) for arbitrary s R. Then
(
n
) converges monotonically decreasing to . The sequence (
n
) of conjugate
functions is nite, since the strongly convex functions
n
(s) ts have compact level
sets for all t R. By construction the sequence (
n
) is monotonically increasing.
The stability theorem on monotone convergence (see Theorem 5.2.3) then implies (by
convergence of the values) the pointwise convergence (monotonically increasing) of
the sequence (
n
) to .
Section 7.6 Duality Theorems 227
Theorem 7.5.3. Let u E and N

(u) < , then u L

() and
N

(u) |u|
()
.
Proof. Lemma 7.5.2 implies
f

_
u
N

(u)
_
N

_
u
N

(u)
_
= 1,
and therefore N

(u) |u|
()
.
Thus Orlicz and Luxemburg norms are equivalent:
Theorem 7.5.4. For all u L

()
2|u|
()
|u|

|u|
()
holds.
Proof. Theorem 7.5.3 and Theorem 7.1.2, as well as Denition 7.1.3.
Remark. Theorem 7.5.4 also provides the following insight: the set
M := u E [ N

(u) <
is equal to L

(). In order to describe the Orlicz space as a set, the niteness of the
Luxemburg norm is equivalent to the niteness of the Orlicz norm.
7.6 Duality Theorems
Theorem 7.6.1 (Representation Theorem). Let (T, , ) be a -nite measure space
and let nite, then every continuous linear functional f on M

() can be repre-
sented by a function y L

() via the formula


f, x =
_
T
y(t)x(t)d x L

.
If M

() is equipped with the Luxemburg norm, then |f| = |y|

.
Proof. Let T =

j=1
B
j
where (B
j
) < , and let
B
n
:=
n
_
j=1
B
j
.
228 Chapter 7 Orlicz Norm and Duality
Let
n
the -algebra induced on B
n
and let B
n
, n xed, then
B
L

(). We
will now show that (B) := f,
B
is -additive and absolutely continuous w.r.t. :
let A
i

n
for i N,
A :=

_
i=1
A
i
, A
i
A
j
= for i ,= j,
then

i=1

A
i
=
A
. At rst we show
_
_
_
_
m

i=1

A
i

A
_
_
_
_
()
m
0.
Apparently we have (A) =

i=1
(A
i
) and hence
_
_
_
_
m

i=1

A
i

A
_
_
_
_
()
=
1

1
(
1
(

i=m+1
A
i
)
)
=
1

1
(
1

i=m+1
(A
i
)
)
m
0.
The continuity of the functional f implies
(A) = f,
A
=

i=1
f,
A
i
=

i=1
(A
i
).
Thus is -additive.
As to the absolute continuity: let E
n
, then
(E) = f,
E
|f||
E
|
()
= |f|
1

1
(
1
(E)
)
therefore
(E) = f,
E

(E)0
0.
Then the theorem of RadonNykodym implies: there is a uniquely determined func-
tion y
n
(t) L
1
(B
n
) with
(B) =
_
B
n
y
n

B
d.
By the uniqueness of y
n
and B
n
B
m
for n m we obtain y
m
[
B
n = y
n
. Let now
y
(n)
(t) := y
n
(t) for t B
n
and y
(n)
(t) = 0 otherwise. Then y
(n)
is dened on T,
and (B) =
_
T
y
(n)

B
d for B
n
. Let now y be the pointwise limit of the
sequence y
(n)
, then in particular y
[B
n = y
n
for all n N. Then for an arbitrary step
function x =

k
i=1
c
i

A
i
with A
i

n
for i = 1, . . . , k
f, x =
_
T
xyd. (7.4)
Section 7.6 Duality Theorems 229
As a next step we show: (7.4) holds for all x M

(). By denition x is limit of a


sequence of step functions. We will repeat here the explicit construction in the proof
of Theorem 6.2.24, because we need its monotonicity property for further reasoning.
Let now
x
n
(t) :=
_
x(t) for [x(t)[ n and t B
n
0 otherwise,
then x
n
L

(), since [x
n
(t)[ [x(t)[ i.e. |x
n
|
()
|x|
()
. We will now approx-
imate x
n
by a sequence of step functions: let
nk
:=
n
k
for k N and let for r Z
with 0 r k
C
nkr
:= t B
n
[ r
nk
x
n
(t) < (r + 1)
nk
,
and for k r < 0
C
nkr
:= t B
n
[ (r 1)
nk
< x
n
(t) r
nk
.
Then we dene for r = k, . . . , k
x
nk
(t) :=
_
r
nk
for t C
nkr
0 otherwise.
Then [x
nk
(t)[ [x
n
(t)[ and [x
nk
x
n
[
nk
, i.e. (x
nk
) converges uniformly to x
n
,
and hence
lim
k
|x
nk
x
n
|
()
= 0.
The continuity of the functional f and its representation on the space of step func-
tions (7.4) implies
f, x
n
= lim
k
f, x
nk
= lim
k
_
T
x
nk
y d.
By construction we have
lim
k
x
nk
(t)y(t) = x
n
(t)y(t) a.e.
Moreover
[x
nk
(t)y(t)[ [x
n
(t)y(t)[ n[y(t)[
B
n(t).
The function [y[ is (due to the theorem of RadonNikodym) -integrable over B
n
.
Therefore the requirements of Lebesgues convergence theorem are satised and we
obtain
f, x
n
= lim
k
_
T
x
nk
y d =
_
T
lim
k
x
nk
y d =
_
T
x
n
y d.
230 Chapter 7 Orlicz Norm and Duality
By construction we also have [x
n
[ [x[ and hence |x
n
|
()
|x|
()
. Furthermore,
we have by construction: [x
n
y[
n
[xy[ a.e. and we obtain using the lemma of
Fatou

_
T
xyd

_
T
[xy[d sup
n
_
T
[x
n
y[d = sup
n
f, [x
n
[ sign(y) (7.5)
sup
n
|f||x
n
|
()
|f||x|
()
. (7.6)
i.e. x(t)y(t) is integrable. The sequence ([x
n
y[) is monotonically increasing and con-
verges a.e. to [xy[. Thus Lebesgues convergence theorem is applicable and we obtain
lim
n
f, x
n
= lim
n
_
T
x
n
y d =
_
T
xy d. (7.7)
We will now show: lim
n
|x x
n
| = 0: since x x
n
H

() (see Theo-
rem 6.2.24 and Theorem 6.2.15) we have
1 =
_
T

_
x x
n
|x x
n
|
()
_
d.
Suppose there is a subsequence (x
n
k
) with |x x
n
k
|
()
> 0, then
1
_
T

_
x x
n
k

_
d.
On the other hand we have by construction
[xx
n
k
[


[x[

and therefore (
[xx
n
k
[

)
(
[x[

). But again due to Theorem 6.2.24 we have


_
T
(
[x[

)d < and by construc-


tion (
[xx
n
k
[

) 0 a.e. Lebesgues convergence theorem then yields


_
T

_
x x
n
k

_
d 0,
a contradiction.
By the continuity of f and equation (7.7) we obtain
lim
n
f, x
n
= f, x =
_
T
xy d,
i.e. the representation of the functional f by a measurable function y.
It remains to be shown: y L

() and |y|

= |f|. The rst part follows, if


we can substantiate N

(y) < , this is certainly the case if: [


_
T
zyd[ |f| for
all z L

() with f

(z) 1: for given z S


f
(1) we construct (in analogy to
the above sequence (x
n
)) a sequence of bounded functions (z
n
), which in turn are
Section 7.6 Duality Theorems 231
approximated by step functions. In particular we then have z
n
M

(). Then as
above the lemma of Fatou implies

_
T
zyd

_
T
[zy[d sup
n
_
T
[z
n
y[d = sup
n
f, [z
n
[ sign(y) (7.8)
sup
n
|f||z
n
|
()
|f||z|
()
. (7.9)
By denition we have
|f| = supf, x [ |x|
()
1, x M

(),
and by virtue of the representation of the functional
|f| = sup
__
T
xyd[ |x|
()
1, x M

()
_
. (7.10)
But Inequality (7.8) even implies together with (7.10)
|f| = sup
__
T
zyd[ |z|
()
1, z L

()
_
= |y|

where the last equality holds by the denition of the Orlicz norm.
Remark. Since in the above theorem z is in general not in H

(), the norm con-


vergence of the sequence (z
n
) does not follow, because
_
T
(
z

)d < does not


necessarily hold.
Under additional conditions the roles of Luxemburg norm and Orlicz norm can be
interchanged:
Theorem 7.6.2. Let and be nite and let M

() = L

(), then
(a) (M

(), | |

= (L

(), | |
()
)
(b) (M

(), | |

= (L

(), | |

)
hold.
Proof. Let X := (M

(), | |
()
), then by the previous theorem X

= (L

(),
| |

). Let now U := (M

(), | |

) and let f U

. Due to the equivalence


of Luxemburg and Orlicz norms there is according to the representation theorem
a y L

() = M

() with f, u =
_
T
u yd for all u U. Thus we obtain
|f| = supf, u [ u M

, |u|

1 = sup
__
T
uy d

u M

, |u|

1
_
.
(7.11)
232 Chapter 7 Orlicz Norm and Duality
Let now z S
f
(1), then we construct in an analogous way as in the previous theo-
rem a sequence
z
n
(t) :=
_
z(t) for [z(t)[ n and t B
n
0 otherwise,
then as in the previous theorem z
n
M

(). Using the monotonicity of the Orlicz


norm we obtain, again similar to the previous theorem

_
T
zyd

_
T
[zy[d sup
n
_
T
[z
n
y[d = sup
n
f, [z
n
[ sign(y)
sup
n
|f||z
n
|

|f||z|

,
and hence together with (7.11)
|f| = sup
_

_
T
zyd

z L

, |z|

1
_
.
Dening f, z :=
_
T
zyd for z L

we can, due to Hlders inequality, extend


f as a continuous functional to L

. But by Theorem 7.6.1 we have L

= X

, i.e.
f X

. According to [39] p. 181, Theorem 41.1 the canonical mapping I : X


X

with y
_
T
()yd is norm preserving, i.e. in this case: |f| = |y|
()
. Hence
equation (a) holds.
Since
(L

(), | |
()
)

= (M

(), | |
()
)

= (L

(), | |

),
and (a) also equation (b) follows.
As a consequence of Theorem 7.6.1 we obtain
Theorem 7.6.3. Let satisfy the
2
-condition. Then every continuous linear func-
tional f on (L

(), | |
()
) is represented by a function y L

() via the formula


f, x =
_
T
y(t)x(t)d x L

,
where |f| = |y|

. This also holds, if (T) < and satises only the

2
-
condition.
Proof. With Theorem 6.2.42 we have M

= L

. The remaining part follows from


the representation Theorem 7.6.1.
Remark. In particular (L
p
())

= L
q
() for p 1 and
1
p
+
1
q
= 1.
Section 7.7 Reexivity 233
For sequence spaces we obtain in analogy to Theorem 7.6.3.
Theorem 7.6.4. Let satisfy the
0
2
-condition. Every continuous linear functional f
on (

, | |
()
) is represented by a sequence y

via the formula


f, x =

t
i

T
y(t
i
)x(t
i
) x

,
where |f| = |y|

.
Proof. This follows from the theorem of LindenstraussTsafriri 6.2.38 together with
Theorem 7.6.1.
Remark. In particular (
p
)

=
q
for p 1 and
1
p
+
1
q
= 1.
If is not nite, then we obtain the following assertion about the dual space of
L

()):
Theorem 7.6.5. Let not be nite and let for innite measure in addition be
indenite, then
(L

())

= L

().
Moreover, L

() and L

() are equal as sets and the Orlicz norm||

is equivalent
to | |

.
Proof. By Theorem 6.2.28 satises the

2
- (resp. for innite measure the
2
-)
condition. Then by Theorem 6.2.42 M

() = L

(). By Theorem 6.2.10 L


1
()
and L

() are equal as sets and the corresponding norms are equivalent.


By the above remark (L
1
())

= L

(). Again by Theorem 6.2.10 L

() and
L

() are equal as sets and the Luxemburg norm | |


()
is equivalent to | |

. By
Theorem 7.5.4 the Luxemburg norm| |
()
is equivalent to the Orlicz norm| |

.
Remark. The above theorem does not hold in general for (T) = and denite,
since for
(s) =
_
s
2
for [s[ 1
otherwise,
we obtain:

=
2
and

=
2
.
7.7 Reexivity
Our insights into dual spaces, provided by the previous section, put us into the position
to characterize the reexivity of Orlicz spaces.
234 Chapter 7 Orlicz Norm and Duality
Theorem 7.7.1. Let and be mutually conjugate Young functions, then:
(a) Let (T) < and let the measure space be not purely atomic, then L

() is
reexive, if and only if and satisfy the

2
-condition.
(b) Let (T) = and the measure space not purely atomic with (T A) =
(A set of atoms in T), then L

() is reexive, if and only if and satisfy the

2
-condition.
Proof. At rst we show: nite, for suppose not nite, then L

() contains a
subspace isomorphic to L

(
0
): let T
0
be of nite measure and (
0
,
0
, T
0
)
the measure space induced on T
0
, then L

(
0
) is a closed subspace of L

(). By
Theorem 6.2.10 we have L

(
0
) is isomorphic to L

(
0
). According to (see [41])
every closed subspace of a reexive Banach space is reexive, hence L

() cannot
be reexive if is not nite.
Let now (T) < . Suppose the

2
-condition is not satised, then by Theo-
rem 6.2.39 L

() contains a subspace isomorphic to

, contradicting its reexivity.


Therefore by Theorem 6.2.42 M

() = L

() and by Theorem 7.6.1 (L

())

=
L

(). Since (L

())

= L

() then also L

() is reexive, i.e. also satises


the

2
-condition.
For innite measure the assertion follows in analogy to the above line of reasoning
using Theorem 6.2.40 and Corollary 6.2.41.
For sequence spaces we obtain a somewhat weaker version of the above theorem:
Theorem 7.7.2. Let and be nite, then

is reexive, if and only if and


satisfy the
0
2
-condition.
We nally obtain from reexivity of the closure of the step functions already equal-
ity to the Orlicz space.
Theorem 7.7.3. Let be nite. If M

() is reexive, then M

() = L

() holds.
Proof. By denition M

() L

(). By Theorem 7.6.1 we have: (M

())

=
L

(). Reexivity then implies M

() = (M

())

= (L

())

. But from
Hlders inequality it follows that L

() (L

())

and hence
L

() (L

())

= M

() L

().
7.8 Separability and Bases of Orlicz Spaces
7.8.1 Separability
For sequence spaces the theorem of LindenstraussTsafriri (see Theorem 6.2.38) con-
tains the following assertion:

is separable, if and only if satises the


0
2
-condition,
provided that is nite.
Section 7.8 Separability and Bases of Orlicz Spaces 235
Furthermore, Theorem 6.2.36 asserts:

is not separable.
A basis for the investigation of non-atomic measures in this section is the
Theorem 7.8.1 (Lusin). Let T be a compact Hausdorff space, the corresponding
Baire measure and let x be a measurable function. Then for every > 0 there is a
function y continuous on T such that
(t T [ x(t) y(t) ,= 0) < .
Moreover, if |x|

< then y can be chosen such that |y|

|x|

.
Then (compare [72]):
Theorem 7.8.2. Let T be a compact Hausdorff space, (T, , ) the corresponding
Baire measure space, and let and be nite. Then the functions continuous on T
are dense in M

().
Proof. Let u M

() be bounded, such that [u[ a, then by the theorem of Lusin


there is a sequence (u
n
) of continuous functions with [u
n
[ a, such that u u
n
different from zero only on a set T
n
with (T
n
)
1
n
. Then by Hlders inequality
and Lemma 7.4.4 and Lemma 6.1.26
|u u
n
|

= sup
|v|
()
1

_
T
(u u
n
)vd

2a sup
|v|
()
1
_
T
n
[v[d = 2a|
T
n
|

= 2a(T
n
)
1
_
1
(T
n
)
_

2a
n

1
(n).
By Lemma 6.1.26 it follows that: lim
n
|u u
n
|

= 0. Let now w M

() be
arbitrary and > 0, then by denition there is a step function u (obviously bounded)
with |w u|

<

2
, and hence |w u
n
|

< for n sufciently large.


Let T be a compact subset of R
m
and the Lebesgue measure, then, as a conse-
quence of the theorem of StoneWeierstra (see e.g. [59]), for every function x con-
tinuous on T there is a sequence of polynomials with rational coefcients, converging
uniformly and hence also in L

() to x. Thus we obtain the following


Theorem 7.8.3. Let and be nite. Let T be a compact subset of R
m
and the
Lebesgue measure, then M

() is separable.
236 Chapter 7 Orlicz Norm and Duality
7.8.2 Bases
Denition 7.8.4. A (Schauder) basis (
i
) of a Banach space X is a sequence of
elements of X, such that for every x X there is a unique sequence (x
i
) of real
numbers with the property lim
n
|

n
i=1
x
i

i
x| = 0.
Denition 7.8.5. A basis
i
of a Banach space is called boundedly complete, if for
every number sequence (x
i
) with sup
n
|

n
i=1
x
i

i
| < there is an x X, such
that the series

i=1
x
i

i
converges to x.
We will now repeat the theorem of LindenstraussTsafriri (see 6.2.38) with those
characterizations relevant in this context:
Theorem 7.8.6. Let be a nite Young function. Then the following statements are
equivalent:
a) satises the
0
2
-condition.
b) The unit vectors form a boundedly complete basis of

.
c)

is separable.
Denition 7.8.7. The series

i=1
u
i
is called unconditionally convergent to x X,
if for every permutation p of the natural numbers the series

i=1
u
p(i)
converges to x.
Denition 7.8.8. A basis (
i
) of a Banach space X is called an unconditional basis
of X, if for every x X the series

i=1
x
i

i
converges unconditionally to x.
The following theorem can be found in [102]:
Theorem 7.8.9. Let be the Lebesgue measure on R
n
, and let L

() be reexive.
Let further

be an orthonormal wavelet basis with an r-regular multi-scale


analysis of L
2
(), then

is an unconditional basis of L

().
For the denition of an r-regular multi-scale analysis of L
2
() see [84].
7.9 Amemiya formula and Orlicz Norm
We will now concern ourselves with a representation of the Orlicz norm involving
only the Young function . This representation is based on the fact that conjugate
pairs of Young functions carry over to conjugate pairs of the corresponding modulars
(see Theorem 7.3.4).
Section 7.9 Amemiya formula and Orlicz Norm 237
From Youngs inequality for the Young functions and it follows for x L

()
and arbitrary > 0 (weak duality)
|x|


_
1 +f

_
x

__
,
since by denition |x|

= sup
_
T
xyd[ f

(y) 1 and due to Youngs inequality

_
x(t)

_
+(y(t))
x(t)

y(t),
hence
f

_
x

_
+f

(y)
_
T
x

yd,
and therefore
f

_
x

_
+
_
T
xyd for f

(y) 1.
The theorem of Amemiya states that even equality holds (strong duality), if on the
right-hand side the inmum is taken w.r.t. all positive
|x|

= inf
_

_
1 +f

_
x

__

> 0
_
.
We will nowprove this theoremusing Lagrange duality (if one considers the denition
of the Orlicz norm as an optimization problem under convex restrictions) and the fact
that conjugate Young functions correspond to conjugate modulars.
We dene, following Musielak (see [88]), the Amemiya norm on L

():
Denition 7.9.1. Let x L

(), then put A

(x) := inf(1 +f

(
x

)) [ > 0.
Theorem 7.9.2. A

: L

() R is a norm on L

() with A

(x) |x|

.
Proof. Apparently A

is nite on L

(), since by denition there is a > 0 with


f

(
x

) < . First of all we show the triangle inequality: let x, y L

(), let a > 0


arbitrary, then there are positive numbers u, v, such that
u
_
1 +f

_
x
u
__
< A

(x) +a and v
_
1 +f

_
y
v
__
< A

(y) +a.
As an immediate consequence we have
f

_
x
u
_
<
A

(x) +a
u
1 and f

_
y
v
_
<
A

(y) +a
v
1.
238 Chapter 7 Orlicz Norm and Duality
By the convexity of f

we then obtain
f

_
x +y
u +v
_

u
u +v
f

_
x
u
_
+
v
u +v
f

_
y
v
_
<
u
u +v
_
A

(x) +a
u
1
_
+
v
u +v
_
A

(y) +a
v
1
_
=
A

(x) +A

(y) + 2a
u +v
1.
Therefore (u+v)(1+f

(
x+y
u+v
)) < A

(x)+A

(y)+2a and hence for a > 0 arbitrary


A

(x +y) < A

(x) +A

(y) + 2a.
The positive homogeneity can be seen as follows: let > 0 and :=

, then
A

(x) = inf
>0

_
1 +f

_
x

__
= inf
>0

_
1 +f

_
x

__
.
For x = 0 apparently A

(x) = 0 holds. Let now x ,= 0, then from weak duality


apparently A

(x) |x|

> 0. The symmetry of A

follows immediately from the


symmetry of f

.
We are now in the position to show that Orlicz norm and Amemiya norm coincide.
Theorem 7.9.3.
|x|

= inf
>0

_
1 +f

_
x

__
= A

(x).
Proof. We have for x L

() because of (f

= f

by Theorem 7.3.4
A

(x) = inf
>0

_
1 +f

_
x

__
= inf
>0

_
1 + sup
yL

__
x

, y
_
f

(y)
__
= inf
>0
sup
yL

(x, y (f

(y) 1))
= sup
>0
inf
yL

(x, y +(f

(y) 1)).
The denition of the Orlicz norm yields
|x|

= supx, y [ f

(y) 1 = infx, y [ f

(y) 1.
Since the restriction set S
f
(1) has the interior point 0 (see Lemma 6.3.6), there is by
the theorem on Lagrange multipliers for Convex Optimization (see Theorem 3.14.4)
a non-negative

, such that
infx, y [ f

(y) 1 = inf
yL

x, y +

(f

(y) 1).
Section 7.9 Amemiya formula and Orlicz Norm 239
The theorem on the saddle point property of Lagrange multipliers 3.14.5 then yields
(Lagrange duality)
inf
yL

x, y +

(f

(y) 1) = sup
>0
inf
yL

x, y +(f

(y) 1).
Hence
|x|

= inf
>0

_
1 + (f

_
x

__
= A

(x).
Altogether we obtain the following representation for the Orlicz norm
|x|

= inf
>0

_
1 +
_
T

_
x

_
d
_
.
Remark 7.9.4. The mapping : R
>0
R with
() =
_
1 +
_
T

_
x

_
d
_
is convex.
Idea of proof. For differentiable one obtains using the monotonicity of the differ-
ence quotient and the theorem on monotone convergence

t
() = 1 +
_
T
_

_
x(t)

x(t)


t
_
x(t)

__
d(t)
= 1
_
T

t
_
x(t)

__
d(t),
where in the last part we have used Youngs equality. The last expression is, however,
monotonically increasing w.r.t. . If is not differentiable, one can approximate
by differentiable Young functions.
Example 7.9.5. (a)
|x|
1
= inf
>0

_
1 +
_
T

d
_
= min
0
+
_
T
[x[d =
_
T
[x[d.
(b)
|x|

= inf
>0

_
1 +
_
T

d
_
.
For |x|

we have

() = , for < |x|

we have

() = .
240 Chapter 7 Orlicz Norm and Duality
(c) 1 < p < :
|x|
p
= inf
>0

_
1 +
_
T

p
d
_
= inf
>0
_
+
1p
_
T
[x[
p
d
_
.
Now we have:
t
p
()=1 + (1 p)
p
_
T
[x[
p
d and hence
p
=((p 1)
_
T
[x[
p
d)
1
p
i.e.
|x|
p
= (
p
) =
p
|x|
(p)
,
where

p
= (p 1)
1
p
+ (p 1)
1p
p
=
p
1
p
(
p1
p
)
p1
p
.
We observe using lim
x0
xln x = 1
(a)
p
> 0,
(b) lim
p1

p
= 1,
(c) lim
p

p
= 1.
Chapter 8
Differentiability and Convexity in Orlicz Spaces
8.1 Flat Convexity and Weak Differentiability
For a continuous convex function f : X R the at convexity of the set x
X[ f(x) r is characterized by the differentiability properties of f.
For a continuous convex function f : X R (X a real normed space) the subdif-
ferential
f(x
0
) := X

[ (x x
0
) f(x) f(x
0
)
is a non-empty convex set (see Theorem 3.10.2). For f(x
0
) the graph of
[f(x
0
) + ( x
0
)] is a supporting hyperplane of the epigraph (x, s) X
R[ f(x) s in (x
0
, f(x
0
)), and each supporting hyperplane can be represented
in such a way.
The right-handed derivative f
t
+
(x
0
, x) := lim
t0
f(x
0
+tx)f(x
0
)
t
always exists and is
nite (see Corollary 3.3.2) and the equation f
t

(x
0
, x) = f
t
+
(x
0
, x) holds. Due to
the theorem of MoreauPschenitschnii (see Theorem 3.10.4) one obtains
f
t
+
(x
0
, x) = max
f(x
0
)
(x)
f
t

(x
0
, x) = min
f(x
0
)
(x).
A convex set (with non-empty interior) is called at convex, if every boundary point
has a unique supporting hyperplane.
Theorem 8.1.1. Let X be a real normed space and f : X R a continuous convex
function. Then for r > inff(x) [ x X the following statements are equivalent:
(a) The convex set S
f
(r) := x X[ f(x) r is at convex.
(b) For all boundary points x
0
of S
f
(r) one has
(i) [f
t
+
(x
0
, ) +f
t

(x
0
, )] X

(ii) there exists a c > 0 with


f
t

(x
0
, x) = cf
t
+
(x
0
, x)
for all x with f
t
+
(x
0
, x) 0.
In particular, S
f
(r) is at convex, if f Gteaux differentiable in x X[ f(x) = r.
242 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Proof. Let f be not constant. Due to the continuity of f the set S
f
(r) has a non-empty
interior, and for every boundary point x
0
of S
f
(r) one has f(x
0
) = r > inf f(x),
hence from the denition we obtain 0 / f(x
0
).
(a) (b): Let H be the unique supporting hyperplane of S
f
(r) in x
0
. As for every
f(x
0
) also x
0
+ Ker is a supporting hyperplane we obtain
x
0
+ Ker = H.
If
0
X

represents the hyperplane H, then =


0
for a R. The theorem of
MoreauPschenitschnii yields
f(x
0
) =
0
[
1

2

and hence
f
t
+
(x
0
, x) =
2

0
(x)
f
t

(x
0
, x) =
1

0
(x)
_
for
0
(x) 0
and
f
t
+
(x
0
, x) =
1

0
(x)
f
t

(x
0
, x) =
2

0
(x)
_
for
0
(x) < 0.
We conclude (i)
[f
t
+
(x
0
, ) +f
t

(x
0
, )] = (
1
+
2
)
0
() X

.
It remains to verify (ii): as 0 / f(x
0
), the relation sign
1
= sign
2
,= 0 holds,
consequently
f
t

(x
0
, x) = (
1

1
2
)
sign
1
f
t
+
(x
0
, x)
for x with f
t
+
(x
0
, x) 0.
(b) (a): For f(x
0
) we have
f
t
+
(x
0
, x) (x) f
t

(x
0
, x) (8.1)
for all x X and because of (ii)
f
t
+
(x
0
, x) (x) cf
t
+
(x
0
, x) = f
t

(x
0
, x) (8.2)
for those x with f
t
+
(x
0
, x)0. From (x)=0 it follows because of (8.1) f
t
+
(x
0
, x)
0 and from (8.2)
f
t
+
(x
0
, x) = f
t

(x
0
, x) = 0,
i.e.
Ker Ker(f
t
+
+f
t

) := H
0
.
As f
t
+
(x
0
, ) f
t

(x
0
, ) and 0 / f(x
0
) it follows from (8.2) H
0
,= X and hence
Ker = H
0
.
Section 8.1 Flat Convexity and Weak Differentiability 243
Let now H be a supporting hyperplane of S
f
(r) in x
0
. Due to the theorem of
Mazur 3.9.6 the afne supporting set Hf(x
0
) of the epigraph of f in (x
0
, f(x
0
))
can be extended to a supporting hyperplane in X R , thus there is f(x
0
) with
H f(x
0
) (x, f(x
0
) +(x x
0
)) [ x X.
Hence
H x[ (x x
0
) = 0 = x
0
+ Ker = x
0
+H
0
,
consequently
H = x
0
+H
0
.
Corollary. Let S
f
(r) be at convex and f not Gteaux differentiable at x
0
. Then f
is at x
0
differentiable in direction x if and only if f
t
+
(x
0
, x) = 0.
Proof. If f
t
+
(x
0
, x) = 0, then using (ii) it follows that f
t
+
(x
0
, x) = f
t

(x
0
, x). Con-
versely, if f
t
+
(x
0
, x) = f
t

(x
0
, x), then, because of (i) we can assume f
t
+
(x
0
, x) 0
(exchange x by x if necessary). As c ,= 1, it follows from(ii) that f
t
+
(x
0
, x) = 0.
Theorem 8.1.2. If f is non-negative and positively homogeneous, then S
f
(r) is at
convex if and only if f is Gteaux differentiable in x X[ f(x) > 0.
Proof. If S
f
(r) is at convex for an r > 0, then, because of the positive homogeneity
of f, all level sets S
f
(r) are at convex. Now we have
f
t
+
(x
0
, x
0
) = lim
t0
f(x
0
+tx
0
) f(x
0
)
t
= f(x
0
)
= lim
t0
f(x
0
tx
0
) f(x
0
)
t
= f
t

(x
0
, x
0
).
Hence c = 1 in (ii) and thus f
t
+
(x
0
, x) = f
t

(x
0
, x) for f
t
+
(x
0
, x) 0.
Let now f
t
+
(x
0
, x) < 0, then
f
t
+
(x
0
, x) f
t

(x
0
, x) = f
t
+
(x
0
, x) > 0,
and with c = 1 in (ii) it follows that
f
t
+
(x
0
, x) = f
t

(x
0
, x).
Therefore we obtain using (i)
2f
t
+
(x
0
, x) = 2f
t

(x
0
, x) = (f
t
+
(x
0
, x) +f
t

(x
0
, x))
= f
t
+
(x
0
, x) +f
t

(x
0
, x),
hence f
t
+
(x
0
, x) = f
t

(x
0
, x).
244 Chapter 8 Differentiability and Convexity in Orlicz Spaces
If f is a norm, the above consideration yields the
Theorem 8.1.3 (Mazur). A normed space X is at convex if and only if the norm is
Gteaux differentiable in X 0.
Theorem 8.1.4. Let X be a normed space X whose norm is Gteaux differentiable in
X0 and let D : X0 X

denote the Gteaux derivative. Then |D(x)| = 1


and x, D(x) = |x| for all x X 0.
Proof. We have, due to the subgradient inequality for all h X
h, D(x) = x +h x, D(x) |x +h| |x| |h|
hence |D(x)| 1. On the other hand D(x), 0x |0||x|, hence D(x), x
|x|, and thus the assertion.
8.2 Flat Convexity and Gteaux Differentiability of Orlicz
Spaces
Let : R R be a nite Young function. Let (T, , ) be a -nite measure space,
and L

() the Orlicz space determined via and , equipped with the Luxemburg
norm | |
()
.
Let further M

() be the closed subspace of L

(), spanned by the step functions


in L

().
We consider the unit ball in M

(). If x M

(), then (see Theorem 6.2.24 and


Lemma 6.2.15)
|x|
()
= 1 if and only if
_
T
(x)d = 1, (8.3)
If satises the
2
-condition, then (see Theorem 6.2.42) M

() = L

().
A normed space is called at convex, if the closed unit ball is at convex.
According to Theorem 8.1.1 the level set S
f
(r) is at convex, if f is Gteaux
differentiable in x X[ f(x) = r.
Lemma 8.2.1. The function f

: M

() R dened by
f

(x) =
_
T
(x)d
is continuous and convex. Furthermore for x
0
M

()
(f

)
t
+
(x
0
, x) =
_
x>0
x
t
+
(x
0
)d +
_
x<0
x
t

(x
0
)d (8.4)
(f

)
t

(x
0
, x) =
_
x>0
x
t

(x
0
)d +
_
x<0
x
t
+
(x
0
)d.
Section 8.2 Flat Convexity and Gteaux Differentiability of Orlicz Spaces 245
If is differentiable, then f

is Gteaux differentiable and


(f

)
t
(x
0
, x) =
_
T
x
t
(x
0
)d. (8.5)
Proof. Convexity of f

follows immediately from the convexity of . As f

is
bounded on the unit ball, f

is continuous (see Theorem 6.3.9 and Theorem 6.2.24).


For the difference quotient we obtain
f

(x
0
+x) f

(x
0
)

=
_
T
(x
0
(t) +x(t)) (x
0
(t))

d
=
_
x(t)>0
(x
0
(t) +x(t)) (x
0
(t))
x(t)
x(t)d
+
_
x(t)<0
(x
0
(t) +x(t)) (x
0
(t))
x(t)
x(t)d.
Through the monotonicity w.r.t. of
(s
0
+s) (s
0
)

for all s
0
, s R, the representation (8.4) of (f

)
t
+
and (f

)
t

follows.
Let be differentiable. According to (8.4) (f

)
t
(x
0
, x) exists and we have (8.5).
Lemma 8.2.2. Let T
0
, T
1
and T
2
be disjoint sets with 0 < (T
i
) < (i =
0, 1, 2). If there exists an s
0
0 with
t
+
(s
0
) ,=
t

(s
0
) and (s
0
) (T
0
) < 1, then
M

() is not at convex.
Proof. The set of discontinuities of a function, monotonically increasing on [a, b], is
at most countable (see [89], S. 229). Hence there is s
1
> 0 with
t
+
(s
1
) =
t

(s
1
)
and
1 (s
0
)(T
0
) (s
1
)(T
1
) > 0.
As is continuous, one can choose s
2
R such that
(s
0
) (T
0
) +(s
1
) (T
1
) +(s
2
) (T
2
) = 1.
For the functions
x
0
= s
0

T
0
+s
1

T
1
+s
2

T
2
,
x
1
=
T
0
,
x
2
=
T
1
246 Chapter 8 Differentiability and Convexity in Orlicz Spaces
we have
0 < (f

)
t
+
(x
0
, x
1
) = (T
0
)
t
+
(s
0
) ,= (f

)
t

(x
0
, x
1
) = (T
0
)
t

(s
0
),
0 < (f

)
t
+
(x
0
, x
2
) = (T
1
)
t
+
(s
1
) = (T
1
)
t

(s
1
) = (f

)
t

(x
0
, x
2
).
Hence condition (ii) of Theorem 8.1.1 is not satised and thus M

() not at convex.
Theorem 8.2.3. Let be not purely atomic. Then M

() is at convex if and only if


is differentiable.
Proof. Lemma 8.2.2 and Theorem 8.2.1.
The derivative of the norm | |
()
is now determined as follows. Let x
0
M

()
and |x
0
|
()
= 1. The graph of the function x (f

)
t
(x
0
, x x
0
) + f

(x
0
) is a
supporting hyperplane of the epigraph of f

in (x
0
, f

(x
0
)) = (x
0
, 1) M

R.
This means that the hyperplane x M

[ (f

)
t
(x
0
, x x
0
) = 0 supports the unit
ball of M

in x
0
. If we denote | |
()
by p

then p
t

(x
0
, x
0
) = 1 (see Theorem 8.1.4)
and p
t

(x
0
, ) is a multiple of (f

)
t
(x
0
, ). Taking into account that
_
T

t
(x
0
)x
0
d
_
T
(x
0
)d = 1, we obtain that
x
_
x
t
(x
0
)d
_
x
0

t
(x
0
)d
, |x
0
|
()
= 1, (8.6)
is the derivative of the norm | |
()
(in M

).
If the measure is purely atomic, then the differentiability of is not a necessary
condition for at convexity of M

().
In the sequel let
S = s R[ in s not differentiable.
Then we have
Theorem 8.2.4. Let (T, , ) be purely atomic and consist of more than 2 atoms.
Then M

() is at convex if and only if for all s S and all atoms A


(s) (A) 1.
Proof. Necessity follows immediately from Lemma 8.2.2.
Let x
0
M

() and |x
0
|
()
= 1. According to Lemma 8.2.1 and the condition
(s) (A) 1 it is sufcient to consider x
0
= r
A
for an atom A and r S.
As (r) (A) = 1 we have
t
+
(r) r (r) > 0 and
t

(r) r (r) > 0. In


particular sign(
t
+
(r)) = sign(
t
+
(r)) = sign(r).
Section 8.3 A-differentiability and B-convexity 247
Since 0 / S and
t
(0) = 0 it follows from Lemma 8.2.1 that
(f

)
t
+
(x
0
, x) =
_
x>0
x
t
+
(x
0
)d +
_
x<0
x
t

(x
0
)d
=
t
+
(r)
_
x>0
x
A
d +
t

(r)
_
x<0
x
A
d,
and in the same way
(f

)
t

(x
0
, x) =
t

(r)
_
x>0
x
A
d +
t
+
(r)
_
x<0
x
A
d.
Hence we obtain (i) with f
t
+
(x
0
, x) +f
t

(x
0
, x) = (A)[
t
+
(r) +
t

(r)]x[
A
, where
A an atom, i.e. x constant on A. This implies
x f
t
+
(x
0
, x) +f
t

(x
0
, x) = (A)[
t
+
(r) +
t

(r)]x[
A
(M

.
Concerning (ii): If x > 0 on A then
_
x<0
x
A
d = 0 and hence
f
t
+
(x
0
, x) =
t
+
(r)
_
x>0
x
A
d.
If
t
+
(r) > 0 then f
t
+
(x
0
, x) 0 and we have f
t

(x
0
, x) =

(r)

+
(r)
f
t
+
(x
0
, x). If,
however, x < 0 on A then
f
t
+
(x
0
, x) =
t

(r)
_
x<0
x
A
d.
If
t

(r) < 0 then f


t
+
(x
0
, x) 0 and f
t

(x
0
, x) =

+
(r)

(r)
f
t
+
(x
0
, x).
From Theorem 8.1.1 it then follows that M

() is at convex.
8.3 A-differentiability and B-convexity
In this section we will consider the duality of A-differentiability and B-convexity of a
conjugate pair of convex functions. In this context B is the polar of A. These notions
were introduced by Asplund and Rockafellar in [4]. Our main interest is focused on
the duality of Frchet differentiability and K(0, 1)-convexity. The results of this sec-
tion will play an important role for the substantiation of the duality of Frchet differ-
entiability and local uniform convexity (see Theorem 8.4.10) and as we will explain
in detail in the next section for the strong solvability of optimization problems.
The exposition of this section is to a large extent based on that in [4].
Let in the sequel X and Y be normed spaces and either Y = X

or X = Y

and
let f : X R and g : Y R mutually conjugate, continuous, and convex functions.
248 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Lemma 8.3.1. S
f
(r) is bounded for every r R.
Proof. Let y Y , then
g(y) supx, y f(x) [ x S
f
(r) supx, y r [ x S
f
(r),
hence
supx, y [ x S
f
(r) g(y) +r < ,
and similarly
infx, y [ x S
f
(r) g(y) r > .
This means S
f
(r) is weakly bounded and therefore, due to Theorem 5.3.15, bounded.
Lemma 8.3.2. Let the interior of S
f
(0) be non-empty and let h : Y R be the
support functional of S
f
(0), i.e.
h(v) := supz, v [ f(z) 0
then
h(v) = min
_
g
_
v

> 0
_
holds.
Proof. By the previous theorem S
f
(0) is bounded and hence h nite. Apparently
h(v) = infz, v [ f(z) 0 holds. Let now for > 0
() := infz, v +f(z) [ z X = inf(z, v f(z)) [ z X.
Then
() = supz, vf(z) [ z X = sup
__
z,
v

_
f(z)

z X
_
= g
_
v

_
holds. According to the theorem on Lagrange duality (see Theorem 3.14.5) we then
obtain
h(v) = max() [ > 0 = min() [ > 0 = min
_
g
_
v

> 0
_
.
Denition 8.3.3. Let C X, then we dene the polar C
0
of C by
C
0
:= v Y [ u, v 1, u C.
The bipolar of C is the dened as follows:
C
00
:= u X[ u, v 1, v C
0
.
Section 8.3 A-differentiability and B-convexity 249
Then the following theorem holds (see [23], II, p. 51):
Theorem 8.3.4 (Bipolar Theorem). Let X be a normed space and either Y = X

or
X = Y

and let C X. Then


C
00
= conv(C 0)
holds, where the closure has to be taken w.r.t. the weak topology.
Remark 8.3.5. If C is convex, weakly closed, and if C contains the origin, then
C
00
= C
holds. By Theorem 3.9.18 a closed convex subset of a normed space is always weakly
closed.
The following lemma is of central importance for the subsequent discussion:
Lemma 8.3.6. Let f and g be continuous mutually conjugate convex functions on X
resp. on Y and let for x X and y Y Youngs equality be satised
x, y f(x) g(y) = 0,
then for every > 0
v [ g(y +v) g(y) x, v
0

1
u[ f(x +u) f(x) u, x
2v [ g(y +v) g(y) x, v
0
.
Proof. Let f

(z) := (f(x +z) f(x) z, y )


1
and g

(v) := (g(y +v)


g(y) x, v +)
1
. Then f

and g

are mutual conjugates, because


f

(v) = supz, v f

(z) [ z X
= supz, v (f(x +z) f(x) z, y )
1
[ z X
= supz, y +v (f(x +z) f(x) )
1
[ z X
= supz, y +v (f(x +z) +g(y) x, y )
1
[ z X
=
1
supx +z, y +v f(x +z) g(y) x, v + [ z X
=
1
(g(y +v) g(y) x, v +) = g

(v).
In the same manner it can be shown that: f

= g

.
Using the subgradient inequality it follows that: 1 = inf g

= g

(0) and 1 =
f

(0) = inf f

.
250 Chapter 8 Differentiability and Convexity in Orlicz Spaces
It remains to be shown
S
g

(2)
0
C

:= S
f

(0) 2 S
g

(2)
0
.
Here C

is a convex, closed set, which due to f

(0) = 1 contains the origin as


an interior point (thus C

is also weakly closed by Theorem 3.9.18). The bipolar


theorem 8.3.4 together with Remark 8.3.5 then implies C
00

= C

. The assertion then


follows if we can show
S
g

(2) C
0


1
2
S
g

(2) ()
Let now h be the support functional of C

, i.e.
h(v) = supz, v [ z C

,
then by Lemma 8.3.2
h(v) = inf
_
g

_
v

> 0
_
holds, since C

is bounded due to the niteness of the conjugate g

.
In particular: h(v) g

(v), hence
S
g

(2) S
h
(2) = 2 S
h
(1) = 2C
0

,
and thus the right-hand side of ().
We now prove the left hand side of () and show at rst
v[h(v) < 1 S
g

(2).
Let v Y such that h(v) < 1, then there is a > 0 with
g

(
1
v) < 1.
On the other hand: g

(
1
v) 1 = inf g

and hence: 0 < < 1. By the convexity


of g

we then obtain
g

(v) (1 )g

(0) +g

(
1
v) < 2.
As mentioned above, v [ h(v) 1 = C
0

. Let h(v) = 1,
n
(0, 1) with

n
0 and v
n
:= (1
n
)v, then since 0 v [ h(v) < 1, since h(0) = 0, and
the convexity of h also v
n
v [ h(v) < 1. Therefore g

(v
n
) < 2, and, since by
construction v
n
v, we obtain g

(v) 2, due to the continuity of g

.
Denition 8.3.7. Let A be a non-empty subset of X, f : X R, then f is called
A-differentiable at x X, if there is a y Y , such that
lim
0
sup
uA

f(x +u) f(x)

u, y

= 0.
Section 8.3 A-differentiability and B-convexity 251
Then y is called the A-gradient of f at x.
Let g : Y R and B subset of Y . We say g is B-convex at y Y w.r.t. x X, if
for every > 0 there is a > 0, such that
v [ g(y +v) g(y) x, v B.
Theorem 8.3.8. Let f and g be nite and mutually conjugate convex functions on X
resp. on Y . Let A be non-empty subsets of X and B the polar of A in Y . Let x X
and y f(x), then y is the A-gradient of f at x if and only if g is B-convex at y
w.r.t. x.
Proof. Since y f(x) then y is an A-gradient at x, if and only if for every > 0
there is a > 0, such that
sup
uA
_
f(x +u) f(x)

u, y
_
, 0 < .
We obtain

1
A
1
u[ f(x +u) f(x) u, y
=
1

1
z [ f(x +z) f(x) z, y = C

.
Lemma 8.3.6 implies
C

2v [ g(y +v) g(y) x, v


0
,
hence
1
2

1
A v [ g(y +v) g(y) x, v
0
,
and therefore
2B v [ g(y +v) g(y) x, v .
If we put := /2 then we obtain: g is B-convex at y w.r.t. x.
Conversely, let
B v [ g(y +v) g(y) x, v ,
then

1
A v [ g(y +v) g(y) x, v
0
,
and by Lemma 8.3.6 then also

1
A C

=
1
z [ f(x +z) f(x) z, y .
Therefore for all u A
f(x +
1
u) f(x)

1
u, y .
252 Chapter 8 Differentiability and Convexity in Orlicz Spaces
If we choose for A the unit ball in X, then B is the unit ball in Y and A-differen-
tiability of f coincides with its Frchet differentiability. Thereby we obtain
Corollary 8.3.9. y is Frchet gradient of f at x g is K(0, 1)-convex at y w.r.t. x,
if K(0, 1) denotes the unit ball in Y .
The subsequent theorem will draw a connection between K(0, 1)-convexity and
strong solvability according to
Denition 8.3.10. A function f has a strong minimum k
0
on a subset K of a Banach
space X, if the set of minimal solutions M(f, K) of f on K only consists of k
0

and if for every sequence (k


n
)
nN
K with
lim
n
f(k
n
) = f(k
0
) we have lim
n
k
n
= k
0
.
The problem of minimizing f on K is then called strongly solvable.
Theorem 8.3.11. g is K(0, 1)-convex at y w.r.t. x, if and only if y is strong global
minimum of g x, .
Proof. Let g be K(0, 1)-convex at y w.r.t. x. Apparently for all > 0 and all z Y
with |z| >
g(y +z) g(y) > x, z,
and hence x g(y). By the subgradient inequality we obtain for all z Y
g(z) x, z g(y) x, y,
i.e. y is a global minimum of g x, . Let now
g(y +v
n
) x, y +v
n

n
g(y) x, y,
hence
g(y +v
n
) x, v
n
g(y) 0.
Suppose there is a subsequence (v
n
k
) and a > 0 with |v
n
k
| , then for all > 0
there is n
k
with
v
n
k
v [ g(y +v) g(y) x, v ,
in contradiction to the K(0, 1)-convexity of g.
Conversely let y be a strong global minimum of g x, , i.e.
g(y +v
n
) g(y) x, v
n
0
implies v
n
0. Since apparently x g(y), the K(0, 1)-convexity of g at y w.r.t. x
follows.
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 253
As a corollary of Theorem 8.3.8 we obtain
Corollary 8.3.12. Let f be Gteaux differentiable at x and let y f(x), then g is
H
u
-convex at y w.r.t. x, for all half-spaces H
u
:= v Y [ u, v 1, with u X
arbitrary.
Proof. f is A
u
-differentiable w.r.t. every one-point subset A
u
:= u of X. Then
A
0
u
= H
u
= v [ u, v 1
holds.
We thus obtain a duality relation between Gteaux differentiability and strict con-
vexity of the conjugate.
Theorem 8.3.13. Let Y be a normed space and X = Y

and let f be Gteaux


differentiable, then g is strictly convex.
Proof. Suppose g is not strictly convex, then there are y, v Y with v ,= 0 and
x g(y), such that for all [1, 1]
g(y +v) g(y) x, v = 0
holds and hence for all > 0
[v, v] z [ g(y +z) g(y) x, z .
According to the theorem of HahnBanach there is a u X with |u| = 1 and
u, v = |v|, but then [v, v] is not contained in H
u
for < |v|, a contradiction to
Corollary 8.3.12.
8.4 Local Uniform Convexity, Strong Solvability and
Frchet Differentiability of the Conjugate
If one wants to generalize the notion of uniform convexity of functions (see [76]) by
allowing for different convexity modules for different points of the space, there are
different possibilities to do so.
Denition 8.4.1. A monotonically increasing function : R
+
R
+
with (0) = 0,
(s) > 0 for s > 0 and
(s)
s

s
is called a convexity module. A continuous
function f : X R, where X is a normed space, is called locally uniformly convex
by denition
254 Chapter 8 Differentiability and Convexity in Orlicz Spaces
a) if for all x X a convexity module
x
exists, such that for all y X we have
1
2
(f(y) +f(x)) f
_
x +y
2
_
+
x
(|x y|),
b) if for all x X and all x

f(x) a convexity module


x,x
exists, such that
for all y X we have
f(y) f(x) y x, x

+
x,x
(|x y|),
c) if for all x X a convexity module
x
exists, such that for all y X we have
1
2
(f(x +y) +f(x y)) f(x) +
x
(|y|).
It is easily seen that: a) b) and b) c); if the function f satises property a),
one obtains using the subgradient inequality
1
2
(f(x) +f(y)) f
_
x +y
2
_
+
x
(|x y|)
f(x) +
_
x +y
2
x, x

_
+
x
(|x y|).
Dening
x,x
(s) :=
x
(2s) then immediately property b) follows. If a function f
satises property b), then again by the subgradient inequality
1
2
f(x +y)
1
2
(y, x

+f(x) +
x,x
(|y|))

1
2
(f(x) f(x y) +f(x) +
x,x
(|y|)),
and hence property c) for
x
(s) :=
1
2

x,x
(s).
The converse, however is not true, because the function f(x) = e
x
satises prop-
erty c) using the convexity module
x
(s) := e
x
(cosh(s) 1), but does not have
bounded level sets, and hence cannot satisfy b) (see below).
The strictly convex function f(x) := (x + 1) log(x + 1) x satises property b),
because for h > 0 we obtain due to the strict convexity of f
0 <
1
h
(f(x +h) f(x) f
t
(x) h)
=
1
h
_
(x + 1) log
x +h + 1
x + 1
+h
_
log
x +h + 1
x + 1
1
__
h
,
but violates a) at the origin, because
lim
y
1
y
_
1
2
f(y) f
_
y
2
__
=
1
2
log 2 < .
In the sequel we will mean by locally uniformly convex functions always those with
property b).
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 255
Remark. Lovaglia in [78] investigates locally uniformly convex norms. The notion
introduced there differs from the one we consider here, the squares of these norms
are, however, locally uniformly convex in the sense of b) (see [56]).
Remark 8.4.2. If f : X R is locally uniformly convex then
f(y)
|y|

|y|
. In
particular, the level sets of f are bounded.
Proof. We have for all x X and all x

f(x) and all y X


f(y) f(x) y x, x

+
x,x
(|x y|).
Let in particular be x = 0 and x

f(0) then
f(y)
|y|

f(0)
|y|
+
_
y
|y|
, x

_
+

0,x
(|y|)
|y|

f(0)
|y|
|x

| +

0,x
(|y|)
|y|
|y|
.
Denition 8.4.3. A function f has a strong minimum k
0
on a subset K of a Banach
space X, if the set of minimal solutions M(f, K) of f on K only consists of k
0

and if for every sequence (k


n
)
nN
K with
lim
n
f(k
n
) = f(k
0
) we have: lim
n
k
n
= k
0
.
The problem of minimizing f on K is then called strongly solvable.
Lemma 8.4.4. Let X be a reexive Banach space and let f : X R be convex and
continuous then the following statements are equivalent:
(a) f has a strong minimum on every closed hyperplane and on X
(b) f has a strong minimum on every closed half-space and on X
(c) f has a strong minimum on every closed convex subset of X.
Proof. Without limiting generality we can assume that f(0) = 0 and f(x) > 0 for
x ,= 0. Otherwise we can consider g(x) := f(x
0
x) f(x
0
), where f(x
0
) =
minf(x) [ x X.
(a) (b): Let G

:= x[ x

0
, x . If 0 G

, then g
0
= 0 is the strong
minimum of f on G

. If 0 / G

, then it follows that ,= 0. Let x be an interior point


of G

, then f(x) > 0 and with x

:= (1 ) 0 + x for (0, 1) it follows that


f(x

) f(x) < f(x). Hence x cannot be a minimum of f on G

. Let g
0
be the
strong minimum of f on H

:= x[ x

0
, x = . Then according to the reasoning
above f(g
0
) = inff(g) [ g G

. Let now f(g


n
) f(g
0
) with g
n
G

.
256 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Then x

0
, g
n
, because assume there is > 0 and a subsequence x

0
, g
n
k

+. Then g
n
k
G
+
. But we have for every > 0
minf(x) [ x

0
, x + > f(g
0
).
For let f(g

) = inff(g) [ g G
+
, then we obtain in the same way as above
x

0
, g

= + . As 0 is strong minimum of f on X, the mapping f(g

)
is strictly monotonically increasing on [0, 1]. The mapping () = x

0
, g

is continuous there and we have: (0) = 0 and (1) = + . Hence there is a

(0, 1) with (

) = , i.e. g

:=

. Therefore we obtain
f(g
0
) f(g

) < f(g

) = minf(x) [ x

0
, x +,
hence f(g
n
k
) f(g

), a contradiction, which establishes x

0
, g
n
.
Since g
n
G

we have

x

0
,g
n
)
1. This implies
f
_

x

0
, g
n

g
n
_

0
, g
n

f(g
n
) f(g
0
).
Moreover we have

0
, g
n

g
n
x[ x

0
, x = ,
and hence f(g
0
) f(

x

0
,g
n
)
g
n
). We conclude that (

x

0
,g
n
)
g
n
) is a minimizing se-
quence of f on H

, and hence

0
, g
n

g
n
g
0
,
thus g
n
g
0
.
(b) (c): Let K be convex and closed, let 0 / K and let r := inf f(K), then
r > 0 and the interior of S
f
(r) is non-empty. According to the Separation Theorem of
Eidelheit 3.9.14 there is a half space G

with K G

and G

Int(S
f
(r)) = . Let
g
0
be the strong minimum of f on G

, then f(g
0
) r, but f(g
0
) < r is impossible,
because in that case g
0
would belong to the interior of S
f
(r). Let (k
n
) be a sequence
in K with f(k
n
) r = inf(K). Then f(g
0
) = r since K G

, and (k
n
) is
a minimizing sequence for f on G

, hence k
n
g
0
. But K is closed, therefore
g
0
K. Thus g
0
is minimal solution of f on K and because of K G

also strong
minimum.
If 0 K and (k
n
) a sequence in K with f(k
n
) 0, we can assume that K is a
subset of a half space G. The point 0 is minimum and hence strong minimum of f on
G and therefore also strong minimum of f on K.
Denition 8.4.5. A convex function is called bounded, if the image of every bounded
set is bounded.
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 257
In order to establish a relation between strong solvability and local uniform con-
vexity, we need the subsequent
Lemma 8.4.6. Let X be a reexive Banach space and f : X R convex and
continuous.
Then
f(x)
|x|

|x|
holds, if and only if the convex conjugate f

is bounded.
Proof. Let f

be bounded. Suppose there is a sequence (x


n
)

n=1
with |x
n
|
n
for which the sequence of expressions
f(x
n
)
|x
n
|
< M for some M R, then there is
a sequence
(x

n
)

n=1
X

with |x

n
| = 1 and x

n
, x
n
= |x
n
|.
Then one obtains
f

(2Mx

n
) = sup
xX
2Mx

n
, x f(x)
|x
n
|
_
2M
_
x

n
,
x
n
|x
n
|
_

f(x
n
)
|x
n
|
_
M|x
n
|,
and because of the boundedness of f

a contradiction.
Conversely let |x

| r, then there is R, such that


f(x)
|x|
r for |x| .
Therefore
f

(x

) = sup
xX
x, x

f(x) sup
xX
__
r
f(x)
|x|
_
|x|
_
sup
|x|
__
r
f(x)
|x|
_
|x|
_
+ sup
|x|
__
r
f(x)
|x|
_
|x|
_
sup
|x|
r|x| f(x) r inf f(x[ |x| ).
In order to estimate inf f(x[ |x| ) from above, let nally x

0
f(0), then
f(x) f(0) |x||x

0
| f(0) |x

0
| and thus
inf f(x[ |x| ) |x

0
| f(0).
Corollary 8.4.7. Let f : X R be locally uniformly convex then f

is bounded.
Proof. Remark 8.4.2 and the previous lemma.
Lemma 8.4.8. Let X be a reexive Banach space, f : X R continuous and
convex, and let f

be bounded, then for every closed convex set K the set of minimal
solutions M(f, K) ,= .
258 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Proof. As f

is bounded, then according to Lemma 8.4.6 in particular all level sets


of f are bounded and hence due to the theoremof MazurSchauder 3.13.5 M(f, K) ,=
for an arbitrary closed convex set K.
Lemma 8.4.9. Let f be bounded, then f is Lipschitz continuous on bounded sets.
Proof. Let B be the unit ball in X, let > 0, and let S be a bounded subset of
X. Then also S + B is bounded. Let
1
and
2
be lower and upper bounds of
f on S + B. Let x, y S with x ,= y and let :=
|yx|
+|yx|
. Let further z :=
y +

|yx|
(y x), then z S +B and y = (1 )x + z. From the convexity of
f we obtain
f(y) (1 )f(x) +f(z) = f(x) +(f(z) f(x)),
and hence
f(y) f(x) (
2

1
) =

2

1

+|y x|
|y x| <

2

1

|y x|.
If one exchanges the roles of x and y, one obtains the Lipschitz continuity f on S.
The subsequent theorem provides a connection between strong solvability and
Frchet differentiability.
Theorem 8.4.10. Let X be a reexive Banach space and f : X R a strictly
convex and bounded function, whose conjugate f

is also bounded. Then f has a


strong minimum on every closed convex set, if and only if f

is Frchet differentiable.
Proof. Let f

be Frchet differentiable. According to Lemma 8.4.4 it sufces to show


that f has a strong minimumon every closed hyperplane and on X. By Corollary 8.3.9
and Theorem 8.3.11 f

is Frchet differentiable at x

, if and only if the function


f x

, has a strong minimum on X. As f

is differentiable at 0, f has a strong


minimum on X.
Let H := x[ x

0
, x = r with x

0
,= 0. By Lemma 8.4.8 we have M(f, H) ,= .
Let h
0
M(f, H). According to Theorem 3.10.5 there is a x

1
f(h
0
) with
x

1
, h h
0
= 0 for all h H. Due to the subgradient inequality we obtain x

1
, x
h
0
f(x) f(h
0
) for all x X. Setting f
1
:= f x

1
, , it follows that f
1
(x)
f
1
(h
0
) for all x X. As f

is differentiable at x

1
, h
0
is the strong minimum of f
1
on
X. In particular h
0
is also the strong minimum of f on H, due to
f[
H
= (f
1
+x

1
, h
0
)[
H
,
because for all h H we have
f
1
(h) +x

1
, h
0
= f(h) x

1
, h +x

1
, h
0
= f(h) x

1
, h h
0
= f(h).
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 259
In order to prove the converse let x

0
X

. According to Corollary 8.3.9 and


Theorem 8.3.11 we have to show that f
1
:= f x

0
, has a strong minimum on X.
As f

1
(x

) = f

(x

0
+ x

) for all x

, apparently f

1
is bounded and hence f
1
has, according to Lemma 8.4.8, a minimum x
0
on X. If x

0
= 0, then f
1
= f and
hence x
0
is the strong minimum of f
1
on X.
Let x

0
,= 0 and let f
1
(x
n
) f
1
(x
0
).
Let further be > 0, let K
1
:= x X[ x

0
, x x

0
, x
0
+ and let K
2
:=
x X[ x

0
, x x

0
, x
0
then we obtain because of the strict convexity of f
1
minmin(f
1
, K
1
), min(f
1
, K
2
) > f
1
(x
0
).
It follows x

0
, x
n
x

0
, x
0
, because otherwise there is a subsequence (x
n
k
) in K
1
or K
2
, contradicting f
1
(x
n
) f
1
(x
0
).
Then for H := h[ x

0
, x
0
= x

0
, h we have according to the Formula of Ascoli
(see Application 3.12.5)
min|x
n
h| [ h H =
[x

0
, x
n
x

0
, x
0
[
|x

0
|
0.
For h
n
M(|x
n
|, H) we conclude: x
n
h
n
0. The level sets of f
1
are
bounded due to Lemma 8.4.6, hence also the sequences (x
n
) and (h
n
).
As f is, according to Lemma 8.4.9, Lipschitz continuous on bounded sets, there is
a constant L with
[f(x
n
) f(h
n
)[ L|x
n
h
n
| 0.
We obtain
f
1
(h
n
) = f(h
n
) x

0
, h
n

= (f(h
n
) f(x
n
)) + (f(x
n
) x

0
, x
n
) x

0
, h
n
x
n

f
1
(x
0
).
On H the functions f and f
1
differ only by a constant. As f has a strong minimum
there, this also holds for f
1
, hence h
n
x
0
and thus x
n
x
0
.
Theorem 8.4.11. Let X be a reexive Banach space and let f be a continuous con-
vex function on X. Let further f be locally uniformly convex. Then f has a strong
minimum on every closed convex set.
Proof. Let f be locally uniformly convex. As f

is bounded (see Corollary 8.4.7),


then due to Lemma 8.4.6 all level sets of f are bounded and hence according to
Lemma 8.4.8 M(f, K) ,= for an arbitrary closed convex set K. Let now k
0

M(f, K), then by Theorems 3.4.3 and 3.10.4 there is
x

0
f(k
0
) with k k
0
, x

0
0 for all k K.
260 Chapter 8 Differentiability and Convexity in Orlicz Spaces
If is the convexity module of f belonging to k
0
and x

0
, then for an arbitrary mini-
mizing sequence (k
n
) we have
f(k
n
) f(k
0
) k
n
k
0
, x

0
+(|k
n
k
0
|) (|k
n
k
0
|),
thus lim
n
k
n
= k
0
.
Strong solvability is inherited, if one adds a continuous convex function to a locally
uniformly convex function.
Corollary 8.4.12. Let X be a reexive Banach space and let f and g be continuous
convex functions on X. Let further f be locally uniformly convex. Then f + g is
locally uniformly convex and has a strong minimum on every closed convex set.
Proof. First we show: f +g is locally uniformly convex: let x X and x

f
f(x),
then there is a convexity module
x,x

f
, such that for all y X
f(y) f(x) y x, x

f
+
x,x

f
(|x y|).
If x

g
g(x), then we obtain using the subgradient inequality
f(y) +g(y) (f(x) +g(x)) y x, x

f
+x

g
+
x,x

f
(|x y|).
Remark 8.4.13. Let X be a reexive Banach space and let f and g be continuous
convex functions on X, and let f

be bounded. Then (f +g)

is bounded.
Proof. This is according to Theorem 8.4.6 the case if and only if
f(x)+g(x)
|x|

|x|
. Now we have for x

g
g(0): g(x) g(0) x 0, x

g
|x||x

g
|, hence
g(x)
|x|

g(0)
|x|
|x

g
| c R for |x| r > 0. Therefore we conclude
f(x) +g(x)
|x|

f(x)
|x|
+c
|x|
.
In [56] the main content of the subsequent theorem is proved for f
2
, without re-
quiring the boundedness of f

. For the equivalence of strong solvability and local


uniform convexity here we need in addition the boundedness of f.
Theorem 8.4.14. Let X be a reexive Banach space, and let f : X R be a
bounded, strictly convex function.
Then f

is bounded and f has a strong minimum on every closed convex set, if and
only if f is locally uniformly convex.
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 261
Proof. If f is locally uniformly convex, then the boundedness of f

follows from
Corollary 8.4.7 and strong solvability from Theorem 8.4.11.
Conversely, let f have the strong minimum property, then by Theorem 8.4.10 f

is Frchet differentiable. If x
0
is the Frchet gradient of f

at x

0
, then f() , x

has, according to Corollary 8.3.9 and Theorem 8.3.11 the global strong minimum at
x
0
. Let
(s) := inf
|y|=s
f(x
0
+y) f(x
0
) y, x

0
.
Then (0) = 0 and (s) > 0 for s > 0. Furthermore is monotonically increasing,
because let s
2
> s
1
> 0 and let |z| = s
1
. Dene y :=
s
2
s
1
z then we obtain, due to the
monotonicity of the difference quotient
f(x
0
+
s
1
s
2
y) f(x
0
)
s
1
s
2
f(x
0
+y) f(x
0
),
and hence
f(x
0
+z) f(x
0
) z, x

0

s
1
s
2
(f(x
0
+y) f(x
0
) y, x

0
)
f(x
0
+y) f(x
0
) y, x

0
.
Finally we conclude using Theorem 8.4.6
(s)
s
=
f(x
0
)
s
+ inf
|y|=s
_
f(x
0
+y)
|y|

_
y
|y|
, x

0
__

f(x
0
)
s
|x

0
| + inf
|y|=s
_
f(x
0
+y)
|y|
_
s
.
The following example shows, that on a reexive Banach space the conjugate of a
bounded convex function does not have to be bounded.
Example 8.4.15. Let f : l
2
R be dened by f(x) =

i=1

i
(x
(i)
), where
i
:
R R is given by

i
(s) :=
_
s
2
2
for [s[ 1
i
i+1
[s[
i+1
i
+
1i
2(i+1)
for [s[ > 1.
f is bounded, because f(x) |x|
2
2
for all x l
2
. The conjugate function of f is
f

(x) =

i=1

i
(x
(i)
) with
i
(s) :=
_
s
2
2
for [s[ 1
[s[
i+1
i+1
+
i1
2(i+1)
for [s[ > 1.
Let e
i
denote the i-th unit vector in l
2
, then we obtain
f

(2e
i
) =
2
i+1
i + 1
+
i 1
2(i + 1)
i
.
f

is continuous on l
2
and Gteaux differentiable.
262 Chapter 8 Differentiability and Convexity in Orlicz Spaces
8.4.1 E-spaces
For a particular class of Banach spaces, the so-called E-spaces, all convex norm-
minimization problems are strongly solvable.
Denition 8.4.16. Let (, d) be a metric space and
0
a subset of .
0
is called
approximatively compact, if for each x every minimizing sequence in
0
(i.e.
every sequence (x
n
)
0
, for which d(x, x
n
) d(x,
0
) holds) has a point of
accumulation in
0
.
Denition 8.4.17. A Banach space X is an E-space, if X is strictly convex and every
weakly closed set is approximatively compact.
Such spaces were introduced by Fan and Glicksberg [29].
The following theorem can be found in [42]:
Theorem 8.4.18. Let X be a Banach space and let S(X) denote its unit sphere.
Then X is an E-space, if and only if X is reexive and strictly convex and from
x
n
, x S(X) with x
n
x it follows that |x
n
x| 0.
Proof. Let K be a weakly closed subset of the reexive Banach space X and let
x X K. Let further be (x
n
) K a minimizing sequence, more precisely
|x x
n
| d(x, K) = d.
Then d > 0 holds, because suppose d = 0, then x
n
x and hence x
n
, y

x, y

for all y

, hence x
n
x K, a contradiction to K being weakly closed.
Since (x
n
) is bounded, there is by the theorem of EberleinShmulian (see [28]) a
weakly convergent subsequence (x
n
k
) with x
n
k
x
0
K. It follows that
x
n
k
x
|x
n
k
x|

x
0
x
d
.
Apparently |x
0
x| d, hence |x
0
x| = d, for suppose d < |x
0
x|, then
x
0
x
d
/ U(X) := y X[ |y| 1, in contradiction to U(X) being weakly closed.
Hence by assumption x
n
k
x x
0
x, i.e. x
n
k
x
0
which means x
0
is a point of
accumulation of the minimizing sequence.
Conversely, let X be an E-space. First of all we show the reexivity: let x


S(X

) arbitrary and let H := y X[ y, x

= 0. H is weakly closed and hence


approximatively compact. Let x X H and (x
n
) H with |x
n
x| d(x, H).
By assumption there exists a subsequence (x
n
k
) with x
n
k
x
0
H. Therefore
|x
n
k
x| |x
0
x| = d(x, H). Hence, because of the strict convexity of X, x
0
is
the best approximation of x w.r.t. H. According to the Theorem of Singer 3.12.6 then
[x x
0
, x

[ = |x x
0
|
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 263
holds. Therefore x

attains the supremum on the unit ball of X. Since x

was arbi-
trarily chosen, X is according to the theorem of James [45] reexive.
Let now (x
n
) S(X) with x
n
x S(X). We have to show: x
n
x.
We choose x

S(X

) with x, x

= 1. Then
x
n
, x

x, x

= 1, (8.7)
and x
n
, x

> 0 for all n > N (N N suitably chosen). Let now x


n
:=
x
n
x
n
,x

)
,
then
( x
n
) H
1
:= z X[ z, x

= 1.
The Formula of Ascoli 3.12.5 yields d(0, H
1
) = [0, x

1[ = 1. Furthermore, we
obtain
| x
n
| =
_
_
_
_
x
n
x
n
, x

_
_
_
_
=
1
[x
n
, x

[
1,
i.e. | x
n
0| d(0, H
1
). Hence ( x
n
) is a minimizing sequence for the approximation
of 0 w.r.t. H
1
. Since H
1
is approximatively compact there exists a subsequence ( x
n
k
)
with x
n
k
x H
1
. But by assumption and (8.7) x
n
k
x and hence x = x. Again
by (8.7): x
n
k
x. Since every subsequence of ( x
n
) is a minimizing sequence and
hence contains a to x convergent subsequence, we altogether obtain x
n
x.
The properties of an E-space are closely related to the KadecKlee property
(see [104]):
Denition 8.4.19. A Banach space X has the KadecKlee property, if x
n
x and
|x
n
| |x| already imply |x
n
x| 0.
Therefore, a strictly convex and reexive Banach space with KadecKlee property
is an E-space. Conversely, it is easily seen that an E-space has the KadecKlee prop-
erty.
But the E-space property can also be characterized by Frchet differentiability of
the dual space. In order to substantiate this we need the following
Lemma 8.4.20 (Shmulian). The norm of a normed space X is Frchet differentiable
at x S(X), if and only if every sequence (x

n
)
nN
U(X

) with x, x

n
1 is
convergent.
Proof. Let x

S(X

) be the Frchet derivative of the norm at x. We will show


now: each sequence (x

n
) U(X

) with x, x

n
1 converges to x

.
Since 1 |x

n
| x, x

n
1 it follows that |x

n
| 1. Let x

n
:=
x

n
|x

n
|
.
Suppose the sequence ( x

n
) does not converge to x

, then there is an > 0 and a


sequence (z
n
) S(X) with z
n
, x

n
x

2. Put x
n
:=
1

(|x|x, x

n
)z
n
, then
264 Chapter 8 Differentiability and Convexity in Orlicz Spaces
x, x

n
= x, x

1
|x

n
|
1 = |x|, hence x
n
0. By denition the following chain
of inequalities holds:
|x +x
n
| |x| x
n
, x

|x
n
|

x +x
n
, x

n
1 x
n
, x

|x
n
|
=
x, x

n
1 +x
n
, x

n
x

|x
n
|
=
x, x

n
1 +z
n
, x

n
x

1x, x

n
)

1x, x

n
)

= +z
n
, x

n
x

,
a contradiction to x

being the Frchet derivative of the norm at x.


In order to prove the converse we remark that the norm at x has a Gteaux deriva-
tive, since otherwise there are two distinct subgradients x

1
, x

2
|x|. But then
the sequence x

1
, x

2
, x

1
, x

2
, . . . violates the assumption since for i = 1, 2 we obtain:
h, x

i
= x + h x, x

i
|x + h| |x| |h| and hence |x

i
| 1. Therefore
x, x

i
|x| = 1 and 0 x, x

i
|0| |x| = 1, hence x, x

i
= 1. Let then
x

be the Gteaux derivative of the norm at x. If the norm is not Frchet differentiable
at x, there exists an > 0 and a sequence (x
n
) X with x
n
0, such that
|x +x
n
| |x| x
n
, x

|x
n
|
,
i.e. |x +x
n
| x +x
n
, x

|x
n
| and hence
x
n
, x

|x
n
| +x, x

|x +x
n
|. (8.8)
Choose a sequence (x

n
) S(X

) with x + x
n
, x

n
= |x + x
n
|, then due to
x
n
0
x, x

n
= |x +x
n
| x
n
, x

n
|x| = 1.
Therefore by assumption there is x

S(X

) with x

n
x

and we have
x, x

n
x, x

= |x|. (8.9)
Since x+hx, x

= x+h, x

x, x

|x+h| |x|, we have x

|x|
hence x

= x

.
On the other hand we obtain by (8.8)
|x

n
x

|
_
x
n
|x
n
|
, x

n
x

_
=
x
n
, x

n
x
n
, x

|x
n
|

x
n
, x

n
+|x
n
| +x, x

|x +x
n
|
|x
n
|
=
x, x

x, x

|x
n
|
+ ,
Section 8.4 Local Uniform Convexity, Strong Solvability and Frchet Differentiability 265
where the last inequality follows from x, x

= |x| x, x

n
. Thus the sequence
(x

n
) does not converge to x

, a contradiction.
An immediate consequence of the above lemma of Shmulian is the following corol-
lary, which is a special case of the theorem of Phelps 3.8.6:
Corollary 8.4.21. Let the norm of a normed space X be Frchet differentiable for all
x ,= 0. Then the mapping
D : X 0 S(X

)
x D(x),
where D(x) denotes the Frchet gradient at x, is continuous.
Proof. Let x
n
x then
[x, D(x
n
) D(x)[ = [x
n
, D(x
n
) +x x
n
, D(x
n
) x, D(x)[
[|x| |x
n
|[ +[x x
n
, D(x
n
)[ 2|x x
n
|,
hence x, D(x
n
) x, D(x) = 1, where the last equality is due to Theorem 8.1.4
for x S(X). The lemma of Shmulian then implies D(x
n
) D(x) in X

.
By the lemma of Shmulian we are now in the position, to describe the E-space
property by the Frchet differentiability of the dual space:
Theorem 8.4.22 (Anderson). A Banach space X is an E-space, if and only if X

is
Frchet differentiable.
Proof. Let X be an E-space. By Theorem 8.4.18 X is reexive and, since X = X

is strictly convex, X

is smooth. Let x

S(X

) be arbitrary. Now we have to show


the validity of the Shmulian criterion at x

:
Let (x
n
) U(X) with x
n
, x

1. We have to show: the sequence (x


n
) is
convergent.
We have: 1 |x
n
| x
n
, x

1, hence |x
n
| 1. Let x
n
:=
x
n
|x
n
|
and let x
be a weak point of accumulation of the sequence ( x
n
) i.e. x
n
k
x for a subsequence
( x
n
k
) of ( x
n
), then | x| 1, since the closed unit ball of X is weakly closed.
On the other hand by the theorem of HahnBanach there is a y

S(X

) with
x, y

= | x|.
Moreover, we have
x, x

= lim x
n
k
, x

= lim
_
x
n
k
|x
n
k
|
, x

_
= 1,
hence | x| = 1. Theorem 8.4.18 then implies x
n
k
x.
266 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Since X

is smooth, x is by x S(X) and x, x

= 1 uniquely determined as the


Gteaux gradient of the norm at x

, since
x, y

= x, y

x, x

|y

| |x

|.
Therefore we obtain for the whole sequence x
n
x and since |x
n
| 1, as seen
above, also x
n
x, showing that the Shmulian criterion is satised and hence X

Frchet differentiable.
Conversely, let X

be Frchet differentiable. Let x

S(X

) be arbitrary. By
denition there is a sequence (x
n
) U(X) with x
n
, x

1. From the lemma of


Shmulian we conclude: (x
n
) is convergent: x
n
x, hence x
n
, x

x, x

= 1
and | x| 1. Therefore x

attains its supremum at x U(X). According to the


theorem of James X is reexive, since x

S(X

) was arbitrarily chosen.


Let now x
n
, x S(X) with x
n
x, it remains to be shown: x
n
x. Let
x

S(X

) be chosen, such that x, x

= 1, then x
n
, x

x, x

= 1. Again by
the lemma of Shmulian the sequence (x
n
) is convergent: x
n
x, hence x
n
, x


x, x

= 1 and | x| = 1. The smoothness of X

implies x = x, as seen above.


The link to strong solvability (see [42]) is provided by the following
Theorem 8.4.23. A Banach space X is an E-space, if and only if for every closed
convex set K the problem min(| |, K) is strongly solvable.
Proof. Let X be an E-space and K a closed convex subset, then K is also weakly
closed and hence approximatively compact. Let x X K and (x
n
) K a mini-
mizing sequence, i.e. |x x
n
| d(x, K), then there is a subsequence (x
n
k
) with
x
n
k
x
0
K. Hence |x x
0
| = d(x, K). Since X is strictly convex, x
0
is the
uniquely determined best approximation of x w.r.t. K. As every subsequence of (x
n
)
is a minimizing sequence, it again contains a subsequence converging to x
0
. Thus we
have for the whole sequence x
n
x
0
. Therefore the problem min(|x |, K) is
strongly solvable.
Conversely, let every problem of this kind be strongly solvable. Suppose X is not
strictly convex, then there are x
1
, x
2
S(X) with x
1
,= x
2
and [x
1
, x
2
] S(X).
Then the problem min(0, [x
1
, x
2
]) is apparently not strongly solvable, a contradiction.
Now we show the reexivity: let x

S(X

) arbitrary and let H := y


X[ y, x

= 0. H is convex and closed. Let x X H and let (x


n
) H with
|x
n
x| d(x, H). By our assumption x
n
x
0
H. Therefore |x
n
x|
|x
0
x| = d(x, H). Hence x
0
is due to the strict convexity of X the best ap-
proximation of x w.r.t. H. According to the theorem of Singer 3.12.6 we then obtain
x x
0
, x

= |x x
0
|.
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 267
This means that x

attains its supremum on the unit ball of X. Since x

was arbitrarily
chosen, X is, by the theorem of James, reexive.
That x
n
, x S(X) with x
n
x implies that |x
n
x| 0 can be shown in the
same way as in the last part of the proof of Theorem 8.4.18.
8.5 Frchet differentiability and Local Uniform Convexity
in Orlicz Spaces
We obtain as the central result of this paragraph that in a reexive Orlicz space strong
and weak differentiability of Luxemburg and Orlicz norm, as well as strict and local
uniform convexity coincide.
8.5.1 Frchet Differentiability of Modular and Luxemburg Norm
If is differentiable, then f

and | |
()
are Gteaux differentiable and for the
Gteaux derivatives we have (see Lemma 8.2.1)
(f

)
t
(x
0
), x =
_
T
x
t
(x
0
)d
and (see Equation (8.6)) |x
0
|
t
()
, x =
f

(y
0
),x)
f

(y
0
),y
0
)
, where y
0
:=
x
0
|x
0
|
()
for x
0
,= 0. If
the conjugate function of satises the
2
-condition we can prove the continuity of
the above Gteaux derivatives.
The following theorem was shown by Krasnosielskii [72] for the Lebesgue measure
on a compact subset of the R
n
, but in a different way.
Theorem 8.5.1. Let (T, , ) be a -nite measure space, a differentiable Young
function, and let its conjugate function satisfy the
2
-condition. Then the Gteaux
derivatives of f

and | |
()
are continuous mappings from
M

() resp. M

() 0 to L

().
Proof. Let (x
n
)

n=1
be a sequence in M

() with limx
n
= x
0
.
First we show that in L

() the relation lim


n

t
(x
n
) =
t
(x
0
) holds. As
satises the
2
-condition, convergence in the norm is equivalent to convergence w.r.t.
the modular f

(see Theorems 6.2.21, 6.3.1 and 6.3.3), i.e. we have to show that
lim
n
f

(
t
(x
n
)
t
(x
0
)) = 0.
Let now T be represented as a countable union of pairwise disjoint sets T
i
, i =
1, 2, . . . , of nite positive measures.
268 Chapter 8 Differentiability and Convexity in Orlicz Spaces
We dene
S
k
:=
k
_
i=1
T
i
, S
t
k
:= t T [ [x
0
(t)[ k,
D
k
:= S
k
S
t
k
,
and nally
R
k
:= T D
k
.
First we show:
_
R
k
x
0

t
(x
0
)d
k
0. As [x
0
(t)[ > k on R
k
we obtain
>
_
T
x
0

t
(x
0
)d
_
R
k
x
0

t
(x
0
)d (R
k
)k
t
(k).
As k
t
(k) (k)
k
, it follows that (R
k
)
k
0. As D
k
D
k+1
the sequence (x
0

t
(x
0
)
D
k
) converges monotonically increasing pointwise a. e.
to x
0

t
(x
0
). With
_
D
k
x
0

t
(x
0
)d
k
_
T
x
0

t
(x
0
)d it then follows that
_
R
k
x
0

t
(x
0
)d
k
0. For given > 0 we now choose k large enough that
_
R
k
x
0

t
(x
0
)d and (D
k
) > 0. (8.10)
As
t
is uniformly continuous on I := [k 1, k +1], there is a (0, 1), such that
[
t
(s)
t
(r)[
1
_

(D
k
)
_
for [s r[ and s, r I.
According to Theorem 7.4.5 the sequence (x
n
)

n=1
converges to x
0
in measure, i.e.
there is a sequence of sets (Q
n
)

n=1
with lim
n
(Q
n
) = 0, such that
lim
n
sup
tT\Q
n
[x
0
(t) x
n
(t)[ = 0.
In particular there is a natural number N, such that for n N
[x
n
(t) x
0
(t)[ for t T Q
n
,
and
(Q
n
)

k
t
(k)
. (8.11)
Thus we obtain
_
T\(Q
n
R
k
)
(
t
(x
n
)
t
(x
0
))d
_
T\(Q
n
R
k
)

1
_

(D
k
)
__
d (8.12)
=
(T (Q
n
R
k
)
(D
k
)
. (8.13)
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 269
According to Theorem 3.8.5 the derivative of a continuous convex function is demi-
continuous, i.e. here: if the sequence (w
n
)

n=1
converges to w
0
in M

(), then
(
t
(w
n
))

n=1
converges -weak to
t
(w
0
). Since according to the Uniform Bounded-
ness theorem of Banach (see Theorem 5.3.14) the sequence (
t
(w
n
))

n=1
is bounded
in L

(), we obtain because of


_
T
w
n

t
(w
n
)d
_
T
w
0

t
(w
0
)d
=
_
T
(w
n
w
0
)
t
(w
n
)d +
_
T
w
0
(
t
(w
n
)
t
(w
0
))d
using Hlders inequality

_
T
w
n

t
(w
n
)d
_
T
w
0

t
(w
0
)d

|w
n
w
0
|
()
|
t
(w
n
)|

_
T
w
0
(
t
(w
n
)
t
(w
0
))d

,
where the last expression on the right-hand side converges to zero because of the
*-weak convergence of
t
(w
n
) to
t
(w
0
). In this way we obtain the relation
lim
n
_
T
w
n

t
(w
n
)d =
_
T
w
0

t
(w
0
)d. (8.14)
Let now: w
n
:=
Q
n
R
k
x
n
, let further be w
0
:=
R
k
x
0
, and let v
n
:=
Q
n
R
k
x
0
,
then we obtain
w
n
w
0
= w
n
v
n
+v
n
w
0
= (x
n
x
0
)
Q
n
R
k
+x
0
(
Q
n
R
k

R
k
).
Now we have: [x
n
x
0
[ [x
n
x
0
[
Q
n
R
k
and, using the monotonicity of the
Luxemburg norm, we obtain [x
n
x
0
[
Q
n
R
k

n
0, moreover using (8.11)
|v
n
w
0
|
()
= |
Q
n
\R
k
[x
0
[|
()
|
Q
n
k|
()

t
(k)


(k)
k
,
since is in particular nite (and hence
(k)
k
). We conclude w
n

n
w
0
.
Taking
_
Q
n
R
k
x
n

t
(x
n
)d
_
R
k
x
0

t
(x
0
)d +
_
R
k
x
0

t
(x
0
)d
_
Q
n
R
k
x
0

t
(x
0
)d
=
__
T
w
n

t
(w
n
)d
_
T
w
0

t
(w
0
)d
_
+
__
T
w
0

t
(w
0
)d
_
T
v
n

t
(v
n
)d
_
270 Chapter 8 Differentiability and Convexity in Orlicz Spaces
into account it follows using (8.14)
lim
n
__
Q
n
R
k
x
n

t
(x
n
)d
_
Q
n
R
k
x
0

t
(x
0
)d
_
= 0.
The
2
-condition for together with Youngs equality yields a. e.
(
t
(x
n
)
t
(x
0
))
1
2
((2
t
(x
n
)) +(2
t
(x
0
))


2
((
t
(x
n
)) +(
t
(x
0
))


2
(x
n

t
(x
n
) +x
0

t
(x
0
)).
Therefore
_
Q
n
R
k
(
t
(x
n
)
t
(x
0
))d

2
__
Q
n
R
k
x
n

t
(x
n
)d +
_
Q
n
R
k
x
0

t
(x
0
)d
_

__
Q
n
R
k
x
0

t
(x
0
)d +
_
for n sufciently large and, using (8.10) and (8.11), we obtain
_
Q
n
R
k
x
0

t
(x
0
)d =
_
R
k
x
0

t
(x
0
)d +
_
Q
n
\R
k
x
0

t
(x
0
)d 2.
Together with (8.12) the rst part of the claim follows.
Let now x
0
M

() 0 and (x
n
)

n=1
be a sequence that converges in M

()
to x
0
.
Let y
n
:=
x
n
|x
n
|
()
for n N 0, then also (y
n
)

n=1
converges to y
0
, i.e.
lim
n

t
(y
n
) =
t
(y
0
) and, because of (8.14) also
lim
n
_
T
y
n

t
(y
n
)d =
_
T
y
0

t
(y
0
)d
holds, which completes the proof for the derivatives of the Luxemburg norm.
Remark 8.5.2. To prove the above Theorem 8.5.1 for the sequence space l

, only the

0
2
-condition for is required. The proof can be performed in a similar way.
Remark 8.5.3. If T in Theorem 8.5.1 has nite measure, only the

2
-condition for
is needed.
Proof. is differentiable, hence strict convex according to Theorem 6.1.22, thus
in particular denite.
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 271
Let now (2s) (s) for all s s
0
0, let, different from Theorem 8.5.1,
D
k
:= S
t
k
and k large enough that
(R
k
)

2((2s
0
) + 1)
,
furthermore
_
R
k
x
0

t
(x
0
)d and
(2s
0
)
k

(k)
1 are satised.
If we set
P
n
:= t Q
n
R
k
[ [
t
(x
n
(t))[ s
0

P
t
n
:= t Q
n
R
k
[ [
t
(x
0
(t))[ s
0
,
then we obtain in similar fashion as in Theorem 8.5.1
_
Q
n
R
k
(
t
(x
n
)
t
(x
0
))d

1
2
__
Q
n
R
k
(2
t
(x
n
))d +
_
Q
n
R
k
(2
t
(x
0
))d
_
=
1
2
__
P
n
(2
t
(x
n
))d +
_
(Q
n
R
k
)\P
n
(2
t
(x
n
))d
+
_
P

n
(2
t
(x
0
))d +
_
(Q
n
R
k
)\P

n
(2
t
(x
0
))d
_

1
2
_
(2s
0
)((P
n
) +(P
t
n
))
+
__
(Q
n
R
k
)\P
n
x
n

t
(x
n
)d +
_
(Q
n
R
k
)\P

n
x
0

t
(x
0
)d
__

1
2
((R
k
) +(Q
n
))(2s
0
) +

2
_
2
_
Q
n
R
k
x
0

t
(x
0
)d +
_
(1 + 3)
for n sufciently large.
We will now demonstrate that for a nite, not purely atomic measure space the
following statement holds: if (L

(), | |
()
) is Frchet differentiable, then (L

(),
| |
()
) is already reexive. For this purpose we need the following
Lemma 8.5.4. Let (T, , ) be a -nite, not purely atomic measure space. If
(L

(), | |
()
) is Frchet differentiable, then is nite.
Proof. For let not be nite, then
t
is bounded on R, i.e. there are positive numbers
a, b, c such that a(s c) (s) b s for s c.
272 Chapter 8 Differentiability and Convexity in Orlicz Spaces
If T has nite measure, Theorem 6.2.10 implies
(a) L
1
() = L

(), and the norms | |


1
and | |
()
are equivalent.
(b) L

() = L

(), and the norms | |

and | |

are equivalent (here we have


also used the equivalence of Orlicz and Luxemburg norms).
Clearly satises the

2
-condition, hence M

()=L

() (see Theorem 6.2.42)


Therefore we obtain
(L

(), | |

) = (M

(), | |
()
)

.
Let now A be the set of atoms in T and let := (T A), then we choose disjoint
sets G
k
in T A with (G
k
) = 2
k1
for k = 1, 2, . . . (see Theorem 6.2.30).
Furthermore let s
0
R
+
be large enough that

t
_
s
0
e
2

1
(
1

2
+(A)
)
_
> 0.
If we dene the functions x
n
on T by
x
n
(t) :=
_
(1
1
k
)
n
s
0
for t G
k
, k = 1, 2, . . .
1 otherwise
for n = 1, 2, . . . and x
0
with
x
0
(t) :=
_
0 for t G
k
, k = 1, 2, . . .
1 otherwise,
then the sequence (x
n
) converges to x
0
in the L
1
-norm. To see this we observe
_
T
[x
0
x
n
[d =

k=1
_
G
k
[x
0
x
n
[d = s
0

k=1
2
k1
_
1
1
k
_
n
.
Let now > 0 be given, then we choose a natural number r such that

k=r+1
2
k1
<

2
. Then
_
T
[x
0
x
n
[d s
0
__
1
1
r
_
n
r

k=1
2
k1
+

2
_
s
0

for sufciently large n. Hence also |x


n
x
0
|
()
0, due to the equivalence of the
norms.
Thus the number-sequence (|x
n
|
()
) converges because of

k
(G
k
) =

2
to
|x
0
|
()
=
1

1
((

2
+(A))
1
)
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 273
(see Lemma 7.4.3). We now set y
n
:=
x
n
|x
n
|
()
for n = 0, 1, 2, . . . . The Gteaux
gradient of | |
()
at x
n
is
y
n
_
T
y
n

t
(y
n
)d
.
We rst consider the sequence (
t
(y
n
))

n=1
in L

().
For n sufciently large we obtain because of the monotonicity of
t
and the choice
of s
0
|
t
(y
n
)
t
(y
0
)|

ess sup
tG
n
[
t
(y
n
(t))
t
(y
0
(t))[
=
t
_
(1
1
n
)
n
s
0
|x
n
|
()
_

t
_
e
2
s
0
|x
0
|
()
_
> 0.
But the number-sequence
__
T
y
n

t
(y
n
)d
_

n=1
converges to
_
T
y
0

t
(y
0
)d,
because apparently
t
(y
n
) [ n N is uniformly bounded in L

and we have
_
T
y
0
((y
n
)
t
(y
0
))d =
_
T\

k=1
G
k
y
0
((y
n
)
t
(y
0
))d
=
1
|x
0
|
()
_

t
_
1
|x
n
|
()
_

t
_
1
|x
0
|
()
__

2
+(A)
_
.
Thus the Gteaux derivative of | |
()
is not continuous in y
0
and therefore | |
()
according to Theorem 3.8.6 not Frchet differentiable.
If T has innite measure, the we choose a not purely atomic subset T
0
of T with
nite measure.
If
0
is the subalgebra of , that consists in the elements of contained in T
0
,
then we denote the restriction of to
0
by
0
. In the same way as above we can
construct a sequence (y
n
) of elements of the unit sphere of L

(
0
) converging to y
0
in L

(
0
), for which the sequence (
t
(y
n
)) does not converge to
t
(y
0
) in L

(
0
).
If one considers L

(
0
) and L

(
0
) as subspaces of L

() and L

() resp., then it
is easy to see that the restriction of the Luxemburg norm of L

() to L

(
0
) and of
the Orlicz norm of L

() to L

(
0
) is equal to the Luxemburg norm on L

(
0
) and
equal to the Orlicz norm on L

(
0
) resp. Thus for not nite and (T) = the
Luxemburg norm | |
()
is also not Frchet differentiable on L

() 0.
274 Chapter 8 Differentiability and Convexity in Orlicz Spaces
The Frchet differentiability of (L

, | |
()
) implies already the

2
-condition
for :
Theorem 8.5.5. Let (T, , ) be a -nite, not purely atomic measure space. If
(L

(), | |
()
) is Frchet differentiable, then satises the

2
-condition.
Proof. Assume that does not satisfy the

2
-condition, then L

contains according
to Theorem 6.2.39 a subspace, which is isometrically isomorphic to

. But

is not
Frchet differentiable, because otherwise
1
(according to the theorem of Anderson
(see Theorem 8.4.22)) is an E-space, hence in particular reexive, a contradiction.
Therefore we obtain the following
Theorem 8.5.6. Let (T, , ) be a nite, not purely atomic measure space. then
the following statement holds: if (L

(), | |
()
) is Frchet differentiable, then
(L

(), | |
()
) is reexive.
Proof. According to Theorem 8.5.5 satises the

2
-condition. Then, due to The-
orem 6.2.42 we have L

() = M

(). Therefore, if | |
()
is Frchet differentiable
on L

() 0, then is nite according to Lemma 8.5.4, hence by Theorem 7.6.2


(M

(), | |

= (L

(), | |
()
) and thus, according to the theorem of Ander-
son (see 8.4.22), (M

(), | |

) is an E-space, hence in particular reexive. Using


Hlders inequality we obtain
(L

(), | |

) (L

(), | |
()
)

= (M

(), | |

)
and so we conclude L

() = M

().
We are now in the position to characterize the Frchet differentiable Luxemburg
norms. In order to simplify the wording of our results we introduce the following
notation:
Denition 8.5.7. We call the measure space (T, , ) essentially not purely atomic,
if (T A) > 0, and for (T) = even (T A) = hold, where A is the set of
atoms.
Theorem 8.5.8. Let (T, , ) be a -nite, essentially not purely atomic measure
space, and let (L

(), | |
()
) be reexive. Then the following statements are equiv-
alent:
a) (L

(), | |
()
) is at convex,
b) f

is continuously Frchet differentiable on L

(),
c) | |
()
is continuously Frchet differentiable on L

() 0,
d) is differentiable.
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 275
Proof. a) b) and c): According to Theorem 8.2.3 at convexity for a not purely
atomic measure space implies the differentiability of and by the theorem of Mazur
8.1.3 the Gteaux differentiability of | |
()
.
Using Theorem 7.7.1, reexivity of (L

(), | |
()
) implies the
2
or

2
-condi-
tion for and , if T has innite or nite measure respectively, because otherwise L

has a (closed) subspace, isomorphic to

, which according to [41] is also reexive,


a contradiction. We conclude, using Theorem 6.2.42 M

() = L

().
Due to the equivalence of Luxemburg and Orlicz norm, (L

(), | |

) is also
reexive. Moreover, has to be nite and hence, according to Theorem 7.6.2
(L

(), | |

= (L

(), | |
()
).
Being the dual space of (L

(), | |

), the space (L

(), | |
()
) is also reexive.
By the
2
or

2
-condition (depending on innite or nite measure of T respectively)
for and Theorems 8.5.1 and 8.5.3 it follows that the Gteaux derivative of f

is
continuous on M

() and hence, as in Theorem 8.5.1, the corresponding property


for | |
()
. By Theorem 3.8.6 continuous Gteaux differentiability and continuous
Frchet differentiability are equivalent.
a) d): According to Theorem 8.2.3 at convexity for a not purely atomic mea-
sure is equivalent to differentiability of .
b) a): Due to Theorem 8.1.1 the level set S
f
(1) is at convex. On the other
hand S
f
(1) is identical to the unit sphere of M

= L

. Hence (L

, | |
()
) is at
convex.
c) a): If | |
()
is Frchet differentiable on L

() 0, then (L

(), | |
()
)
is in particular at convex due to the theorem of Mazur 8.1.3.
For nite measure a somewhat stronger version is available.
Theorem 8.5.9. Let (T, , ) be a nite, not purely atomic measure space. Then the
following statements are equivalent:
a) (L

(), | |
()
) is at convex and reexive,
b) | |
()
is Frchet differentiable on L

() 0,
c) is differentiable and (L

(), | |
()
) is reexive.
Proof. Theorems 8.5.8 and 8.5.6.
For the sequence space l

the above theorem can be proved in somewhat weaker


form.
Theorem 8.5.10. Let the Young function be nite and let l

be reexive. then the


following statements are equivalent:
a) (l

, | |
()
) is at convex,
276 Chapter 8 Differentiability and Convexity in Orlicz Spaces
b) | |
()
is Frchet differentiable on l

0,
c) is differentiable for all s with [s[ <
1
(1).
Proof. The equivalence of a) and c) follows with Theorems 8.2.4 and 7.7.2. Let now
(l

, | |
()
) be at convex. The left-sided derivative
t

of in
1
(1) is nite. If
we continue
t

in a continuous, linear, and strictly increasing way beyond


1
(1),
and denote that primitive function which is zero at the origin by

, then

is a dif-
ferentiable Young function. Apparently l

= l

and | |
(

)
= | |
()
. If

is
the convex conjugate of

then by construction

is nite. and hence

satises
the
0
2
-condition, because otherwise (l

, | |
()
) contains, according to the theorem
of LindenstraussTsafriri a subspace isomorphic to

, a contradiction to the reex-


ivity of (l

, | |
()
). It follows that m

. Due to Theorem 7.6.2 we have


(l

, | |

= (l

, | |
(

)
). The equivalence of the norms implies that (l

, | |

) is
also reexive and hence (l

, | |
(

)
), being the dual space of a reexive space, is also
reexive. As demonstrated above the
0
2
-condition for

follows. Using Remark 8.5.2
we obtain b). By Theorem 8.1.2 a) follows from b).
8.5.2 Frchet Differentiability and Local Uniform Convexity
In the sequel we need a characterization of strict convexity of the Luxemburg norm
on H

().
Theorem 8.5.11. Let (T, , ) be a not purely atomic measure space and let a
nite Young function.
is strictly convex, if and only if (H

(), | |
()
) is strictly convex.
Proof. Let A be the set of atoms of T. If is not strictly convex, then there are
different positive numbers s and t, such that

_
s +t
2
_
=
1
2
((s) +(t)).
We choose pairwise disjoint sets T
1
, T
2
, T
3
of nite measure with 0 < (T
1
) =
(T
2
) min(
(T\A)
3
,
1
2((s)+(t))
) and 0 < (T
3
) (T A) 2(T
1
).
Furthermore, let u :=
1
(
1(T
1
)((s)+(t))
(T
3
)
).
The functions
x := s
T
1
+t
T
2
+u
T
3
,
and
y := t
T
1
+s
T
2
+u
T
3
Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 277
are then elements of the unit sphere, because we have
_
T
(x)d = (T
1
)(s) +(T
2
)(t) +(T
3
)(u)
= (T
2
)(s) +(T
1
)(t) +(T
3
)(u) =
_
T
(y)d = 1.
Now, the properties of s, t, T
1
and T
2
imply
_
T

_
x +y
2
_
d = (T
1
)
_
s +t
2
_
+(T
2
)
_
s +t
2
_
+(T
3
)(u) = 1.
Conversely, let be strictly convex, and let x
1
, x
2
be elements of the unit sphere
with |
x
1
+x
2
2
|
()
= 1, then Lemma 6.2.15 implies
1 =
_
T
(x
1
)d =
_
T
(x
2
)d =
_
T

_
x
1
+x
2
2
_
d,
hence, because of the convexity of , we obtain (
x
1
+x
2
2
) (x
1
) (x
2
) = 0
almost everywhere and due to the strict convexity x
1
= x
2
almost everywhere, i.e. the
Luxemburg norm is strictly convex on H

().
Corollary 8.5.12. Let (T, , ) be a not purely atomic measure space and let be a
nite Young function.
If f

strictly convex, then is strictly convex, because then also H

is strictly
convex.
For our further reasoning we need the subsequent theorem (see e.g. [49], p. 350),
which describes the duality between strict and at convexity:
Theorem 8.5.13. If the dual space X

of a normed space X is strict or at convex,


then X itself is at or strict convex respectively.
We are now in the position, to characterize the reexive and locally uniformly con-
vex Orlicz spaces w.r.t. the Orlicz norm.
Theorem 8.5.14. Let (T,,) a -nite, essentially not purely atomic measure space
and let L

() be reexive. Then the following statements are equivalent:


a) is strictly convex,
b) (L

(), | |

) is strictly convex,
c) | |
2

is locally uniformly convex,


d) f

is locally uniformly convex.


278 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Proof. The reexivity implies in particular that and are nite.
a) b): If is strictly convex, then its conjugate is differentiable (see The-
orem 6.1.22). Then due to Theorem 8.2.3 (L

(), | |
()
) is at convex, hence
(L

(), | |

) strictly convex (see Theorem 8.5.13).


b) c): If (L

(), | |

) is strictly convex, then (L

(), | |
()
) is at convex
(see Theorem 8.5.13) and hence due to Theorem 8.5.8 | |
2
()
/2 Frchet differen-
tiable. As | |
2

/2 is the conjugate function of | |


2
()
/2 (see Example 3.11.10), then,
according to Theorem 8.4.10 and Theorem 8.4.14 | |
2

is locally uniformly convex.


Apparently b) follows immediately from c).
b) d): From at convexity of (L

(), | |
()
) and reexivity it follows with
Theorem 8.5.8 that f

is Frchet differentiable. and satisfy the


2
- or the

2
-
condition respectively, depending on (T) being innite or nite (see Theorem 7.7.1).
Thus f

and its conjugate f

are bounded (see Theorem 7.3.4 and Theorem 6.3.10).


Due to Theorem 8.4.10 and Theorem 8.4.14 this implies the local uniform convexity
of f

.
d) a): This follows immediately from Corollary 8.5.12.
Remark 8.5.15. In Example 8.6.12 we present a reexive and strictly convex Orlicz
space w.r.t. the Orlicz norm, which is not uniformly convex (compare Milnes in [85],
p. 1482).
For the sequence space l

the above theorem can be proved in a somewhat weaker


version.
Theorem 8.5.16. Let and be nite and let l

be reexive, then the following


statements are equivalent:
a) (l

, | |

) is strictly convex,
b) | |
2

is locally uniformly convex.


Proof. As in Theorem 8.5.14 by use of Theorem 8.5.10.
8.5.3 Frchet Differentiability of the Orlicz Norm and Local Uniform
Convexity of the Luxemburg Norm
Using the relationships between Frchet differentiability, local uniform convexity and
strong solvability presented in the previous paragraphs, we are now in the position to
describe the Frchet differentiability of the Orlicz norm.
Theorem 8.5.17. Let (T, , ) be an essentially not purely atomic, -nite measure
space, and let L

() be reexive. Then the following statements are equivalent:


Section 8.5 Frchet differentiability and Local Uniform Convexity in Orlicz Spaces 279
a) (L

(), | |

) is at convex,
b) is differentiable,
c) | |

is Frchet differentiable on L

() 0.
Proof. a) b): Let be the conjugate of .
If (L

(), | |

) is at convex, then (L

(), | |
()
) is strictly convex. Due to
Theorem 8.5.11 is strictly convex and hence differentiable according to Theo-
rem 6.1.22.
b) c): From the differentiability of it follows by Theorem 8.5.8 that f

is
Frchet differentiable. As in the proof of Theorem 8.5.14 the local uniform convex-
ity of f

follows. Strong and weak sequential convergence agree on the set S :=


x[ f

(x) = 1 because from x


n
x for x
n
, x S it follows for x

(x)
0 = f

(x
n
) f

(x) x
n
x, x

+(|x
n
x|
()
),
where is the convexity module of f

belonging to x and x

, and thus x
n
x.
As S is the unit sphere of L

() w.r.t. the Luxemburg norm, (L

(), | |
()
) is an
E-space according to Theorem 8.4.18, hence | |
()
has a strong minimum on every
closed convex set due to Theorem 8.4.23, Apparently, this also holds for | |
2
()
/2.
Theorem8.4.10 and Theorem8.4.14 then imply the Frchet differentiability of ||
2

/2
and hence of | |

in L

() 0.
c) a): This follows from the theorem of Mazur.
It is now a simple task to characterize the locally uniformly convex, reexive Orlicz
spaces w.r.t. the Luxemburg norm.
Theorem 8.5.18. Let (T, , ) be -nite, essentially not purely atomic measure
space and let L

() be reexive. Then the following statements are equivalent:


a) is strictly convex,
b) (L

(), | |
()
) is strictly convex,
c) | |
2
()
is locally uniformly convex.
Proof. Because of H

() = L

() the equivalence of a) and b) follows from Theo-


rem 8.5.11. If (L

(), | |
()
) is strictly convex, then (L

(), | |

) is at convex
and therefore due to Theorem 8.5.17 | |

is Frchet differentiable. Using Theo-


rems 8.4.10 and 8.4.14 c) follows.
c) b) is obvious.
The theorems corresponding to Theorems 8.5.17 and 8.5.18 for l

can be stated in
the subsequent weaker form.
280 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Theorem 8.5.19. Let l

be reexive, differentiable and nite, then | |

is
Frchet differentiable on l

0.
Proof. Because of Theorem 7.7.2 satises the
0
2
-condition. Hence f

is, due to
Remark 8.5.2, Frchet differentiable. The remaining reasoning follows the lines of
Theorem 8.5.17 for b) c).
Remark 8.5.20. If the conditions of Theorem 8.5.19 are satised then strong and
weak differentiability of the Orlicz norm on l

agree.
Theorem 8.5.21. Let l

be reexive, let be strictly convex and let be nite, then


| |
2
()
is locally uniformly convex.
Proof. is due to Theorem 6.1.22 differentiable and hence | |

according to Theo-
rem 8.5.19 Frchet differentiable. Using Theorems 8.4.10 and 8.4.14 the claim of the
theorem follows.
8.5.4 Summary
Based on the theorems proved above we are now able to describe Frchet differentia-
bility and local uniform convexity by a list of equivalent statements.
Theorem 8.5.22. Let (T, , ) be a -nite essentially not purely atomic measure
space, let be a Young function and its conjugate, and let L

() be reexive.
Then the following statements are equivalent:
a) is differentiable,
b) (L

(), | |

) is at convex,
c) (L

(), | |
()
) is at convex,
d) | |

is continuously Frchet differentiable on L

() 0,
e) | |
()
is continuously Frchet differentiable on L

() 0,
f) f

is continuously Frchet differentiable on L

(),
g) is strictly convex,
h) L

(), | |

) is strictly convex,
i) L

(), | |
()
) is strictly convex,
j) | |
2

is locally uniformly convex,


k) | |
2
()
is locally uniformly convex,
l) f

is locally uniformly convex,


m) | |

has a strong minimum on K,


n) | |
()
has a strong minimum on K,
Section 8.6 Uniform Convexity and Uniform Differentiability 281
o) f

has a strong minimum on K,


p) L

(), | |

) is an E-space,
q) L

(), | |
()
) is an E-space,
for every closed convex subset K of L

().
Proof. a), c), e), f) are equivalent according to Theorem 8.5.8; a), b), d) according
to Theorem 8.5.17; g), h), j), l) according to Theorem 8.5.14; g), i), k) according to
Theorem 8.5.18. The equivalence of a) and g) is well known (6.1.22), the equivalence
of j) and m), k) and n) as well as of l) and o) follow from Theorem 8.4.14. Equivalence
of m) and q) ad of n) and p) follows from Theorem 8.4.23.
For the sequence space l

we obtain the following


Theorem 8.5.23. Let be differentiable, nite and let l

be reexive. Then the


following statements hold:
a) | |

is continuously Frchet differentiable on l

0,
b) | |
()
is continuously Frchet differentiable on l

0,
c) f

is continuously Frchet differentiable,


d) | |
2

is locally uniformly convex,


e) | |
2
()
is locally uniformly convex,
f) f

is locally uniformly convex,


g) | |

has a strong minimum on K,


h) | |
()
has a strong minimum on K,
i) f

has a strong minimum on K


for every closed convex subset K of l

.
Proof. b) follows from Theorem 8.5.10, d) with Theorem 8.5.21, a) with Theorem
8.5.19. From reexivity we obtain using Remark 8.5.2 statement c) and using Theo-
rem 8.4.10 and Theorem 8.4.14 thereby f). Finally e) follows from a). Statements g),
h) and i) follow using Theorems 8.4.10 and 8.4.14.
8.6 Uniform Convexity and Uniform Differentiability
Whereas in the previous section we have studied the duality between Frchet differen-
tiability and local uniform convexity (and the related strong solvability) in particular
in Orlicz spaces, we will now turn our attention to the duality between uniform con-
vexity and uniform differentiability.
282 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Denition 8.6.1. Let X be a Banach space, then we dene the module of smoothness
by

X
() := sup
|x|=|y|=1
_
1
2
(|x +y| +|x y|) 1
_
X is called uniformly differentiable, if
lim
0

X
()

= 0.
Kthe (see [49], p. 366 f.) shows that uniform differentiability implies in particular
Frchet differentiability of the norm.
In the context of our treatment of greedy algorithms we need the following
Lemma 8.6.2. Let X be a Banach space of dimension at least 2 and let
X
be the
corresponding module of smoothness, then
X
(u) > 0 for u > 0.
X
is convex
(and hence continuous) on R
+
. If X is beyond that uniformly differentiable, then
u

X
(u)
u
is strictly monotonically increasing.
Proof. In order to simplify notation put :=
X
. The rst part follows from ()
(1 + )
1/2
1 (see Lindenstrauss [77]). is convex, being the supremum of convex
functions. Since (0) = 0 the function u
(u)
u
being the difference quotient at 0 is
monotonically increasing. Suppose there are 0 < t < u with
(u)
u
=
(t)
t
, then
()

is
constant on [u, t], i.e. there is a c > 0 with
()

= c there, hence () = c on [u, t].


Therefore
t
+
(u) = c and due to the subgradient inequality
c(v u) =
t
+
(u)(v u) (v) c u
and hence c
(v)
v
for all v > 0. But then c lim
v0
(v)
v
= 0, a contradiction.
Denition 8.6.3. Let X be a Banach space, then we dene the convexity module by

X
() := inf
|x|=|y|=1,|xy|=
(2 |x +y|)
X is then called uniformly convex, if
X
() > 0 for all > 0.
The following theorem can be found in Lindenstrauss (compare [77]):
Theorem 8.6.4 (Lindenstrauss). X is uniformly convex, if X

is uniformly differen-
tiable. Moreover for arbitrary Banach spaces

X
() = sup
02
(/2
X
())

X
() = sup
02
(/2
X
())
for > 0.
Section 8.6 Uniform Convexity and Uniform Differentiability 283
Remark 8.6.5. That uniform differentiability of X

follows from uniform convexity


of X, can be understood by using the above duality relation between the modules of
smoothness and convexity as follows: at rst the parallelogram equality in a Hilbert
space for |x| = |y| = 1 and |x y| = implies
1
2
(2 |x +y|) =
1
2
(2
_
4 |x y|
2
) = 1
_
1

2
4
=
H
().
Apparently then
H
() =
2
+ o(
2
). According to Day (see [24]) Hilbert spaces
are those Banach spaces with largest convexity module: for an arbitrary Banach space
X:
X
()
H
() for all > 0. Then in particular: lim
0

X
()

= 0.
Let now (
n
) be a sequence of positive numbers tending to zero, then the equality

n
2
=
()

has a solution
n
for each n N large enough. Then (
n
), due to
X
() > 0 for
all > 0, also tends to zero. Even though
X
is in general not convex, the mapping


X
()

is monotonically increasing (see [30], Proposition 3). Hence, because of

n
2


X
()

0 for
n

X
(
n
)

n
= sup
0
n
_

n
_

n
2


X
()

__


n
2
.
Conversely uniform convexity of X can be obtained from uniform differentiability
of X

in the following way: let

X
() := sup
02
(/2
X
())
= sup
02
_

_
/2

X
()

__
.
If > 0, then for small enough the expression /2

X
()

is positive and hence

X
() > 0. On the other hand apparently
X
()

2

X
() holds for all > 0
and 0 2 and hence
X
()

X
().
8.6.1 Uniform Convexity of the Orlicz Norm
For non-atomic measures Milne (see [85]) gives the following characterization of uni-
form convexity of the Orlicz norm:
Theorem 8.6.6. (L

(), | |

) is uniformly convex, if and only if is differentiable


and for each 0 < < 1/4 there is a constant R

with 1 < R

N < , such that


284 Chapter 8 Differentiability and Convexity in Orlicz Spaces
(a) for innite measure
i. (2u) N(u)
ii.
t
+
((1 )u) <
1
R

t
+
(u)
for all u > 0
(b) for nite measure
i. limsup
u
(2u)/(u) N
ii. limsup
u

+
(u)

+
((1)u)
> R

holds.
A remark of Trojanski and Maleev (see [83]) indicates that the
2
-condition for
can be expressed in terms of , in fact:
Lemma 8.6.7. Let and be nite and < 1, then
(t)

2
(t) (2s)
2

(s)
holds.
Proof. By the Youngs equality
(2s) +(
t
+
(2s)) = 2s
t
+
(2s)
holds and hence

2
(2s) = s
t
+
(2s)

2
(
t
+
(2s)) s
t
+
(2s) (
t
+
(2s)).
Youngs inequality (s) ts (t) then yields for t =
t
+
(s)

2
(2s) s
t
+
(2s) +(s)
t
+
(s)s = (s).
Interchanging the roles of and , one obtains the converse:
2

(t) =
2

(t
t
+
(t)) (
t
+
(t)) 2t
t
+
(t) (2
t
+
(t)) (t),
where the last inequality follows from Youngs inequality (t) st (s) for
s = 2
t
+
(t).
Remark 8.6.8. In a similar manner one can show for 0 < < 1 < R <
(t)

R
(t) (Rs)
R

(s).
By Remark 6.2.27 this is equivalent to the
2
-condition for .
Section 8.6 Uniform Convexity and Uniform Differentiability 285
Hence from condition a(ii) resp. b(ii) in the theorem of Milne 8.6.6 the
2
-condition
for follows:
Remark 8.6.9. From
t
+
((1 )u) <
1
R

t
+
(u) we obtain via integration
1
1
((1 )t) =
_
t
0

t
+
((1 )u)du <
1
R

_
t
0

t
+
(u)du =
1
R

(t).
From the theorems of Lindenstrauss and Milne we thus obtain a description of the
uniform differentiability of the Luxemburg norm in the non-atomic case.
Theorem 8.6.10. (L

(), | |
()
) is uniformly differentiable, if and only if is
differentiable and for each 0 < < 1/4 there is a constant 1 < R

N < , such
that
(a) for innite measure
i. (2u) N(u)
ii.
t
+
((1 )u) <
1
R

t
+
(u)
for all u > 0
(b) for nite measure
i. limsup
u
(2u)/(u) N
ii. limsup
u

+
(u)

+
((1)u)
> R

.
Remark 8.6.11. Let be a differentiable Young function with :=
t
and (2s)
(s) for all s 0, then satises the
2
-condition with (2s) 2(s) for all
s 0.
Proof. We obtain by change of variable = 2t
(2s) =
_
2s
0
()d = 2
_
s
0
(2t)dt 2
_
s
0
(t)dt = 2(s).
We now give an example of a reexive, w.r.t. the Orlicz norm strictly convex Orlicz
space, which is not uniformly convex (compare Milnes in [85], p. 1482).
Example 8.6.12. Let u
0
= v
0
= 0 and 0 < <
1
4
. Let further u
n
:= 2
n1
and
u
t
n
:= (1 + )u
n
, furthermore v
n
:= 2
n1
and v
t
n
:= (2
n1
+
1
2
) for n N. Let
further be the linear interpolant of these values, more precisely: (s) = s for
0 s u
1
and for n N
(s) =
_
v
n
+r
t
n
(s u
n
) for u
n
s u
t
n
v
t
n
+r
tt
n
(s u
t
n
) for u
t
n
s u
n+1
,
where r
t
n
=
v

n
v
n
u

n
u
n
=
1
2
n
and r
tt
n
=
v
n+1
v

n
u
n+1
u

n
=
1
1
2
n
1
.
286 Chapter 8 Differentiability and Convexity in Orlicz Spaces
In particular is strictly monotonically increasing and for n N
(u
n+1
) = 2
n
= 2 (u
n
)
holds and hence for u
n
s u
n+1
(u
n+1
) = (2u
n
) (2s) (u
n+2
) 4 (u
n
) 4(s).
The interval (0, u
1
) requires a special treatment: let at rst 0 < s
1
2
, then (2s) =
2s = 2(s). For
1
2
< s < 1 we obtain
(2s) =
_
v
1
+r
t
1
(2s u
1
) for u
1
2s u
t
1
v
t
1
+r
tt
1
(2s u
t
1
) for u
t
1
2s u
2
.
We observe
(2s)
_
(
1

1)s for u
1
2s u
t
1
2
1
s for u
t
1
2s u
2
.
Since
2
1

8
3
we obtain altogether with := max4,
1

1
(2s) (s) for s 0,
and hence by Remark 8.6.11
(2s) 2(s) for s 0.
We now consider the inverse of . By Corollary 6.1.16 is the derivative of the
conjugate of . We obtain: (s) = s for 0 s v
1
and for n N
(s) =
_
u
n
+
t
n
(s v
n
) for v
n
s v
t
n
u
t
n
+
tt
n
(s v
t
n
) for v
t
n
s v
n+1
,
where
t
n
=
u

n
u
n
v

n
v
n
= 2
n
and
tt
n
=
u
n+1
u

n
v
n+1
v

n
=
1
1
1
2
n
. As above we obtain
(v
n+1
) = 2
n
= 2 (v
n
),
and hence for v
n
s v
n+1
(v
n+1
) = (2v
n
) (2s) (v
n+2
) 4 (v
n
) 4(s).
The interval (0, v
1
) again requires a special treatment: let at rst 0 < s
1
2
, then
(2s) = 2s = 2(s). For
1
2
< s < 1 we obtain
(2s) =
_
u
1
+
t
1
(2s v
1
) for v
1
2s v
t
1
u
t
1
+
tt
1
(2s v
t
1
) for v
t
1
2s v
2
.
Section 8.6 Uniform Convexity and Uniform Differentiability 287
We observe
(2s)
_
3s for v
1
2s v
t
1
4s for v
t
1
2s v
2
,
and hence in a similar way as above
(2s) 8(s) for s 0.
Thus and satisfy the
2
-condition. However
((1 +)u
n
)
(u
n
)
=
(u
t
n
)
(u
n
)
=
v
t
n
v
n
=
2
n1
+
1
2
2
n1
1.
Therefore condition b(ii) of Theorem 8.6.6 is violated. Let now (T, , ) be a non-
atomic measure space with (T) < , then by the Characterization Theorem of
Milne 8.6.6 (L

(), | |

) is not uniformly convex. But according to the Theo-


rem 8.6.10 (L

(), | |
()
) is not uniformly differentiable. On the other hand ap-
parently and are strictly convex and differentiable, by Theorem 7.7.1 L

()
reexive, and hence due to Theorem 8.5.22 | |
2

and | |
2
()
Frchet differentiable
and locally uniformly convex and have a strong minimum on every closed convex
subset K of L

resp. L

.
Remark 8.6.13. For the spaces L
p
() for arbitrary measure space (see [77])

X
() =
_
((1 +)
p
+[1 [
p
)
1
p
= (p 1)
2
/2 +O(
4
) for 2 p <
(1 +)
p
1 =
p
/p +O(
2p
) for 1 p 2
holds.
Maleev and Troyanski (see [83]) construct for a given Young function , satisfying
together with its conjugate a
2
-condition, an equivalent Young function in the
following way:
1
(t) :=
_
t
0
(u)
u
du and (t) :=
_
t
0

1
(u)
u
du. then also satises the

2
-condition and (L

(), | |
()
) is isomorphic to (L

(), | |
()
). They then show
Theorem 8.6.14. Let (T) = , where T contains a subset of innite measure free
of atoms, and let L

() be reexive. Then one obtains for convexity and smoothness


modules of X = (L

(), ||
()
) the following estimates:
X
() AF

(), [0, 1]
and
X
() BG

(), [0, 1].


Here F

and G

are dened in the following way:


F

() =
2
inf(uv)/u
2
(v) [ u [, 1], v R
+

() =
2
sup(uv)/u
2
(v) [ u [, 1], v R
+
.
288 Chapter 8 Differentiability and Convexity in Orlicz Spaces
For nite resp. purely atomic measures similar conditions are given.
It is easily seen: for (t) = [t[
p
one obtains (t) =
1
p
2
[t[
p
and
G

() =
_

2
for 2 p <

p
for 1 p 2
F

() =
_

p
for2 p <

2
for1 p 2.
8.6.2 Uniform Convexity of the Luxemburg Norm
Denition 8.6.15. A Young function we call -convex if for all a (0, 1) there is a
function : (0, 1) (0, 1), such that for all s > 0, and all b [0, a]

_
1 +b
2
s
_
(1 (a))
(s) +(b s)
2
holds. is called

-convex, if the above inequality holds in some interval (c, )


for c > 0, and
0
-convex, if the above inequality holds in some interval (0, c).
Remark 8.6.16. According to [3] the -convexity is equivalent to the following con-
dition: for all a (0, 1) there is a function : (0, 1) (0, 1), such that for all
s > 0

_
1 +a
2
s
_
(1 (a))
(s) +(a s)
2
holds. A corresponding statement holds for the

-convexity and
0
-convexity.
Remark 8.6.17. If is strictly convex and

-convex for s d > 0, then is

-convex for all c > 0.


Let 0 < c < d, then because of the strict convexity of
0 <
_
1 +a
2
s
_
<
(s) +(a s)
2
for all s [c, d]. Hence
0 < h(s) :=
(
1+a
2
s)
(s)+(as)
2
< 1.
h is continuous on [c, d] and attains its maximum there. Hence

t
(a) := max((a), 1 maxh(s) [ s [c, d]).
A corresponding statement is obtained for the
0
-convexity.
Section 8.6 Uniform Convexity and Uniform Differentiability 289
Theorem 8.6.18. Let (T, , ) be an arbitrary measure space. Then (L

, | |
()
)
is uniformly convex, if is -convex and satises the
2
-condition. If (T) < ,
then (L

, | |
()
) is uniformly convex, if is strictly convex and is

-convex and
satises the

2
-condition.
Proof. -convexity of is sufcient.
Case 1: let (T) = . We consider at rst the modular and show
> 0 q() : x, y with f

(x) = f

(y) = 1 and f

(x y) > (8.15)
f

_
x +y
2
_
< 1 q(). (8.16)
W.l.g. we can assume 0 < < 1. Let U = t T [ [x(t) y(t)[

4
max[x(t)[,
[y(t)[.
Let [s r[

4
maxs, r, then for s r 0 apparently s r +

4
s, hence
s(1

4
) r, thus r = b s with 0 b 1

4
=: a.
If, however r s 0, then correspondingly s = b r. We then obtain for
[s r[

4
maxs, r

_
s +r
2
_

_
1
_
1

4
__
(s) +(r)
2
,
for the modular this means
1 f

_
x +y
2
_
=
f

(x) +f

(y)
2
f

_
x +y
2
_
=
1
2
__
U
(x)d +
_
U
(y)d
+
_
T\U
(x)d +
_
T\U
(y)d
_

_
U

_
x +y
2
_
d
_
T\U

_
x +y
2
_
d

1
2
_
U
(x)d +
1
2
_
U
(y)d
_
U

_
x +y
2
_
d

1
2
_
U
(x)d +
1
2
_
U
(y)d

1 (1

4
)
2
__
U
(x)d +
_
U
(y)d
_
290 Chapter 8 Differentiability and Convexity in Orlicz Spaces
=

2
_
1

4
___
U
(x)d +
_
U
(y)d
_
=

2
_
1

4
_
(f

(x
U
) +f

(y
U
)).
Therefore
1 f

_
x +y
2
_


2
_
1

4
_
(f

(x
U
) +f

(y
U
)). (8.17)
Let now t T U, then [x(t) y(t)[ <

4
max([x(t)[, [y(t)[), hence due to the
monotonicity and convexity of the Young function: (x(t) y(t))

2
(
1
2
([x(t)[ +
[y(t)[)). We conclude
f

((x y)
T\U
)

2
f

_
1
2
([x[ +[y[)
T\U
_


4
(f

(x
T\U
) +f

(y
T\U
))


4
(f

(x) +f

(y))

2
.
If, as assumed, f

(x y) > , we obtain
f

((x y)
U
) = f

(x y) f

((x y)
T\U
) >

2
. (8.18)
Using the
2
-condition for and Inequality (8.17) we conclude

2
< f

((x y)
U
) f

(([x[ +[y[)
U
)
1
2
(f

(2x
U
) +f

(2y
U
))

1
2
((f

(x
U
) +f

(y
U
)))

2
2
(1

4
)
_
1 f

_
x +y
2
__
.
Solving this inequality for f

(
x+y
2
) we obtain
f

_
x +y
2
_
1

2

_
1

4
_
. (8.19)
Putting q() =

2
(1

4
) we arrive at Assertion (8.15).
Let now |x|
()
= |y|
()
= 1 and |x y|
()
> , then there is a > 0 with
f

(xy) > , for suppose there is a sequence (z


n
) with |z
n
|
()
and f

(z
n
)
0, then by Theorem 6.3.1 z
n
0, a contradiction. By Inequality (8.19) we then have
f

_
x +y
2
_
1 q().
Section 8.6 Uniform Convexity and Uniform Differentiability 291
As a next step we show: there is a > 0 with |
x+y
2
|
()
1 : for suppose there
are sequences x
n
and y
n
with |x
n
|
()
= |y
n
|
()
= 1 and f

(
x
n
+y
n
2
) 1 q() as
well as |
x
n
+y
n
2
|
()
1. If we put u
n
:=
x
n
+y
n
2
, then |u
n
|
()
1 and
1
|u
n
|
()
< 2
for n sufciently large. Putting z
n
:=
u
n
|u
n
|
()
it follows for such n
1 = f

(z
n
) = f

__
1
|u
n
|
()
1
_
2u
n
+
_
2
1
|u
n
|
()
_
u
n
_

_
1
|u
n
|
()
1
_
f

(2u
n
) +
_
2
1
|u
n
|
()
_
f

(u
n
)

_
1
|u
n
|
()
1
_
f

(u
n
) +
_
2
1
|u
n
|
()
_
f

(u
n
)

_
1
|u
n
|
()
1
_
+
_
2
1
|u
n
|
()
__
(1 q())
=
_
1 + ( 1)
_
1
|u
n
|
()
1
__
(1 q())
1
q()
2
,
a contradiction.
Case 2: Let now (T) < and let c > 0 be chosen, such that (2c)(T) <

8
.
Let
U =
_
t T

[x(t) y(t)[

4
max([x(t)[, [y(t)[) max([x(t)[, [y(t)[) c
_
.
Then as above for all t U

_
x(t) +y(t)
2
_

1 (1

4
)
2
((x(t)) +(y(t))),
and hence as above the Inequality (8.17) holds.
Let now t T U, then by denition [x(t) y(t)[ <

4
max([x(t)[, [y(t)[) or
max([x(t)[, [y(t)[) < c. Then the convexity and monotonicity of on R
+
imply
(x(t) y(t)) <
_

4
max([x(t)[, [y(t)[)
_


4
((x(t)) +(y(t))),
or
(x(t) y(t)) < (2 max([x(t)[, [y(t)[)) (2c).
Therefore
_
T\U
(x(t) y(t))d

4
__
T
(x(t))d +
_
T
(y(t))d
_
+(2c)(T)
=

2
+(2c)(T)
5
8
.
292 Chapter 8 Differentiability and Convexity in Orlicz Spaces
As above (see (8.18)) we then obtain
f

((x y)
U
) >
3
8
.
According to Remark 6.2.26 the

2
-condition for holds for s

8
c and hence
3
8
f

((x y)
U
) = f

_
2
x y
2

U
_
=
_
U

_
2
x(t) y(t)
2
_
d

_
U

_
x(t) y(t)
2
_
d

2
(f

(x
U
) +f

(y
U
))


2
2
(1

4
)
_
1 f

_
x +y
2
__
.
Finally we obtain as above
f

_
x +y
2
_
1
3
8

_
1

4
_
.
For non-atomic measures also the converse of the above theorem holds.
Theorem 8.6.19. Let (T, , ) be a non-atomic measure space. If then (T) <
and if (L

, | |
()
) is uniformly convex, then is

-convex and satises the

2
-
condition.
If (T) = and if (L

, | |
()
) is uniformly convex, then is strictly convex and
-convex and satises the
2
-condition.
Proof. Suppose d > 0 a (0, 1) > 0 u

_
u

+au

2
_
> (1 )
(u

) +(au

)
2
.
Let now d > 0 be chosen, such that (d)(T) 2. Then there are A

, B

with
(a) A

=
(b) (A

) = (B

)
(c) ((u

) +(au

))(A

) = 1.
If we put x

:= u

+au

and y

:= u

+au

, we obtain
f

_
x

1 a
_
= f

(u

(
A

)) = (u

)((A

) +(B

))
((u

) +(au

))(A

) = 1,
Section 8.7 Applications 293
and hence |x

|
()
1 a. On the other hand
f

_
x

+y

2(1 )
_

1
1
f

_
x

+y

2
_
=
1
1
f

_
u

(1 +a)
2
(
A

+
B

)
_
=
1
1

_
u

(1 +a)
2
_
((A

) +(B

))

(u

) +(au

)
2
2(A

) = ((u

) +(au

))(A

) = 1,
and hence |
x

+y

2
|
()
1 . If we put := 1 a then for all (0, 1) there are
functions x

, y

with |x

|
()
but |
x

+y

2
|
()
1 , where |x

|
()
=
|y

|
()
= 1, since f

(x

) = (u

)(A

) + (au

))(B

) = 1 = f

(y

) because
of properties (b) and (c).
The proof for (T) = is performed in a similar way (see [46]).
For sequence spaces (see [46]) the following theorem holds:
Theorem 8.6.20. (

, | |
()
) is uniformly convex, if and only if is
0
-convex and
strictly convex on (0, s
0
] with (s
0
) =
1
2
, and satises the
0
2
-condition.
8.7 Applications
As applications we discuss

Tikhonov regularization: this method was introduced for the treatment of ill-
posed problems, of which there are a whole lot. The convergence of the method
was proved by Levitin and Polyak for uniformly convex regularizing function-
als. We show here that locally uniformly convex regularizations are sufcient
for that purpose. As we have given a complete description of local uniform
convexity in Orlicz spaces we can state such regularizing functionals explicitly.
In this context we also discuss level uniformly convex regularizations as another
generalization of Levitin and Polyaks approach.

Ritz method: the Ritz method plays an important role in many applications
(e.g. in FEM-methods). It is well known that the Ritz procedure generates a
minimizing sequence. Actual convergence of the minimal solutions on each
subspace is only achieved if the original problem is strongly solvable.

Greedy algorithms have drawn a growing attention and experienced a rapid de-
velopment in recent years (see e.g. Temlyakov [104]). The aim is to arrive
294 Chapter 8 Differentiability and Convexity in Orlicz Spaces
at a compressed representation of a function in terms of its dominating fre-
quencies. The convergence proof makes use of the KadecKlee property of an
E-space.
It may be interesting to note that in the convergence proof of the Tikhonov regulariza-
tion method we make explicit use of local uniform convexity and the convergence of
the Ritz method follows from strong solvability, whereas the convergence proof of the
greedy algorithm follows from the KadecKlee property, so, in a way, three different
aspects of E-spaces come into play.
8.7.1 Regularization of Tikhonov Type
Locally Uniformly Convex Regularization
In this section we investigate the treatment of ill-posed problems by Tikhonovs meth-
od, using locally uniform convex regularizing functions on reexive Banach spaces.
At rst we observe that local uniform convexity of a convex function carries over to
the monotonicity of the subdifferential.
Lemma 8.7.1. Let X be a Banach space, f : X R a continuous locally uniformly
convex function, then for all x, y X and all x

f(x) and all y

f(y)

x,x
(|x y|) x y, x

,
if
x,x
denotes the convexity module belonging to f at x, x

.
Proof. As f is locally uniformly convex we have

x,x
(|x y|) +y x, x

f(y) f(x).
On the other hand the subgradient inequality yields: x y, y

f(x) f(y), in
total

x,x
(|x y|) +y x, x

y x, y

,
as claimed.
Theorem 8.7.2. Let X be a reexive Banach space and let f and g continuous,
Gteaux differentiable convex functions on X. Let further f be locally uniformly
convex, let K be a closed, convex subset of X and let S := M(g, K) ,= .
Let now (
n
)
nN
be a positive sequence tending to zero and f
n
:=
n
f + g. Let
nally x
n
be the (uniquely determined) minimal solution of f
n
on K, then the se-
quence (x
n
)
nN
converges to the (uniquely determined) minimal solutions of f on S.
Proof. According to Theorem 8.4.12 f
n
is locally uniformly convex and due to Re-
mark 8.4.2 f

n
is bounded, hence M(f
n
, K) consists of the unique element x
n
.
Section 8.7 Applications 295
Let now x S, then because of the monotonicity of the derivative of g
x
n
x, g
t
(x
n
) g
t
(x)
. .
0
+
n
x
n
x, f
t
(x
n
)
= x
n
x,
n
f
t
(x
n
) +g
t
(x
n
)
. .
0
x
n
x, g
t
(x)
. .
0
0.
It follows
x
n
x, f
t
(x
n
) 0. (8.20)
Let now x S arbitrary, then
0 f
n
(x
n
) f
n
( x) = g(x
n
) g( x) +
n
(f(x
n
) f( x))
n
(f(x
n
) f( x)).
This implies f(x
n
) f( x), hence x
n
S
f
(f( x)) for all n N. As f

is bounded,
it follows according to Theorem 8.4.6 that the sequence (x
n
) is bounded. Let now
(x
k
) be a subsequence converging weakly to x
0
. Since K is weakly closed (see The-
orem 3.9.18) we have x
0
K.
First we show: y x
0
, g
t
(x
0
) 0 for y K arbitrary, i.e. x
0
S. We observe
y x
k
, g
t
(y) g
t
(x
k
)
. .
0
+
k
x
k
y, f
t
(x
k
)
= y x
k
, g
t
(y) +y x
k
, f
t
k
(x
k
)
. .
0
y x
k
, g
t
(y).
The expression x
k
y, f
t
(x
k
) is for xed y bounded from below because
x
k
y, f
t
(x
k
) = x
k
y, f
t
(x
k
) f
t
(y)
. .
0
+x
k
y, f
t
(y)
|f
t
(y)||x
k
y| C.
Hence we obtain
C
k

k
x
k
y, f
t
(x
k
) y x
k
, g
t
(y).
On the other hand the weak convergence of x
k
x
0
implies
y x
k
, g
t
(y)
k
y x
0
, g
t
(y),
and thus
y x
0
, g
t
(y) 0
for all y K. Let now 1 t > 0, z = y x
0
and y = x
0
+tz = ty +(1t)x
0
K,
then the continuity of t z, g
t
(x
0
+tz) =
d
dt
g(x
0
+tz) (see Lemma 3.8.2) implies:
0 z, g
t
(x
0
+tz)
t0
z, g
t
(x
0
), hence y x
0
, g
t
(x
0
) 0, i.e. x
0
S.
296 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Now we show the strong convergence of (x
k
) to x
0
: due to Lemma 8.7.1 the weak
convergence and Inequality (8.20) yield

x
0
,f

(x
0
)
(|x
0
x
k
|) x
0
x
k
, f
t
(x
0
) f
t
(x
k
)
= x
0
x
k
, f
t
(x
0
) +x
k
x
0
, f
t
(x
k
)
. .
0
x
0
x
k
, f
t
(x
0
) 0,
hence x
k
x
0
.
It remains to be shown: x
0
is the minimal solution of f on S: because of the demi-
continuity (see Theorem 3.8.5) of f
t
it follows with Theorem 5.3.15 for x S
0 x x
k
, f
t
(x
k
) x x
0
, f
t
(x
0
),
and by the Characterization Theorem of Convex Optimization 3.4.3 the assertion of
the theorem since apparently x
n
x
0
because of the uniqueness of x
0
.
Remark. The proof of above theorem easily carries over to hemi-continuous mono-
tone operators (compare [55]).
As an application of the above theorem in Orlicz spaces we obtain
Theorem 8.7.3. Let (T, , ) be a -nite, essentially not purely atomic measure
space, let L

() be reexive, and let be strictly convex and differentiable. Let


either f := f

or f :=
1
2
| |
2
()
or f :=
1
2
| |
2

and let g : L

() R convex,
continuous, and Gteaux differentiable. Let K be a closed convex subset of L

()
and let S := M(g, K) ,= .
Let now (
n
)
nN
be a positive sequence tending to zero and f
n
:=
n
f + g.
Let nally x
n
be the (uniquely determined) minimal solution of f
n
on K, then the
sequence (x
n
)
nN
converges to the (uniquely determined) minimal solution of f on S.
Proof. Theorem 8.7.2 together with Theorem 8.5.22.
Level-uniform Convex Regularization
Another generalization of the LevitinPolyak theorem is the regularization by level-
uniformly convex functionals.
Denition 8.7.4. Let X be a Banach space, then the function f : X R is called
level-uniformly convex, if for every r R there is a function
r
: R
0
R
0
such
that (0) = 0, (s) > 0 for s > 0 such that for all x, y S
f
(r)
f
_
x +y
2
_

1
2
f(x) +
1
2
f(y)
r
(|x y|).
Section 8.7 Applications 297
It turns out that boundedness of the level sets is already an inherent property of
level-uniform convexity.
Lemma 8.7.5. The level sets S
f
(r) of the level-uniformly convex function f are
bounded.
Proof. The function f is bounded from below by on the unit ball B, since for
x

f(0) we have |x

| x 0, x

f(x) f(0) for all x B. Suppose


S
f
(r
0
) is not bounded for some r
0
> f(0). Then there exists a sequence (x
n
) where
|x
n
| = 1 and nx
n
S
f
(r
0
). Hence
f(nx
n
) 2f((n 1)x
n
) f((n 2)x
n
) + 2
r
0
(2).
Let := 2
r
0
(2). Then we obtain by induction for 2 k n
f(nx
n
) kf((n k + 1)x
n
) (k 1)f((n k)x
n
) +
k(k 1)
2
.
For k = n this results in the contradiction
r
0
f(nx
n
) nf(x
n
) (n 1)f(0) +
n(n 1)
2

n
_
f(0) +
n 1
2

_
+f(0)
n
.
Theorem 8.7.6. Let X be a reexive Banach space and let f and g continuous convex
functions on X. Let further f be level uniformly convex, let K be a closed, convex
subset of X and let S := M(g, K) ,= .
Let now (
n
)
nN
be a positive sequence tending to zero and f
n
:=
n
f + g.
Let nally x
n
be the (uniquely determined) minimal solution of f
n
on K, then the
sequence (x
n
)
nN
converges to the (uniquely determined) minimal solution of f on S.
Proof. Let now x S be arbitrary, then
0 f
n
(x
n
) f
n
( x) = g(x
n
) g( x) +
n
(f(x
n
) f( x))
n
(f(x
n
) f( x)).
This implies f(x
n
) f( x) =: r, hence x
n
S
f
(f( x)) for all n N. Due to the
above lemma the sequence (x
n
) is bounded. Hence by Theorem 5.6.9 (the compact
topological space C now being S
f
(f( x))K, equipped with the weak topology) (x
n
)
converges weakly to the minimal solution x
0
of f on M(g, K) and f(x
n
) f(x
0
),
since according to Remark 3.11.8 f is weakly lower semi-continuous. Due to the
level-uniform convexity

r
(|x
n
x
0
|)
1
2
f(x
n
) +
1
2
f(x
0
) f
_
x
n
+x
0
2
_
.
298 Chapter 8 Differentiability and Convexity in Orlicz Spaces
But (
x
n
+x
0
2
) converges weakly to x
0
. Since f is weakly lower semi-continuous we
have f(
x
n
+x
0
2
) f(x
0
) for > 0 and n large enough, hence

r
(|x
n
x
0
|)
1
2
(f(x
n
) +f(x
0
)) f(x
0
) +,
and thus the strong convergence of (x
n
) to x
0
.
8.7.2 Ritzs Method
The following method of minimizing a functional on an increasing sequence of sub-
spaces of a separable space is well known and due to Ritz (compare e.g. [109]). Ritzs
method generates a minimizing sequence:
Theorem 8.7.7. Let X be a separable normed space, let X = span
i
, i N, and
let X
n
:= span
1
, . . . ,
n
. Let further f : X R be upper semi-continuous
and bounded from below. If d := inf f(X) and d
n
:= inf f(X
n
) for n N, then
lim
n
d
n
= d.
Proof. Apparently d
n
d
n+1
for all n N, hence d
n
a R. Suppose a > d. Let
ad
2
> > 0 and x X with f(x) d + . As f is upper semi-continuous, there
is a neighborhood U(x) with f(y) f(x) + for all y U(x), in particular there is
y
m
U(x) with y
m
X
m
. It follows that
a d
m
f(y
m
) f(x) + d + 2
a contradiction.
Corollary 8.7.8. Let d
n
:= inf f(X
n
), and (
n
)
nN
be a sequence of positive num-
bers tending to zero, and let x
n
X
n
chosen in such a way that f(x
n
) d
n
+
n
,
then (x
n
)
nN
is a minimizing sequence for the minimization of f on X.
For locally uniformly convex functions the convergence of the Ritzs method can
be established.
Theorem 8.7.9. Let X be a separable reexive Banach space with
X = span
i
, i N,
and let X
n
:= span
1
, . . . ,
n
. Let f and g be continuous convex function on
X. Let further f be locally uniformly convex. Let d := inf(f + g)(X) and d
n
:=
inf(f + g)(X
n
) for n N, let (
n
)
nN
be a sequence of positive numbers and let
x
n
X
n
be chosen such that f(x
n
) +g(x
n
) d
n
+
n
then the minimizing sequence
(x
n
)
nN
converges to the (uniquely determined) minimal solution of f +g on X.
Section 8.7 Applications 299
Proof. Corollary 8.7.8 together with Theorems 8.4.12 and 8.4.11, and Corollary 8.4.7.
Remark 8.7.10. By the above theorem we are enabled to regularize the minimization
problem min(g, X) by adding a positive multiple f of a locally uniformly convex
function, i.e. one replaces the above problem by min(f + g, X) (compare Theo-
rem 8.7.2).
As an application of the above theorem in Orlicz spaces we obtain
Theorem 8.7.11. Let (T, , ) be a -nite, essentially not purely atomic measure,
let L

() separable and reexive, and let be strictly convex. Let L

() =
span
i
, i N and let X
n
:= span
1
, . . . ,
n
. Let either f := f

or f := ||
2
()
or f := | |
2

and let g : L

() R convex and continuous.


Let d := inf(f + g)(L

()) and d
n
:= inf(f + g)(X
n
) for n N, let (
n
)
nN
be
a sequence of positive numbers tending to zero and let x
n
X
n
be chosen such that
f(x
n
) + g(x
n
) d
n
+
n
, then the minimizing sequence (x
n
)
nN
converges to the
(uniquely determined) minimal solution of f +g on L

().
Proof. Theorem 8.7.9 and Theorem 8.5.22 .
8.7.3 A Greedy Algorithm in Orlicz Space
For a compressed approximate representation of a given function in L
2
[a, b] by har-
monic oscillations it is reasonable to take only the those frequencies with dominating
Fourier coefcients into account. This leads to non-linear algorithms, whose gener-
alization to Banach spaces was considered by V. N. Temlyakov. We will discuss the
situation in Orlicz spaces.
Denition 8.7.12. Let X be a Banach space, then D X is called a dictionary if
(a) || = 1 for all D
(b) from D it follows that D
(c) X = span(D).
We consider the following algorithm, which belongs to the class of non-linear m-
term algorithms (see [105]) and in [104] is denoted as Weak Chebyshev Greedy Algo-
rithm (WCGA) by V. N. Temlyakov:
Algorithm 8.7.13 (WGA). Let X strict convex and let = (t
k
)
kN
with 0 < t
k
1
for all k N. Let x X be arbitrary, and let F
x
S(X

) denote a functional with


F
x
(x) = |x|.
300 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Let now x X 0 be given, then set r
0
:= x, and for m 1:
(a) Choose
m
D with
F
r
m1
(
m
) t
m
supF
r
m1
()[ D.
(b) For U
m
:= span
j
, j = 1, . . . , m let x
m
be the best approximation of x w.r.t.
U
m
.
(c) Set r
m
:= x x
m
and m m+ 1, goto (a).
If X is at convex, then F
x
is given by the gradient |x| of the norm at x (see
theorem of Mazur 8.1.3). Apparently: |F
x
| = 1.
Before we discuss the general situation, we will consider the case = (t) with
0 < t 1 and denote the corresponding algorithm by GA.
Convergence of GA in Frchet Differentiable E-Spaces
The subsequent exposition follows the line of thought of Temlyakov in [104]:
Let := d(x, span
i
, i N) and
n
:= |r
n
| = d(x, U
n
), then apparently

n
. We obtain
Lemma 8.7.14. Let X be at convex and > 0, then there is a > 0 and N N,
such that F
r
n
(
n+1
) > for all n N.
Proof. Since x
n
is a best approximation of x w.r.t. U
n
, we have F
r
n
(
i
) = 0 for i =
1, . . . , n and F
r
n
(r
n
) = |r
n
| (see Theorem3.12.6). Let nowD
1
:= D
i
, i N.
Since x span(D), there are x
t
span(D
1
), a N N, and a x
tt
span
i

N
i=1
=
U
N
, such that |x x
t
x
tt
| <

2
.
Let now n > N, then F
r
n
(
i
) = 0 for 1 i N, i.e. F
r
n
(x
tt
) = 0. Therefore we
obtain, due to
F
r
n
(r
n
) = |r
n
| = F
r
n
(x x
n
) = F
r
n
(x),
the subsequent inequality
F
r
n
(x
t
) = F
r
n
(x
t
+x
tt
) = F
r
n
(x (x x
t
x
tt
)) = F
r
n
(x) F
r
n
(x x
t
x
tt
)
F
r
n
(x) |x x
t
x
tt
|

2
=

2
.
For x
t
there is a representation as a linear combination of elements in D
1
: x
t
=

m
i=1
c
i

i
with
i
D
1
for i = 1, . . . , m. We obtain

2
F
r
n
(x
t
) =
m

i=1
c
i
F
r
n
(
i
)
m

i=1
[c
i
[ [F
r
n
(
i
)[
m max[c
i
[ [ i = 1, . . . , m max[F
r
n
(
i
)[ [ i = 1, . . . , m.
Section 8.7 Applications 301
Therefore
max[F
r
n
(
i
)[ [ i = 1, . . . , m

2m max[c
i
[ [ i = 1, . . . , m
.
We nally have
F
r
n
(
n+1
) t supF
r
n
() [ D t max[F
r
n
(
i
)[ [ i = 1, . . . , m ,
if we choose := t

2mmax[c
i
[ [ i=1,...,m
.
Lemma 8.7.15. Let the normof X be Frchet differentiable, then for every convergent
subsequence (x
n
k
)
kN
of the sequence (x
n
)
nN
generated by GA
lim
k
x
n
k
= x
holds.
Proof. Suppose, there is a subsequence x
n
k
y ,= x. Then the norm is Frchet
differentiable at x y and therefore by Corollary 8.4.21 the duality mapping x F
x
is continuous at x y, i.e. F
xx
n
k
F
xy
. For arbitrary n N we then have
F
xy
(
n
) = lim
k
F
xx
n
k
(
n
) = 0,
and hence
F
xx
n
k
(
n
k
+1
) = (F
xx
n
k
F
xy
)(
n
k
+1
) |F
xx
n
k
F
xy
| 0. (8.21)
Since
n
k
it follows that = |y x| > 0.
But then also > 0, because assume = 0, then there is a subsequence u
m
k

U
m
k
with u
m
k
x. However
m
k
= |x
m
k
x| |u
m
k
x| 0, hence

m
k
0 = , a contradiction. Therefore > 0. By Lemma 8.7.14 we then obtain
F
xx
n
k
(
n
k
+1
) > > 0, a contradiction to (8.21).
Lemma 8.7.16. Let X be a strictly convex Banach space with KadecKlee property
(see Denition 8.4.19), whose norm is Frchet differentiable, then for the sequence
generated by GA the following relation holds:
x
n
x (x
n
)
nN
has a weakly convergent subsequence.
Proof. Let x
n
k
y, then by Theorem 3.9.18 y span
k

k=1
. Let r
n
:= x x
n
,
then |r
n
| 0. Apparently := d(x, span
k

k=1
). We will show now
= : let u
n
span
k

k=1
and let u
n
y, then |u
n
x| |y x|, hence
|y x| .
302 Chapter 8 Differentiability and Convexity in Orlicz Spaces
If = 0, then the assertion x
n
x already holds. Suppose now that > 0, then
we obtain
|y x| = F
xy
(x y) = lim
k
F
xy
(r
n
k
) lim
k
|r
n
k
| = .
The KadecKlee property then yields: r
n
k
y x, hence x
n
k
y. But by
Lemma 8.7.15 we then obtain x
n
k
x, contradicting |y x| > 0.
In the reexive case we then have the following
Theorem 8.7.17 (Convergence for GA). Let X be an E-space, whose norm is Frchet
differentiable. Then GA converges for every dictionary D and every x X.
Proof. Since X is reexive, the (bounded) sequence (x
n
) has a weakly convergent
subsequence. Lemma 8.7.16 together with Theorem 8.4.18 yields the assertion.
In Orlicz spaces the above theorem assumes the following formulation:
Theorem 8.7.18 (Convergence of GA in Orlicz spaces). Let (T, , ) be a -nite,
essentially not purely atomic measure space, let be a differentiable, strictly convex
Young function and its conjugate, and let L

() be reexive. The GA converges for


ever dictionary D and every x in L

().
Proof. is differentiable. Hence by Theorem 8.5.22 also (L

(), | |
()
) and
(L

(), | |

) resp. are Frchet differentiable, thus due to the theorem of Ander-


son 8.4.22 (L

(), | |
()
) and (L

(), | |

) resp. are E-spaces, whose norms are


by Theorem 8.5.22 Frchet differentiable.
Remark 8.7.19. Reexivity of L

() can depending on the measure be charac-


terized by appropriate
2
-conditions (see Theorem 7.7.1).
Convergence of WGA in Uniformly Differentiable Spaces
We will now investigate the convergence of WGA, if we drop the limitation t
m
=
. It turns out that this is possible in uniformly differentiable spaces under suitable
conditions on (t
m
) (see [103]):
For further investigations we need the following
Lemma 8.7.20. Let x X be given. Choose for > 0 a x

X, such that |x
x

| < and
x

A()
conv(D), then
sup
D
F
r
m1
()
1
A()
(|r
m1
| ).
Section 8.7 Applications 303
If x conv(D) then
sup
D
F
r
m1
() |r
m1
|.
Proof. A() is determined as follows: if x conv(D), then choose A() = 1, oth-
erwise choose x

span(D), x

n
k=1
c
k

k
, where due to the symmetry of D the
coefcients c
k
, k = 1, . . . , n, can be chosen to be positive, then put A() :=

n
k=1
c
k
and hence x

/A() conv(D). Apparently


sup
D
F
r
m1
() = sup
conv(D)
F
r
m1
()
1
A()
F
r
m1
(x

)
holds.
Since x
m1
is the best approximation of x w.r.t. U
m1
, we obtain
F
r
m1
(x

) = F
r
m1
(x +x

x) = F
r
m1
(x) F
r
m1
(x x

) F
r
m1
(x)
= F
r
m1
(r
m1
+x
m1
) = F
r
m1
(r
m1
) = |r
m1
| .
The two previous considerations then imply
sup
D
F
r
m1
()
1
A()
(|r
m1
| ).
For x conv(D) we have A() = 1 and therefore
sup
D
F
r
m1
() |r
m1
|
for each > 0.
As a consequence we obtain
Lemma 8.7.21. Let X be uniformly differentiable with smoothness module (u). Let
x X be given. If one chooses for > 0 a x

X, such that |x x

| < and
x

A()
conv(D), then one obtains for > 0 arbitrary
|r
m
| |r
m1
|
_
1 + 2
_

|r
m1
|
_
t
m
1
A()
_
1

|r
m1
|
__
.
Proof. From the denition one immediately obtains for > 0 arbitrary and x, y X
with x ,= 0 and |y| = 1
2
_
1 +
_

|x|
__

1
|x|
(|x +y| +|x y|).
304 Chapter 8 Differentiability and Convexity in Orlicz Spaces
For x = r
m1
and y =
m
we then have
2
_
1 +
_

|r
m1
|
__
|r
m1
| |r
m1
+
m
| |r
m1

m
|.
Moreover, by step 1 of WGA using the previous lemma
F
r
m1
(
m
) t
m
sup
D
F
r
m1
() t
m
1
A()
(|r
m1
| ),
and thus
|r
m1
+
m
| F
r
m1
(r
m1
+
m
) = F
r
m1
(r
m1
) +F
r
m1
(
m
)
|r
m1
| +t
m
1
A()
(|r
m1
| ).
Altogether we conclude
|r
m
| |r
m1

m
|
2
_
1 +
_

|r
m1
|
__
|r
m1
|
t
m
1
A()
(|r
m1
| ) |r
m1
|
= |r
m1
|
_
1 + 2
_

|r
m1
|
_
t
m
1
A()
_
1

|r
m1
|
__
.
For the convergence proof of WGA we nally need the following
Lemma 8.7.22. Let 0
k
< 1 for k N and let
m
:= (1
1
) (1
2
) (1

m
). If

k=1

k
= , then lim
m

m
= 0 holds.
Proof. From the subgradient inequality for ln at 1 we have for x > 0: ln x x 1
and therefore ln(1
k
)
k
for k N, hence
ln(
m
) =
m

k=1
ln(1
k
)
m

k=1

k
,
and nally

m
= e

m
k=1

k
m
0.
Remark 8.7.23. If X is uniformly differentiable with smoothness module (u), then
for the (strictly monotone and continuous) function (u) :=
(u)
u
(see Lemma 8.6.2)
it follows that the equation
(u) = t
m
Section 8.7 Applications 305
has for 0 <
1
2
a unique solution in (0, 2], since for |x| = 1 we have
(2)
1
2
_
1
2
(|x + 2x| +|x 2x|) 1
_
=
1
2
.
Theorem 8.7.24. Let X be uniformly differentiable with smoothness module (u)
and let for the solutions
m
of the equations (u) = t
m
for each (0,
1
2
] the
condition

m=1
t
m

m
=
be satised. Then WGA converges for arbitrary x X.
Proof. Suppose there is > 0 with |r
m
| . W.l.g. can be chosen, such that
:=

8A()
(0,
1
2
]. Then by Lemma 8.7.21 we conclude for arbitrary > 0
|r
m
| |r
m1
|
_
1 + 2
_

_
t
m
1
A()
_
1

__
.
For =
m
we obtain
|r
m
| |r
m1
|
_
1 + 2t
m

m
t
m

m
1
A()
_
1

__
= |r
m1
|
_
1 + 2t
m

m
8t
m

m
_
1

__
= |r
m1
|
_
1 2t
m

m
_
4
_
1

_
1
__
= |r
m1
|(1 2t
m

m
)
for =

2
. Using Lemma 8.7.22 we arrive at a contradiction.
Remark 8.7.25. Frequently, as in [103], estimates for the smoothness module are
available:
If
1
(u)
2
(u) and if
(i)
m
is solution of
i
(u) = t
m
for i = 1, 2, then apparently

(2)
m

(1)
m
.
Condition

m=1
t
m

(1)
m
= can then be replaced by

m=1
t
m

(2)
m
= .
The following theorem describes conditions for the convergence of WGA in Orlicz
spaces:
Theorem 8.7.26. Let (T, , ) be a non-atomic measure space. Let be differen-
tiable and let for each 0 < < 1/4 exist a constant 1 < R

N < , such
that
306 Chapter 8 Differentiability and Convexity in Orlicz Spaces
(a) for innite measure
i. (2u) N(u)
ii.
t
+
((1 )u) <
1
R

t
+
(u)
for all u > 0
(b) for nite measure
i. limsup
u
(2u)/(u) N
ii. limsup
u

+
(u)

+
((1)u)
> R

then (L

(), | |
()
) is uniformly differentiable. Let (u) be the corresponding
smoothness module and let for the solutions
m
of the equations (u) = t
m
for
each (0,
1
2
] be satised that

m=1
t
m

m
= .
Then WGA converges for arbitrary x (L

(), | |
()
).
Proof. Theorem 8.7.24 together with Theorem 8.6.10.
A corresponding theorem is available for the Orlicz norm, making use of the de-
scription of uniform convexity of the Luxemburg norm in Theorem 8.6.18.
Theorem8.7.27. Let (T, , ) be an arbitrary measure space. Then let be -convex
and satisfy the
2
-condition.
If (T) < , let be strictly convex and

-convex and let satisfy the

2
-
condition.
Then (L

(), | |

) is uniformly differentiable. Let (u) be the corresponding


smoothness module and let for the solutions
m
of the equations (u) = t
m
for each
(0,
1
2
] be satised that

m=1
t
m

m
= .
Then WGA converges for arbitrary x (L

(), | |

).
Proof. Theorem 8.7.24 together with Theorems 8.6.18 and 8.6.4.
If x conv(D), then by Lemma 8.7.20 we can replace step (a) of WGA by a con-
dition which is algorithmically easier to verify. In this way we obtain a modication
of WGA:
Algorithm 8.7.28 (WGAM). Let X be strictly convex and let = (t
k
)
kN
with
0 < t
k
1 for all k N.
Section 8.7 Applications 307
Let further x conv(D) 0 be given, then put r
0
:= x, and for m 1:
(a) Choose
m
D such that
F
r
m1
(
m
) t
m
|r
m1
|.
(b) For U
m
:= span
j
[ j = 1, . . . , m let x
m
be the best approximation of x
w.r.t. U
m
.
(c) Put r
m
:= x x
m
and m m+ 1 goto (a).
According to Theorem 8.7.24 WGAM converges for arbitrary dictionaries and arbi-
trary x conv(D)0, since the inequality in Lemma 8.7.21 also holds for WGAM.
If X is separable, D can be chosen to be countable. The condition in step 1 of
WGAM is then satised after examining a nite number of elements of the dictionary.
Examples for separable Orlicz spaces we have mentioned in Section 7.8.1.
The speed of convergence of WGA resp. WGAM can be described by the subse-
quent recursive inequality:
Lemma 8.7.29. Let x conv(D). Let further (v) := v
1
(v), then
|r
m
| |r
m1
|
_
1 2
_
1
4
t
m
|r
m1
|
__
holds.
Proof. Let x conv(D), then A() = 1. Therefore by Lemma 8.7.21
|r
m
| |r
m1
|
_
1 + 2
_

|r
m1
|
_
t
m
_
1

|r
m1
|
__
for every > 0 and arbitrary > 0, i.e.
|r
m
| |r
m1
|
_
1 + 2
_

|r
m1
|
_
t
m
_
.
Let now
m
be solution of
1
4
t
m
|r
m1
| =
_

|r
m1
|
_
.
Then we obtain
|r
m
| |r
m1
|
_
1
1
2

m
t
m
_
,
and

1
_
1
4
t
m
|r
m1
|
_
=

m
|r
m1
|
,
i.e.
1
2

m
t
m
=
1
2
t
m
|r
m1
|
1
_
1
4
t
m
|r
m1
|
_
= 2
_
1
4
t
m
|r
m1
|
_
.
308 Chapter 8 Differentiability and Convexity in Orlicz Spaces
Remark 8.7.30. For
1

2
the inequality
2

1
holds.
In the case of Banach spaces with (u) au
p
for 1 < p 2 one obtains: (v)
bv
q
where q =
p
p1
. Temlyakov (see [103]) thereby obtains the estimate for WGA
|r
m
| c
_
1 +
m

k=1
t
q
k
_
1/q
.
Then for the special case GA apparently
|r
m
| c(1 +mt
q
)
1/q

c
t
m
1/q
holds.
Chapter 9
Variational Calculus
In this chapter we describe an approach to variational problems, where the solutions
appear as pointwise (nite dimensional) minima for xed t of the supplemented La-
grangian. The minimization is performed simultaneously w.r.t. to both the state vari-
able x and x, different from Pontryagins maximum principle, where optimization
is done only w.r.t. the x variable. We use the idea of the Equivalent Problems of
Carathodory employing suitable (and simple) supplements to the original minimiza-
tion problem. Whereas Carathodory considers equivalent problems by use of solu-
tions of the HamiltonJacobi partial differential equations, we shall demonstrate that
quadratic supplements can be constructed, such that the supplemented Lagrangian is
convex in the vicinity of the solution. In this way, the fundamental theorems of the
Calculus of Variations are obtained. In particular, we avoid any usage of eld theory.
Our earlier discussion of strong solvability in the previous chapter has some con-
nection to variational calculus: it turns out that if the LegendreRiccati differential
equation is satised the variational functional is uniformly convex w.r.t. the Sobolev
norm on a -neighborhood of the solution.
In the next part of this chapter we will employ the stability principles developed in
Chapter 5 in order to establish optimality for solutions of the EulerLagrange equa-
tions of certain non-convex variational problems.
The principle of pointwise minimization is then applied to the detection of a
smooth, monotone trend in time series data in a parameter-free manner. In this context
we also employ a Tikhonov-type regularization.
The last part of this chapter is devoted to certain problems in optimal control,
where, to begin with, we put our focus on stability questions. In the nal section
we treat a minimal time problem which turns out to be equivalent to a linear ap-
proximation in the mean, and thus closing the circle of our journey through function
spaces.
9.1 Introduction
If a given function has to be minimized on a subset (restriction set) of a given set, one
can try to modify this function outside of the restriction set by adding a supplement in
such a way that the global minimal solution of the supplemented function lies in the
restriction set. It turns out that this global minimal solution is a solution of the original
(restricted) problem. The main task is to determine such a suitable supplement.
310 Chapter 9 Variational Calculus
Supplement Method 9.1.1. Let M be an arbitrary set, f : M R a function and T
a subset of M. Let : M R be a function, that is constant on T. Then we obtain:
If x
0
T is a minimal solution of the function f + on all of M, then x
0
is a
minimal solution of f on T.
Proof. For x T arbitrary we have
f(x
0
) +(x
0
) f(x) +(x) = f(x) +(x
0
).
Denition 9.1.2 (Piecewise continuously differentiable functions). We consider func-
tions x C[a, b]
n
for which there exists a partition a = t
0
< t
1
< < t
m
= b,
such that: x is continuously differentiable for each i 1, . . . , m on [t
i1
, t
i
) and
the derivative x has a left-sided limit in t
i
. The value of the derivative of x in b is then
dened as the left-sided derivative in b. Such a function we call piecewise continu-
ously differentiable. The set of these functions is denoted by RCS
(1)
[a, b]
n
.
Denition 9.1.3. Let W be a subset of R

[a, b]. We call G : W R


k
piecewise
continuous, if there is a partition Z = a = t
0
< t
1
< < t
m
= b of [a, b] such
that for each i 1, . . . , m the function G : W (R

[t
i1
, t
i
)) R
k
has a
continuous extension M
i
: W (R

[t
i1
, t
i
]) R
k
.
In the sequel we shall consider variational problems in the following setting: let
U R
2n+1
, be such that U
t
:= (p, q) R
2n
[ (p, q, t) U ,= for all t [a, b],
let L : U R be piecewise continuous. The restriction set S for given , R
n
is
a set of functions that is described by
S x RCS
1
[a, b]
n
[ (x(t), x(t), t), (x(t), x(t), t) U t [a, b],
x(a) = , x(b) = .
The variational functional f : S R to be minimized is dened by
f(x) =
_
b
a
L(x(t), x(t), t)dt.
The variational problem with xed endpoints is then given by
Minimize f on S.
The central idea of the subsequent discussion is to introduce a supplement in inte-
gral form that is constant on the restriction set. This leads to a new variational problem
with a modied Lagrangian. The solutions of the original variational problemcan now
be found as minimal solutions of the modied variational functional. Because of the
monotonicity of the integral, the variational problem is now solved by pointwise min-
imization of the Lagrangian w.r.t. the x and x variables for every xed t, employing
the methods of nite-dimensional optimization.
Section 9.1 Introduction 311
This leads to sufcient conditions for a solution of the variational problem. This
general approach does not even require differentiability of the integrand. Solutions
of the pointwise minimization can even lie at the boundary of the restriction set so
that the EulerLagrange equations do not have to be satised. For interior points the
EulerLagrange equations will naturally appear by setting the partial derivatives to
zero, using a linear supplement potential.
9.1.1 Equivalent Variational Problems
We now attempt to describe an approach to variational problems that uses the idea
of the Equivalent Problems of Carathodory (see [17], also compare Krotov [74])
employing suitable supplements to the original minimization problem. Carathodory
constructs equivalent problems by use of solutions of the HamiltonJacobi partial
differential equations. In the context of Bellmans Dynamic Programming (see [6])
this supplement can be interpreted as the so-called value function. The technique
to modify the integrand of the variational problem already appears in the works of
Legendre in the context of the second variation (accessory problem).
In this treatise we shall demonstrate that explicitly given quadratic supplements are
sufcient to yield the main results.
Denition 9.1.4. Let F : [a, b] R
n
R with (t, x) F(t, x) be continuous, w.r.t.
x continuously partially differentiable and w.r.t. t piecewise partially differentiable in
the sense that the partial derivative F
t
is piecewise continuous. Moreover, we require
that the partial derivative F
xx
exists and is continuous, and that F
tx
, F
xt
exist in the
piecewise sense, and are piecewise continuous such that F
tx
= F
xt
.
Then we call F a supplement potential.
Lemma 9.1.5. Let F : [a, b] R
n
R be a supplement potential. Then the integral
over the supplement
_
b
a
[F
x
(t, x(t)), x(t) +F
t
(t, x(t))]dt
is constant on S.
Proof. Let Z
1
be a common partition of [a, b] for F and x, i.e. Z
1
= a =

t
0
<

t
1
<
<

t
j
= b, such that the requirement of piecewise continuity for x and F
t
are
satised w.r.t. Z
1
, then
_
b
a
[F
x
(t, x(t)), x(t) +F
t
(t, x(t))]dt
=
j

i=1
_

t
i

t
i1
[F
x
(t, x(t)), x(t) +F
t
(t, x(t))]dt
312 Chapter 9 Variational Calculus
=
j

i=1
(F(

t
i
, x(

t
i
)) F(

t
i1
, x(

t
i1
)))
= F(b, ) F(a, ).
An equivalent problem is then given through the supplemented Lagrangian

L

L := L F
x
, x F
t
.
9.1.2 Principle of Pointwise Minimization
The aim is to develop sufcient criteria for minimal solutions of the variational prob-
lem by replacing the minimization of the variational functional on subsets of a func-
tion space by nite dimensional minimization. This can be accomplished by point-
wise minimization of an explicitly given supplemented integrand for xed t using
the monotonicity of the integral (application of this method to general control prob-
lems was treated in [63]). The minimization is done simultaneously, with respect
to the x- and the x-variables in R
2n
. This is the main difference as compared to
Hamilton/Pontryagin where minimization is done solely w.r.t. the x-variables, meth-
ods, which lead to necessary conditions in the rst place.
The principle of pointwise minimization is demonstrated in [61], where a complete
treatment of the brachistochrone problem is presented using only elementary mini-
mization in R (compare also [59], p. 120 or [57]). For a treatment of this problem
using elds of extremals see [33], p. 367.
Our approach is based on the following obvious
Lemma 9.1.6. Let A be a set of integrable real functions on [a, b] and let l

A. If
for all l A and all t [a, b] we have l

(t) l(t) then


_
b
a
l

(t)dt
_
b
a
l(t)dt
for all l A.
Theorem 9.1.7 (Principle of Pointwise Minimization). Let a variational problem
with Lagrangian L and restriction set S be given.
If for an equivalent variational problem
Minimize g(x) :=
_
b
a

L(x(t), x(t), t)dt,


where

L = L F
x
, x F
t
,
Section 9.1 Introduction 313
an x

S can be found, such that for all t [a, b] the point (p


t
, q
t
) := (x

(t), x

(t)) is
a minimal solution of the function (p, q)

L(p, q, t) =:
t
(p, q) on U
t
:= (p, q)
R
2n
[ (p, q, t) U.
Then x

is a solution of the original variational problem.


Proof. For the application of Lemma 9.1.6, set A = l
x
: [a, b] R[ t l
x
(t) =

L(x(t), x(t), t), x S and l

:= l
x
. According to Lemma 9.1.5 the integral over
the supplement is constant.
It turns out (see below) that the straightforward approach of a linear (w.r.t. x) sup-
plement already leads to the EulerLagrange equation by setting the partial derivatives
of

L (w.r.t. p and q) to zero.
9.1.3 Linear Supplement
In the classical theory a linear supplement is used where the supplement potential F
has the structure
(t, x) F(t, x) = (t), x, (9.1)
and RCS
1
[a, b]
n
is a function that has to be determined in a suitable way.
As F
x
(t, x) = (t) and F
t
(t, x) =

(t), x we obtain for the equivalent problem


Minimize g(x) =
_
b
a
L(x(t), x(t), t) (t), x(t)

(t), x(t)dt on S. (9.2)


We shall now attempt to solve this problem through pointwise minimization of the
integrand. This simple approach already leads to a very efcient method for the treat-
ment of variational problems.
For the approach of pointwise minimization we have to minimize
(p, q)
t
(p, q) := L(p, q, t) (t), q

(t), p
on U
t
.
Let W be an open superset of U in the relative topology of R
n
R
n
[a, b], and
let L : W R be piecewise continuous and partially differentiable w.r.t. p and q
(in the sense that L
p
and L
q
are piecewise continuous). If for xed t [a, b] a point
(p
t
, q
t
) Int U
t
is a corresponding minimal solution, then the partial derivatives of
t
have to be equal to zero at this point. This leads to the equations
L
p
(p
t
, q
t
, t) =

(t) (9.3)
L
q
(p
t
, q
t
, t) = (t). (9.4)
The pointwise minimum (p
t
, q
t
) yields a function t (p
t
, q
t
). It is our aim to show
that this pair provides a solution x

of the variational problem where x

(t) := p
t
and
314 Chapter 9 Variational Calculus
x

(t) = q
t
. In the spirit of the supplement method this means that the global minimum
is an element of the restriction set S. The freedom of choosing a suitable function
is exploited to achieve this goal.
Denition 9.1.8. A function x

RCS
1
[a, b]
n
is called an extremaloid (see Hestenes
[38], p. 60), if it satises the EulerLagrange equation in integral form, i.e. there is a
constant c such that:
_
t
a
L
x
(x(), x(), )d +c = L
x
(x(t), x(t), t) t (a, b]. (9.5)
If the extremaloid x

is a C
1
[a, b]
n
-function then x

is called an extremal. An ex-


tremaloid x

is called admissible if x

S.
Remark 9.1.9. An extremaloid x

always satises the WeierstrassErdmann condi-


tion, i.e.
t L
x
(x

(t), x

(t), t) is continuous.
For an extremaloid x

the denition
(t) :=
_
t
a
L
x
(x

(), x

(), )d +c (9.6)
(c a constant) leads to a RCS
1
.
A fundamental question of variational calculus is, under what conditions an ex-
tremaloid is a minimal solution of the variational problem.
If x

is an admissible extremaloid then for every t [a, b] the pair (x

(t), x

(t))
Int U
t
satises the rst necessary condition for a pointwise minimum at t. From the
perspective of pointwise minimization we can state the following: if setting the partial
derivatives of the integrand equal to zero leads to a pointwise (global) minimum on
U
t
for every t [a, b], then indeed x

is a solution of the variational problem. The


principle of pointwise minimization also provides a criterion to decide which of the
extremaloids is the global solution.
For convex integrands, an extremaloid already leads to the sufcient conditions for
a pointwise minimum.
If for every t [a, b] the set U
t
is convex and the Lagrangian L(, , t) : U
t
R
is a convex function, then an admissible extremaloid is a solution of the variational
problem. This remains true if the extremaloid lies partially on the boundary of U
(i.e. if the pointwise minimum lies on the boundary of U
t
). As we require continuous
differentiability in an open superset of U
t
the vector of the partial derivatives (i.e.
the gradient) represents the (total) derivative, all directional derivatives are equal to
zero at (x

(t), x

(t)). The Characterization Theorem of Convex Optimization (see


Theorem 3.4.3) guarantees that this point is indeed a pointwise minimum.
Section 9.1 Introduction 315
Theorem 9.1.10. Let X R
n
R
n
be open and : X Rbe differentiable. Let K
be a convex subset of X and : K Rbe convex. If for x

K we have
t
(x

) = 0
then x

is a minimal solution of on K.
We summarize this situation in the following:
Theorem 9.1.11. For convex problems every admissible extremaloid is a solution of
the variational problem.
In view of this theorem it turns out that the method of pointwise minimization can
be extended to a much larger class of variational problems, where the integrand can
be convexied by use of a suitable supplement (see Section 9.3).
The invariance property stating that equivalent problems have the same ex-
tremaloids, established in the subsequent theorem, leads to the following principle:
An extremaloid of a problem that is convexiable is a solution of the orig-
inal variational problem. In particular, the explicit convexication does
not have to be carried out, instead the solvability of certain ordinary dif-
ferential equations has to be veried
(see below).
Theorem 9.1.12. Every extremaloid for the Lagrangian L is an extremaloid for the
supplemented Lagrangian

L := L F
x
, x F
t
and vice versa, where F is a supplement potential.
Proof. We have

L
x
= L
x
F
x
,
and

L
x
= L
x
x
T
F
xx
F
tx
.
Moreover
d
dt
F
x
(t, x(t)) = F
xt
(t, x(t)) + x(t)
T
F
xx
(t, x(t)).
If x satises the EulerLagrange equation in integral form w.r.t. L, i.e.
L
x
=
_
t
a
L
x
d +c,
then there is a constant c such that

L
x
=
_
t
a

L
x
d + c,
316 Chapter 9 Variational Calculus
which we shall now show, using the continuity of F
x
_
t
a

L
x
d =
_
t
a
(L
x
x
T
F
xx
F
tx
)d = L
x
c
_
t
a
( x
T
F
xx
+F
xt
)d
= L
x
F
x
+F
x
(a, x(a)) c =

L
x
c.
As equivalent variational problems have the same extremaloids Theorem 9.1.11
also holds for convexiable problems. The following theorem can be viewed as a
central guideline for our further considerations:
Theorem9.1.13. If, for a given variational problem, there exists an equivalent convex
problem, then every admissible extremaloid is a minimal solution.
9.2 Smoothness of Solutions
Theorem 9.2.1. Let L : U R be such that L
q
continuous. Let x S be such that
(a) V
t
:= q R
n
[ (x(t), q, t) U is convex for all t [a, b].
(b) L(x(t), , t) be strictly convex on V
t
for all t [a, b].
(c) : [a, b] R
n
where t (t) := L
q
(x(t), x(t), t) is continuous.
Then x is smooth (i.e. x C
1
[a, b]
n
).
Proof. Let q
t
(q) := L(x(t), q, t) (t), q, Then
t
is strictly convex on V
t
.
We have

t
t
( x(t)) = L
q
(x(t), x(t), t) (t) = 0
for all t [a, b], hence x(t) is the unique minimal solution of
t
on V
t
. Let (t
k
) be
a sequence in [a, b] with t
k
t. Then there exists an interval I
t
:= [

t, t) and K N
such that t
k
I
t
for k > K and we have L
q
(x, x, ) and x are continuous on I
t
. We
obtain
0 = L
q
(x(t
k
), x(t
k
), t
k
) (t
k
) L
q
(x(t), x(t), t) (t).
Hence x(t) is minimal solution of
t
on V
t
. As
t
is strictly convex, we nally
obtain: x(t) = x(t) which proves the theorem.
Corollary 9.2.2. Let x

be an extremaloid, and let


(a) V
t
:= q R
n
[ (x

(t), q, t) U be convex for all t [a, b]


(b) L(x

(t), , t) be strictly convex on V


t
.
Then x

is an extremal.
Proof. : [a, b] R
n
with t (t) := L
q
(x

(t), x

(t), t) is continuous (Weier-


strassErdmann condition is satised).
Section 9.2 Smoothness of Solutions 317
The following example shows that the above theorem does not hold if the strict
convexity of L(x

(t), , t) is violated:
Example 9.2.3. Let L(p, q, t) := ((t
1
2
)
+
)
2
q +
1
2
p
2
for t [0, 1]. We observe: L is
convex but not strictly convex w.r.t. q.
We want to consider the corresponding variational problem for the boundary con-
ditions x(0) = 0, x(1) = 1.
For the EulerLagrange equation we obtain
d
dt
__
t
1
2
_
+
_
2
= x x

(t) = 2
_
t
1
2
_
+
,
i.e. x

is not smooth.
On the other hand (t) = L
q
(t) = ((t
1
2
)
+
)
2
is continuous and hence the
WeierstrassErdmann condition is satised. Thus x

is extremaloid of the convex


variational problem and hence a minimal solution (Theorem 9.1.12) . The function
t
is not strictly convex

t
(q) =
__
t
1
2
_
+
_
2
q +
1
2
p
2

__
t
1
2
_
+
_
2
q =
1
2
p
2
i.e.
t
is constant w.r.t. q, which means that every q is a minimal solution of
t
. In
particular the set of minimal solutions of
t
is unbounded (beyond being non-unique).
Denition 9.2.4. Let x

RCS
1
[a, b]
n
be an extremaloid, and let L
0
x x
(t) :=
L
x x
(x

(t), x

(t), t) satisfy the strong LegendreClebsch condition, i.e. L


0
x x
is pos-
itive denite on [a, b], then x

is called a regular extremaloid.


Remark 9.2.5. The LegendreClebsch condition, i.e. L
0
x x
positive semi-denite on
[a, b], is a classical necessary condition for a minimal solution of the variational prob-
lem (see [38]).
We would like to point out that a regular extremaloid is not always an extremal, as
can be seen in the following example:
Example 9.2.6. Consider the variational problem on the interval [2, 2], with the
boundary conditions x

(2) = x

(2) = 0 given by the Lagrangian being dened


by (p, q, t) L(p, q, t) := cos q +
t

q for > 2 and for (p, q, t) U =


R (
3
2
,
3
2
) [2, 2]. Then L
q
= sin q +
t

and L
qq
= cos q, the latter
being positive for

2
< [q[ <
3
2
. For the EulerLagrange equation we obtain
d
dt
_
sin x +
t

_
= 0,
318 Chapter 9 Variational Calculus
i.e.
sin x +
t

= c.
In particular, any solution of the above equation satises the WeierstrassErdmann
condition.
Choosing x

to be even (i.e. x

odd), we obtain according to Equation (9.5)


c =
1
b a
_
b
a
L
x
(x

(t), x

(t), t)dt =
1
4
_
2
2
_
sin( x

(t)) +
t

_
dt = 0,
and hence we have to solve
sin q =
t

,
such that

2
< [q[ <
3
2
in order to satisfy the strong LegendreClebsch condition.
We obtain two different solutions:
1. For 2 t > 0 and

2
< q <
3
2
we obtain
q(t) = arcsin
_
t

_
+,
and for 2 t < 0 and

2
> q >
3
2

q(t) = arcsin
_
t

_
.
2. For 2 t > 0 and

2
> q >
3
2
we obtain
q(t) = arcsin
_
t

_
,
and for 2 t < 0 and

2
< q <
3
2

q(t) = arcsin
_
t

_
+.
Obviously, both solutions are discontinuous at t = 0. The extremals are then ob-
tained via integration.
1. For t > 0
x

(t) = t
_
arcsin
_
t

_
+
_

1
_
t

_
2
D
1
,
and for t < 0
x

(t) = t
_
arcsin
_
t

1
_
t

_
2
D
1
,
Section 9.2 Smoothness of Solutions 319
where D
1
= 2(arcsin(
2

) +)
_
1 (
2

)
2
according to the boundary
conditions.
2. For t > 0
x

(t) = t
_
arcsin
_
t

1
_
t

_
2
D
2
,
and for t < 0
x

(t) = t
_
arcsin
_
t

_
+
_

1
_
t

_
2
D
2
,
where D
2
= 2(arcsin(
2

) )
_
1 (
2

)
2
. Both extremaloids are
admissible and both satisfy the strong Legendre condition.
Pointwise Minimization. The method of pointwise minimization (w.r.t. p, q for xed
t) leads directly to a global solution of the variational problem. For the linear sup-
plement choose the DuboisReymond form (t) =
_
t
2
L
x
d + c = c. Let
t
:
R(
3
2
,
3
2
) Rwith
t
(p, q) := L(p, q, t)+(t)q+

(t)p = cos q+
t

q+cq =:

t
(q), where
t
: (
3
2
,
3
2
) R. Choosing x

to be even (i.e. x

odd), we obtain
as above (see Equation (9.5))
c =
1
b a
_
b
a
L
x
(x

(t), x

(t), t)dt =
1
4
_
2
2
_
sin( x

(t)) +
t

_
dt = 0.
For t > 0 and q > 0 we obtain:
t
(q) > 1 and for
t
() = cos()
t

, hence
we must look for minimal solutions on (
3
2
,

2
). On this interval
t
is convex.
The necessary condition yields
sin q =
t

,
for which we obtain the following solution: let q = + r where r [

2
,

2
] then
sin q = sin( +r) = sin r =
t

hence r = arcsin
t

.
For t < 0 for analogous reasons minimal solutions are found on (

2
,
3
2
). Hence,
for 2 t > 0 and

2
> q >
3
2
we obtain
q(t) = arcsin
_
t

_
,
and for 2 t < 0 and

2
< q <
3
2

q(t) = arcsin
_
t

_
+.
320 Chapter 9 Variational Calculus
As
t
does not depend on p, every pair (p, q(t)) is a pointwise minimal solution of

t
on R (
3
2
,
3
2
). We choose
x

(t) = p(t) =
_
t
2
q()d.
x

is in RCS
1
[2, 2], even, and satises the boundary conditions. According to
the principle of pointwise minimization, x

is the global minimal solution of the vari-


ational problem.
We would like to point out that Carathodory in [17] always assumes that x

is
smooth.
9.3 Weak Local Minima
For our subsequent discussion that is based on a convexication of the Lagrangian we
need the following
Lemma 9.3.1. Let V be an open and A a compact subset of a metric space (X, d),
and let A V . Then there is a positive such that
_
xA
K(x, ) V.
Proof. Suppose for all n N there exists x
n
X V and a
n
A such that
d(x
n
, a
n
) <
1
n
. As A compact there is a convergent subsequence (a
k
) such that
a
k
a A V . We obtain
d(x
k
, a) d(x
k
, a
k
) +d(a
k
, a) 0,
a contradiction to X V closed.
Lemma 9.3.2. Let M L(R
2n
) be a matrix of the structure
M =
_
A C
T
C D
_
where A, C, D L(R
n
) and D positive denite and symmetric. Then M is positive
(semi-)denite if and only if AC
T
D
1
C is positive (semi-)denite.
Proof. Let f : R
n
R
n
R dened by
f(p, q) :=
_
p
q
_
T
_
A C
T
C D
__
p
q
_
= p
T
Ap + 2q
T
Cp +q
T
Dq.
Section 9.3 Weak Local Minima 321
Minimization w.r.t. q for xed p yields: 2Dq = 2Cp and hence
q(p) = D
1
Cp.
By inserting this result into f we obtain
f(p, q(p)) = p
T
Ap 2p
T
C
T
D
1
Cp +p
T
C
T
D
1
Cp = p
T
(AC
T
D
1
C)p.
Using our assumption AC
T
D
1
C positive (semi-)denite it follows that f(p, q(p))
> 0 for p ,= 0 (f(p, q(p)) 0 in the semi-denite case). For p = 0 and q ,= 0
obviously f(p, q) > 0.
On the other hand, let W be positive (semi-)denite. Then (0,0) is the only (a)
minimal solution of f. Hence the function p f(p, q(p)) has 0 as the (a) minimal
solution, i.e. AC
T
D
1
C is positive (semi-)denite.
Denition 9.3.3. We say that the LegendreRiccati condition is satised, if there ex-
ists a continuously differentiable symmetrical matrix function W : [a, b] L(R
n
)
such that for all t [a, b] the expression
L
0
xx
+

W (L
0
x x
+W)(L
0
x x
)
1
(L
0
xx
+W) (9.7)
is positive denite.
We would like to point out that Zeidan (see [116]) treats variational problems by
discussing the Hamiltonian. She presents a corresponding condition for W w.r.t. the
Hamiltonian in order to obtain sufcient conditions.
If the LegendreRiccati condition is satised, we shall introduce a quadratic sup-
plement potential based on the corresponding matrix W such that the supplemented
Lagrangian is strictly convex (compare also [33], Vol. I, p. 251). Kltzler in [18]
p. 325 uses a modication of the Hamiltonian, also leading to Riccatis equation, such
that the resulting function is concave, in the context of extensions of eld theory.
Let in the sequel U be open.
Theorem 9.3.4 (Fundamental Theorem). Let L : U R be continuous and L(, , t)
twice continuously differentiable.
An admissible regular extremaloid x

is a weak local minimal solution of the given


variational problem, if the LegendreRiccati condition is satised.
Proof. Let a differentiable W : [a, b] L(R
n
) be given, such that the Legendre
Riccati condition is satised for W. Then we choose the quadratic supplement poten-
tial
F : [a, b] R
n
R
322 Chapter 9 Variational Calculus
with F(t, p) =
1
2
p
T
W(t)p which leads to an equivalent variational problem with
the modied Lagrange function

L(p, q, t) := L(p, q, t) q, F
p
(t, p) F
t
(t, p)
= L(p, q, t) +q, W(t)p +
1
2
p,

W(t)p.
We shall now show that there is a > 0 such that for all t [a, b] the function
(p, q)
t
(p, q) :=

L(p, q, t) is strictly convex on K
t
:= K((x

(t), x

(t)), ): we
have
M :=
tt
t
(x

(t), x

(t)) =
_
L
0
pp
+

W L
0
pq
+W
L
0
qp
+W L
0
qq
_
(t)
is positive denite using the LegendreRiccati condition and Lemma 9.3.2 (note that
L
0
qp
= L
0
pq
T
). Then
tt
t
(p, q) is positive denite on an open neighborhood of (x

(t),
x

(t)). As the set (x

(t), x

(t)) (x

(t), x

(t))[t [a, b] is compact there is


according to Lemma 9.3.1 a (universal) such that on K
t
the function
tt
t
is positive
denite, and therefore
t
strictly convex on K
t
, for all t [a, b]. As the RCS
1
-ball
with center x

and radius is contained in the set


S

:= x RCS
1
[a, b]
n
[ (x(t), x(t)) K
t
t [a, b],
we obtain that the extremal x

is a (proper) weak local minimum. Thus we have iden-


tied a (locally) convex variational problem with the Lagrangian

L that is equivalent
to the problem involving L (Theorems 9.1.11, 9.1.12) .
Remark 9.3.5. The LegendreRiccati condition is guaranteed if the LegendreRiccati
Matrix Differential Equation
L
0
xx
+

W (L
0
x x
+W)(L
0
x x
)
1
(L
0
xx
+W) = cI
for a positive c Rhas a symmetrical solution on [a, b]. Using the notation R := L
0
x x
,
Q := L
0
xx
, P := L
0
xx
and A := R
1
Q, B := R
1
, C := P Q
T
R
1
QcI then
for V := W the above equation assumes the equivalent form (LegendreRiccati
equation)

V +V A+A
T
V +V BV C = 0.
Denition 9.3.6. The rst order system

Z = CY A
T
Z

Y = AY +BZ
is called the Jacobi equation in canonical form. The pair of solutions (Z, Y ) is called
self-conjugate if Z
T
Y = Y
T
Z.
Section 9.3 Weak Local Minima 323
For the proof of the subsequent theorem we need the following
Lemma 9.3.7 (Quotient Rule).
d
dt
(A
1
(t)) = A
1
(t)

A(t)A
1
(t)
d
dt
(B(t)A
1
(t)) =

B(t)A
1
(t) B(t)A
1
(t)

A(t)A
1
(t)
d
dt
(A
1
(t)B(t)) = A
1
(t)

B(t) A
1
(t)

A(t)A
1
(t)B(t).
Proof. We have
0 =
d
dt
(I) =
d
dt
(A(t)A
1
(t)) =

A(t)A
1
(t) +A(t)
d
dt
(A
1
(t)).
Theorem 9.3.8. If the Jacobi equation has a solution (Z, Y ) such that Y
T
Z is sym-
metrical and Y is invertible then V := ZY
1
is a symmetrical solution of the
LegendreRiccati equation

V +V A+A
T
V +V BV C = 0.
Proof. If Y
T
Z is symmetrical, then V := ZY
1
is also symmetrical, as
(Y
T
)
1
(Y
T
Z)Y
1
= ZY
1
= V.
V is a solution of the LegendreRiccati equation, because according to the quotient
rule (Lemma 9.3.7) we have

V =

ZY
1
ZY
1

Y Y
1
= (CY A
T
Z)Y
1
ZY
1
(AY +BZ)Y
1
= C A
T
V V AV BV.
Denition 9.3.9. Let x

be an extremaloid and let A, B, C as in Remark 9.3.5. Let


(x, y) be a solution of the Jacobi equation in vector form
z = Cy A
T
z
y = Ay +Bz,
such that y(a) = 0 and z(a) ,= 0. If there is a point t
0
(a, b] such that y(t
0
) = 0
then we say t
0
is a conjugate point of x

w.r.t. a.
For the next theorem compare Hartmann ([36], Theorem 10.2 on p. 388).
324 Chapter 9 Variational Calculus
Theorem 9.3.10. If the regular extremaloid x

does not have a conjugate point in


(a, b] then there exists a self-conjugate solution (Z
1
, Y
1
) of the matrix Jacobi equation
such that Y
1
is invertible on [a, b].
Furthermore, the LegendreRiccati condition is satised.
Proof. For any solution (Z, Y ) of the matrix Jacobi equation

Z = CY A
T
Z

Y = AY +BZ
it turns out that
d
dt
(Z
T
Y Y
T
Z) = 0 using the product rule and the fact that the
matrices B and C are symmetrical. Hence the matrix Z
T
Y Y
T
Z is equal to a
constant matrix K on [a, b]. If we consider the following initial value problem:
Y (a) = I, Y
0
(a) = 0
Z(a) = 0, Z
0
(a) = I
then obviously for the solution (Z
0
, Y
0
) the matrix K is equal to zero, i.e. (Z
0
, Y
0
) is
self-conjugate.
Apparently
_
Y Y
0
Z Z
0
_
is a fundamental system. Hence any solution (y, z) can be represented in the following
way: y = Y c
1
+ Y
0
c
2
and z = Zc
1
+ Z
0
c
2
. If y(a) = 0 then 0 = y(a) = Ic
1
= c
1
and z(a) = Ic
2
, thus y = Y
0
c
2
and z = Z
0
c
2
.
It turns out that Y
0
(t) is non-singular on (a, b], for suppose there is t
0
(a, b] such
that Y
0
(t
0
) is singular, then the linear equation Y
0
(t
0
)c = 0 has a non-trivial solution
c
0
. Let y
0
(t) := Y
0
(t)c
0
on [a, b] then y
0
(a) = 0 and y
0
(t
0
) = 0. Moreover for
z
0
(t) := Z
0
(t)c
0
we have z
0
(a) = Ic
0
= c
0
,= 0, hence t
0
is a conjugate point of a, a
contradiction.
We now use a construction that can be found in Hestenes: there is an > 0 such
that x

can be extended to a function x

on [a , b] and L
x x
( x

,

x

, ) remains
positive denite on [a , b]. If we insert x

into L
x x
and L
xx
, the matrices A, B, C
(using the notation of Remark 9.3.5) retain their properties and we can consider the
corresponding Jacobi equation extended to [a , b]. Then Lemma 5.1 in [38] on
p. 129 yields an a
0
< a such that x

has no conjugate point w.r.t. a


0
on (a
0
, b]. But
then, using the same initial conditions at a
0
and the same argument as in the rst part
of the proof, we obtain a self-conjugate solution (Z
1
, Y
1
) on [a
0
, b] such that Y
1
is
non-singular on (a
0
, b]. The restriction of Y
1
is, of course, a solution of the Jacobi
equation on [a, b] that is non-singular there.
Using Theorem 9.3.8 it follows that the Z
1
Y
1
1
is a symmetrical solution of the
LegendreRiccati equation.
Section 9.4 Strong Convexity and Strong Local Minima 325
Theorem 9.3.11 (Fundamental Theorem of JacobiWeierstrass). If the regular ex-
tremal x

does not have a conjugate point on (a, b] then x

is a weak local minimal


solution of the given variational problem.
9.3.1 Carathodory Minimale
Denition 9.3.12. A function x

RCS
1
[a, b]
n
is called a Carathodory Minimale,
if for every t
0
(a, b) there are s, t (a, b) with s < t
0
< t such that x

[
[s,t]
is a
weak local solution of the variational problem
min
_
t
s
L(x(), x(), )d
on S
s,t
:= x RCS
1
[s, t]
n
[ (x(), x(), ) U, [s, t],
x(s) = x

(s), x(t) = x

(t).
As the consequence of the Theorem 9.3.4 we obtain (compare [17], p. 210) the
Theorem 9.3.13. Every regular extremal is a Carathodory Minimale.
Proof. The matrix W := r(tt
0
)I satises, for large r on the interval [t
0
1/r, t
0
+
1/r], the LegendreRiccati condition: in fact, for p R
n
with |p| = 1 we obtain
p
T
( P(t) +rI (Q
T
(t) +r(t t
0
)I) R
1
(t)(Q(t) + r(t t
0
)I))p
r (|Q
T
| + 1) |R
1
| (|Q(t)| + 1) |P(t)| > 0
for r large enough.
9.4 Strong Convexity and Strong Local Minima
Denition 9.4.1. Let : R
0
R
0
be non-decreasing, (0) = 0 and (s) > 0 for
s > 0. Then we call a module function.
The following two lemmata can be found in [59]:
Lemma 9.4.2. Let X be a normed space, U X open, and f : U Rdifferentiable
and K a convex subset of U. Then the following statements are equivalent:
(a) f : K R is uniformly convex, i.e. f is convex and for a module function and
all x, y K
f
_
x +y
2
_

1
2
f(x) +
1
2
f(y) (|x y|),
326 Chapter 9 Variational Calculus
(b) there is a module function
1
such that for all x, y K
f(y) f(x) y x, f
t
(x) +
1
(|x y|),
(c) there is a module function
2
such that for all x, y K
y x, f
t
(y) f
t
(x)
2
(|x y|).
Proof. (a) (b):
1
2
(f(x) +f(y)) f
_
x +y
2
_
+(|x y|) f(x) +f(x)

_
x +y
2
x, f
t
(x)
_
+f(x) +(|x y|).
Multiplication of both sides by 2 yields
f(y) f(x) y x, f
t
(x) + 2(|x y|).
(b) (a): Let [0, 1] and let z := x + (1 )y then
x z, f
t
(z) f(x) f(z)
1
(|z x|)
y z, f
t
(z) f(y) f(z)
1
(|z y|).
Multiplication of the rst inequality by and the second by 1 and subsequent
addition yields
0 = 0, f
t
(z) = x + (1 )y z, f
t
(z)
f(x) + (1 )f(y)) f(z)
1
(|z x|) (1 )
1
(|z y|),
showing that f is convex and we obtain for =
1
2
f
_
x +y
2
_

1
2
f(x) +
1
2
f(y)
1
_
|x y|
2
_
.
(b) (c): we have
f(y) f(x) y x, f
t
(x) +
1
(|x y|),
f(x) f(y) x y, f
t
(y) +
1
(|x y|).
Addition of both inequalities yields
x y, f
t
(x) f(y) 2
1
(|x y|).
Section 9.4 Strong Convexity and Strong Local Minima 327
(c) (b): Let x, y K. First we show that f is convex. For h := y x we obtain
by the mean value theorem an (0, 1) such that
f(y) f(x) = h, f
t
(x +h).
Using (c) we have
h, f
t
(x +h) f
t
(x) 0,
hence
h, f
t
(x) h, f
t
(x +h) = f(y) f(x),
showing that f is convex by a similar argument as above.
Moreover by (c)
_
x +y
2
x, f
t
_
x +y
2
_
f
t
(x)
_

2
_
|x y|
2
_
,
i.e.
1
2
_
y x, f
t
_
x +y
2
__

1
2
y x, f
t
(x) +
2
_
|x y|
2
_
.
Therefore
f(y) f(x) = f(y) f
_
x +y
2
_
+f
_
x +y
2
_
f(x)

_
y
x +y
2
, f
t
_
x +y
2
__
+
_
x +y
2
x, f
t
(x)
_

1
2
y x, f
t
(x) +
2
_
|x y|
2
_
+
1
2
y x, f
t
(x)
= y x, f
t
(x) +
2
_
|x y|
2
_
.
Lemma 9.4.3. Let X be a normed space, U X open, and g : U R twice
continuously differentiable. If for an x

U and c > 0 we have


g
tt
(x

)h, h c|h|
2
for all h X, there exists a > 0 such that g is strongly convex on K := K(x

, ).
In particular,
g
_
x +y
2
_

1
2
g(x) +
1
2
g(y)
c
8
|x y|
2
for all x, y K.
328 Chapter 9 Variational Calculus
Proof. As g
tt
is continuous, there is a > 0 such that for all x K(x

, ): |g
tt
(x)
g
tt
(x

)|
c
2
Hence for all x K and all h X we obtain
g
tt
(x)h, h = (g
tt
(x) g
tt
(x

))h, h +g
tt
(x

)h, h
|g
tt
(x) g
tt
(x

)||h|
2
+c|h|
2

c
2
|h|
2
.
Let x, y K then the mean value theorem yields
g(y) g(x) y x, g
t
(x) = y x, g
tt
(x +(y x))(y x)

c
2
|y x|
2
.
For
1
(s) =
c
2
s
2
in (b) of the previous lemma the assertion follows.
Theorem 9.4.4 (Uniform Strong Convexity of the Lagrangian). Let x

be an extremal
and let the LegendreRiccati condition be satised, then there is a > 0 and a c > 0
such that for all (p, q), (u, v) K
t
:= K((x

(t), x

(t)), ) and for all t [a, b] we


have

L
_
p +u
2
,
q +v
2
, t
_

1
2

L(p, q, t) +
1
2

L(u, v, t)
c
8
(|p u|
2
+|q v|
2
).
Proof. Let
t
be as in Theorem 9.3.4. As the set K
1
:= (p, q) R
2n
[ |p|
2
+|q|
2
=
1 is compact and as t
tt
t
(x

(t), x

(t)) is continuous on [a, b] there is a positive


c R such that for all t [a, b]
_
p
q
_
T

tt
t
(x

(t), x

(t))
_
p
q
_
c (9.8)
on K
1
, i.e. t
tt
t
(x

(t), x

(t)) is uniformly positive denite on [a, b].


According to Lemma 9.3.1 there is a > 0 such that on the compact set in R
2n+1
_
t[a,b]
K((x

(t), x

(t), t), ) U
we have uniform continuity of (p, q, t)
tt
t
(p, q). Hence there is a > 0 such that
for all t [a, b] and all (u, v) K
t
:= K((x

(t), x

(t)), ) we have
|
tt
t
(u, v)
tt
t
(x

(t), x

(t))|
c
2
,
and hence on that set
_
p
q
_
T

tt
t
(u, v)
_
p
q
_

c
2
.
Section 9.4 Strong Convexity and Strong Local Minima 329
Thus we obtain that (p, q)

L(p, q, t) is uniformly strongly convex on K
t
for all
t [a, b], i.e.

L
_
p +u
2
,
q +v
2
, t
_

1
2

L(p, q, t) +
1
2

L(u, v, t)
c
8
(|p u|
2
+|q v|
2
)
for all (p, q), (u, v) K
t
and all t [a, b].
For the corresponding variational functional the above theorem leads to the
Corollary 9.4.5 (Strong Convexity of the Variational Functional). Let
B(x

, ) := x RCS
1
[a, b]
n
[ |x(t) x

(t)|
2
+| x(t) x

(t)|
2
< t [a, b],
then the variational functional

f belonging to the Lagrangian

L is uniformly strongly
convex with respect to the Sobolev norm

f
_
x +y
2
_

1
2

f(x) +
1
2

f(y)
c
8
|x y|
2
W
.
Let V := x B(x

, ) [ x(a) = , x(b) = be the subset of those functions sat-


isfying the boundary conditions then on V the original functional f is also uniformly
strongly convex w.r.t. the Sobolev norm
f
_
x +y
2
_

1
2
f(x) +
1
2
f(y)
c
8
|x y|
2
W
.
Furthermore, every minimizing sequence converges to the minimal solution w.r.t. the
Sobolev norm (strong solvability).
Proof. For the variational functional

f(x) :=
_
b
a

L(x(t), x(t), t)dt we then obtain

f
_
x +y
2
_
=
_
b
a

L
_
x(t) +y(t)
2
,
x(t) + y(t)
2
, t
_
dt

_
b
a
_
1
2

L(x(t), x(t), t) +
1
2

L(y(t), y(t), t)
_
dt

c
8
_
b
a
(|x(t) y(t)|
2
+| x(t) y(t)|
2
)dt
=
1
2

f(x) +
1
2

f(y)
c
8
|x y|
2
W
for all x, y B(x

, ). Thus

f is strictly convex on B(x

, ). On V the functional
f(x) :=
_
b
a
L(x(t), x(t), t)dt differs from the functional

f only by a constant, hence
has the same minimal solutions. On V the inequality
f
_
x +y
2
_

1
2
f(x) +
1
2
f(y)
c
8
|x y|
2
W
is satised, i.e. f is strongly convex on V w.r.t. the Sobolev norm.
330 Chapter 9 Variational Calculus
Using Theorem 9.3.10 we obtain the following corollary (retaining the notation of
the previous corollary):
Corollary 9.4.6 (Strong Convexity). If the regular extremal x

does not have a con-


jugate point then there is a > 0 such that on B(x

, ) the modied variational


functional

f is uniformly strongly convex with respect to the Sobolev norm: and on V
the original functional f is also uniformly strongly convex w.r.t. the Sobolev norm.
9.4.1 Strong Local Minima
We now investigate the question under what conditions we can guarantee that weak
local minimal solutions are in fact strong local minimal solutions. It turns out that such
a strong property can be presented and proved without the use of embedding theory.
We will show that strong local minima require a supplement for the Lagrangian that
is generated by the supplement potential
(t, p) F(t, p) :=
1
2
p, W(t)p (t), p,
where W(t) is symmetrical and the linear term is chosen in such a way that along
the extremal the necessary conditions for optimality for

L coincide with the fulll-
ment of the EulerLagrange equation and the quadratic term, with its convexication
property, provides the sufcient optimality conditions.
The supplemented Lagrangian then has the following structure:

L(p, q, t) = L(p, q, t) +F
t
+F
p
, q
= L(p, q, t) +
1
2
p,

W(t)p

(t), p +W(t)p, q (t), q.


If we dene
(t) := L
q
(x

(t), x

(t), t) +W(t)x

(t),
then using the EulerLagrange equation for the original Lagrangian L, i.e.
d
dt
L
q
=
L
p
, we obtain in fact the necessary conditions

L
p
(x

(t), x

(t), t) =

L
q
(x

(t), x

(t), t) = 0.
In the subsequent theorem we make use of the following lemma.
Lemma 9.4.7. Let r > 0 and the ball K(x
0
, r) R
n
. Let U K(x
0
, r) open. Let
g : U Rbe continuous, and let g be for all x S(x
0
, r) directionally differentiable
with the property g
t
(x, x x
0
) > 0. Then every minimal solution of g on K(x
0
, r)
is in the interior of that ball. In particular, if g is differentiable then g
t
(x

) = 0 for
every minimal solution x

in K(x
0
, r).
Section 9.4 Strong Convexity and Strong Local Minima 331
Proof. Since K(x
0
, r) is compact, g has a minimal solution x

there. Suppose |x

x
0
| = r then the function : [0, 1] R with
t (t) := g(x

+t(x
0
x

))
has a minimal solution at 0, hence
t
(0) 0. Since

t
(0) = lim
t0
g(x

+t(x
0
x

)) g(x

)
t
= g
t
(x

, x
0
x

) = g
t
(x

, x
0
x

) < 0,
a contradiction.
Theorem 9.4.8 (Strong Local Minimum). Let x

be an admissible, regular extremal


and let the LegendreRiccati condition be satised.
Let there exist a > 0 such that for all t [a, b] and all p with |p x

(t)| <
the set V
t,p
:= q R
n
[ (p, q, t) U is convex and the function L(p, , t) is convex
on V
t,p
. Then x

is a locally strong minimal solution of the variational problem, i.e.


there is a positive d, such that for all x K := x S [ |x x

< d we have
_
b
a
L(x

(t), x

(t), t)dt
_
b
a
L(x(t), x(t), t)dt.
Proof. In Theorem 9.4.4 we have constructed positive constants c and such that for
all (p, v), (u, v) K
t
:= K(x

(t), x

(t), ) we have

L
_
p +u
2
,
q +v
2
, t
_

1
2

L(p, q, t) +
1
2

L(u, v, t)
c
8
(|p u|
2
+|q v|
2
).
But

L differs from

L only by the linear term

(t), p (t), q.
In particular

L
tt
=

L
tt
, i.e. the convexity properties remain unchanged, and hence

L(, , t) is strongly convex on K


t
:= K((x

(t), x

(t)), ) with a uniform constant c


for all t [a, b].
From Lemma 9.4.2 we obtain for this c that

L
q
(x

(t), q, t)

L
q
(x

(t), x

(t), t), q x

(t) =

L
q
(x

(t), q, t), q x

(t)

c
2
|q x

(t)|
2
(strong monotonicity of

L
q
(x

(t), , t)).
332 Chapter 9 Variational Calculus
The uniform continuity of

L
q
guarantees that for 0 < < c /4 there is a 0 < d
min/2, such that for all t [a, b]
|

L
q
(x

(t), q, t)

L
q
(p, q, t)| <
for all p x

(t) +K(0, d). We obtain

L
q
(x

(t), q, t)

L
q
(p, q, t), q x

(t)
|

L
q
(x

(t), q, t)

L
q
(p, q, t)||q x

(t)| < |q x

(t)|.
Let = /2, then on the sphere q [ |q x

(t)| = it follows that for all p


x

(t) +K(0, d)

L
q
(p, q, t), q x

(t) c|q x

(t)|
2
|q x

(t)| =
c
2

2
> 0.
Hence from Lemma 9.4.7 we obtain for every p x

(t) +K(0, d) a minimal solution


of q

(p) K( x

(t), ) such that

L
q
(p, q

(p), t) = 0.
From the convexity of

L(p, , t) we conclude that q

(p) is a minimal solution of

L(p, , t) on V
t,p
. (Note that

L(p, , t) differs from L(p, , t) only by an afne term
in q.)
We shall now show that for all t [a, b] we have that (x

(t), x

(t)) is a minimal
solution of

L(, , t) on W
t
:= (p, q) R
n
R
n
[ (p, q, t) U and |px

(t)| < d.
For, suppose there exists (p, q) W
t
such that

L(x

(t), x

(t), t) >

L(p, q, t).
As x

is extremal , and as

L(, , t) is convex on K
t
we have (x

(t), x

(t)) is minimal
solution of

L(, , t) on K
t
by construction of

L. As (p, q

(p)) K
t
we obtain

L(x

(t), x

(t), t)

L(p, q

(p), t)

L(p, q, t) <

L(x

(t), x

(t), t),
a contradiction.
For all x K = x S [ |x

x|

< d we then have


_
b
a

L(x

(t), x

(t), t)dt
_
b
a

L(x(t), x(t), t)dt


and, as the integrals differ on S only by a constant, we obtain the corresponding
inequality also for the original Lagrangian
_
b
a
L(x

(t), x

(t), t)dt
_
b
a
L(x(t), x(t), t)dt,
which completes the proof.
Section 9.5 Necessary Conditions 333
The previous theorem together with Theorem 9.3.10 leads to the following
Corollary 9.4.9 (Strong Local Minimum). Let x

be an admissible and regular ex-


tremal without conjugate points.
Let there exist a > 0 such that for all t [a, b] and all p with |p x

(t)| <
the set V
t,p
:= q R
n
[ (p, q, t) U is convex and the function L(p, , t) is convex
on V
t,p
. Then x

is a locally strong minimal solution of the variational problem, i.e.


there is a positive d, such that for all x K := x S [ |x x

< d we have
_
b
a
L(x

(t), x

(t), t)dt
_
b
a
L(x(t), x(t), t)dt.
Remark 9.4.10. If in particular U = U
1
U
2
[a, b], where U
1
R
n
open, U
2
R
n
open and convex, and L(p, , t) : U
2
R convex for all (p, t) U
1
[a, b] then the
requirements of the previous theorem are satised.
9.5 Necessary Conditions
We briey restate the EulerLagrange equation as a necessary condition in the piece-
wise continuous case. The standard proof carries over to this situation (see Hestenes
[38], Lemma 5.1, 70).
Theorem 9.5.1. Let L, L
x
, L
x
be piecewise continuous. Let x

be a solution of the
variational problem such that its graph is contained in the interior of U. Then x

is
an extremaloid, i.e. there is a c R
n
such that
L
x
(x

(t), x

(t), t) =
_
t
a
L
x
((x

(), x

(), )d +c.
9.5.1 The Jacobi Equation as a Necessary Condition
A different approach for obtaining the Jacobi equation is to consider the variational
problem that corresponds to the second directional derivative of the original varia-
tional problem:
Let x

be a solution of the original variational problem, let


V := h RCS
1
[a, b]
n
[ h(a) = h(b) = 0,
and let
() := f(x

+h) =
_
b
a
L(x

(t) +h(t), x

(t) +

h(t), t)dt.
334 Chapter 9 Variational Calculus
Then the necessary condition yields
0
tt
(0) = f
tt
(x

, h) =
_
b
a
L
0
x x

h,

h + 2L
0
xx
h,

h +L
0
xx
h, hdt
=
_
b
a
R

h,

h + 2Qh,

h +Ph, hdt,
using our notation in Remark 9.3.5. Then the (quadratic) variational problem
Minimize f
tt
(x

, ) on V
is called the accessory (secondary) variational problem. It turns out that the corre-
sponding EulerLagrange equation
d
dt
(R

h +Qh) = Q
T

h +Ph
or in matrix form
d
dt
(R

Y +QY ) = Q
T

Y +PY
yields the Jacobi equation in canonical form

Z = CY A
T
Z

Y = AY +BZ
by setting Z := R

Y +QY and using the notation in Remark 9.3.5.
Lemma 9.5.2. If h

V is a solution of the Jacobi equation in the piecewise sense


then it is a minimal solution of the accessory problem.
Proof. Let
(h,

h) := R

h,

h + 2Qh,

h +Ph, h
h

is extremal of the accessory problem, i.e. (in the piecewise sense)


d
dt
(2R

+ 2Qh

) = 2Ph

+ 2Q
T

h

.
Hence
(h

,

h

) = R

+Qh

,

h

+
_
d
dt
(R

+Qh

), h

_
=
d
dt
(R

+Qh

, h

).
Using the WeierstrassErdmann condition for the accessory problem, i.e. R

+Qh

continuous, we can apply the main theorem of differential and integral calculus
_
b
a
(h

,

h

)dt = R

+Qh

, h

[
b
a
= 0,
as h

(a) = h

(b) = 0. Hence h

is minimal solution as f
tt
(x

, ) is non-negative.
Section 9.6 C
1
-variational Problems 335
Theorem 9.5.3 (Jacobis Necessary Condition for Optimality). If a regular extremal
x

is a minimal solution of the variational problem on S, then a has no conjugate


point in (a, b).
Proof. For otherwise, let (h

, k

) be a non-trivial solution of the Jacobi equation with


h(a) = 0, and let c (a, b) be such a conjugate point. Then h

(c) = 0 and k

(c) ,=
0. Because k

(c) = R

(c) + Qh

(c) = R

(c) it follows that



h

(c) ,= 0. We
dene y(t) = h

(t) for t [a, c], and y(t) = 0 for t [c, b]. In particular y(c) ,=
0. Obviously, y is a solution of the Jacobi equation (in the piecewise sense) and
hence, according to the previous lemma, a minimal solution of the accessory problem,
a contradiction to Corollary 9.2.2 on smoothness of solutions, as is strictly convex
w.r.t.

h.
9.6 C
1
-variational Problems
Let
S
1
:= x C
1
[a, b]
n
[ (x(t), x(t), t) U, x(a) = , x(b) = ,
and
V
1
:= h C
1
[a, b]
n
[ h(a) = 0, h(b) = 0.
As the variational problem is now considered on a smaller set (S
1
S), sufcient
conditions carry over to this situation.
Again, just as in the RCS
1
-theory, the Jacobi condition is a necessary condition.
Theorem 9.6.1. If a regular extremal x

C
1
[a, b]
n
is a minimal solution of the
variational problem on S
1
, then a has no conjugate point in (a, b).
Proof. Let c (a, b) be a conjugate point of a. We consider the quadratic functional
h g(h) :=
_
b
a
(h,

h)dt
for h V
1
, where
(h,

h) := R

h,

h + 2Qh,

h +Ph, h.
We show that there is

h V with g(

h) < 0: for suppose g(h) 0 for all h V


then, as in Lemma 9.5.2 every extremal h

is a minimal solution of g, and we use


the construction for y as in the proof of Theorem 9.5.3 to obtain a solution of the
Jacobi equation in the piecewise sense (which is, again according to Lemma 9.5.2
a minimal solution of the accessory problem) but is not smooth, in contradiction to
Corollary 9.2.2, as is strictly convex w.r.t.

h. Hence there is

h V with g(

h) < 0.
Now we use the process of smoothing of corners as described in Carathodory [17].
Thus we obtain a

h C
1
[a, b]
n
with g(

h) < 0. Let () = f(x

h) then
g(

h) =
tt
(0) 0 as x

is minimal solution of f on S
1
, a contradiction.
336 Chapter 9 Variational Calculus
9.7 Optimal Paths
For a comprehensive view of our results we introduce the notion of an optimal path.
Our main objective in this context is, to characterize necessary and sufcient condi-
tions.
Denition 9.7.1. A function x

RCS
1
[a, b]
n
is called an optimal path, if it is a
weak local solution of the variational problem
min
_
t
a
L(x(), x(), )d
on
S
t
:= x RCS
1
[a, t]
n
[ (x(), x(), ) U, [a, t], x(a) = , x(t) = x

(t)
for all t [a, b).
Theorem 9.7.2. Consider the variational problem
min
_
b
a
L(x(), x(), )d
on
S := x RCS
1
[a, t]
n
[ (x(), x(), ) U, [a, b], x(a) = , x(t) = .
Let x

RCS
1
[a, b]
n
be a regular extremal, then the following statements are
equivalent:
(a) x

is an optimal path.
(b) The variational problem has an equivalent convex problem in the following
sense: for the original Lagrangian there is a locally convexied Lagrangian

L
such that for every subinterval [a, ] [a, b) there is a > 0 with the property
that

L(, , t) is strictly convex on the ball K(x

(t), x

(t), ) for all t [a, ].


(c) The variational problem has an equivalent convex problem (in the sense of (b))
employing a quadratic supplement.
(d) For every subinterval [a, ] [a, b) there is a > 0 with the property that
(x

(t), x

(t)) is a pointwise minimum of



L(, , t) on the ball K(x

(t), x

(t), )
for all t [a, ].
(e) The LegendreRiccati condition is satised on [a, b).
(f) The Jacobi matrix equation has a non-singular and self-conjugate solution on
[a, b).
(g) a has no conjugate point in (a, b).
Section 9.8 Stability Considerations for Variational Problems 337
9.8 Stability Considerations for Variational Problems
For many classical variational problems (Brachistochrone, Dido problem, geometrical
optics) the optimal solutions exhibit singularities in the end points of the denition
interval. A framework to deal with problems of this kind is to consider these singular
problems as limits of well-behaved problems. This view requires aside from a
specication of these notions certainty about the question whether limits of the
solutions of the approximating problems are solutions of the original problem.
We will now apply our stability principles to variational problems and at rst only
vary the restriction set.
Continuous convergence of variational functionals requires special structures. If
e.g. the variational functionals are continuous and convex, then by Remark 5.3.16 the
continuous convergence already follows from pointwise convergence.
We will now show that from standard assumptions on the Lagrangian L the conti-
nuity of the variational functional on the corresponding restriction set follows, where
the norm on C
(1)
[a, b]
m
is dened as the sum of the maximum norm of y and y
|y|
C
1 := |y|
max
+| y|
max
.
Let in the sequel U R
2m+1
and let M =xC
(1)
[a, b]
m
[ (x(t), x(t), t) Ut
[a, b]. Then the following theorem holds (compare [35]):
Theorem 9.8.1. Let L : U R be continuous. Then the variational functional
x f(x) :=
_
b
a
L(x(t), x(t), t) dt
is continuous on M.
Proof. Let the sequence (x
n
) M and x M and let
lim
n
|x
n
x|
C
1 = 0, (9.9)
hence by denition (x
n
, x
n
) (x, x) uniformly on C[a, b]
m
. Uniform convergence
implies according to Theorem 5.3.6 continuous convergence, i.e. for t
n
t
0
it fol-
lows that (x
n
(t
n
), x
n
(t
n
), t
n
) (x(t
0
), x(t
0
), t
0
). From the continuity of L on U it
then follows that
L(x
n
(t
n
), x
n
(t
n
), t
n
) L(x(t
0
), x(t
0
), t
0
).
Let now
n
:= L(x
n
, x
n
, ) : [a, b] R and := L(x, x, ) : [a, b] R, then

n
and are continuous on [a, b] and we conclude: (
n
) converges continuously
to . Again by Theorem 5.3.6 uniform convergence follows. Therefore L(x
n
, x
n
, )
converges uniformly to L(x, x, ), whence f(x
n
) f(x).
338 Chapter 9 Variational Calculus
We obtain the following stability theorems for variational problems:
Theorem 9.8.2. Let V R
2m
and U := V [a, b] be open. Let L : U R
be continuous. Let the sequence (
n
,
n
) R
2m
of boundary values converge to
(, ) R
2m
.
Let further
f(x) :=
_
b
a
L(x(t), x(t), t) dt,
S := x M [ x(a) = , x(b) = ,
S
n
:= x M [ x(a) =
n
, x(b) =
n
.
Then every point of accumulation of minimal solutions of the variational problems
minimize f on S
n
is a minimal solution of the variational problem minimize f on
S.
Proof. According to theorems (Stability Theorem 5.3.19 and Theorem 9.8.1) we only
have to verify
limS
n
= S.
Let for each n N
c
n
:=

n
+
n

b a
, d
n
:=
n
c
n
a,
then
t v
n
(t) := t c
n
+d
n
converges in C
(1)
[a, b]
n
(w.r.t. the norm ||
C
1
) to 0. Let x S.
By Lemma 9.3.1 (see also [35], p. 28) we have for sufciently large n N
x
n
:= x +v
n
M,
and hence by construction x
n
S
n
. Apparently (x
n
)
nN
converges to x. Therefore
lim
n
S
n
S.
On the other hand, uniform convergence also implies pointwise convergence at the
end points, whence lim
n
S
n
S follows. Since lim
n
S
n
lim
n
S
n
we obtain
lim
n
S
n
= S.
The assertion follows using the stability Theorem 5.3.19 and Theorem 9.8.1.
Section 9.8 Stability Considerations for Variational Problems 339
As an example we consider
9.8.1 Parametric Treatment of the Dido problem
We consider the following version of the Dido problem, which in the literature is also
referred to as the special isoperimetric problem (see Bolza [12], p. 465). We choose
Bolzas wording:
Dido problem. Among all ordinary curves of xed length, connecting two given
points P
1
and P
2
, the one is to be determined which together with the chord P
1
P
2

encloses the largest area.
This general formulation requires a more specic interpretation. In a similar man-
ner as Bolza we put the points P
1
, P
2
on the x-axis, symmetrical to zero and maximize
the integral (Leibnizs sector formula, s. Bohn, Goursat, Jordan and Heuser)
f(x, y) =
1
2
_
b
a
(x y y x)dt (9.10)
under the restriction
G(x, y) :=
_
b
a
_
x
2
+ y
2
dt = , x(a) = , x(b) = , y(a) = y(b) = 0,
where , R
>0
(2 < ) are given. Along the chord P
1
P
2
the integrand is equal
to 0 (y = y = 0), since [P
1
, P
2
] lies on the x-axis. Hence also for the closed curve,
generated through the completion of (x, y) by the interval [P
1
, P
2
] we have
f(x, y) =
_
b
a
(x y y x) dt +
_
P
2
P
1
(x y y x) dt =
_
b
a
(x y y x) dt.
Discussion of Leibnizs Formula. For arbitrary ordinary curves it is debatable,
what the meaning of Formula (9.10) is. For all (x, y) C
(1)
[a, b]
2
(resp. (x, y)
RCS
(1)
[a, b]
2
) the integral is dened (and for counterclockwise oriented boundary
curves of normal regions in fact the value is the enclosed area) but we avoid this
discussion and maximize f on the whole restriction set
S = (x, y) C
(1)
[a, b]
n
[ x(a) = P
1
, x(b) = P
2
, G(x, y) = .
As a rst step we show that Formula (9.10) is translation invariant for closed curves.
Lemma 9.8.3 (Translation invariance). Let piecewise continuously differentiable func-
tions (x, y) (RCS
(1)
[t
a
, t
b
])
2
be given with x(t
b
)x(t
a
) = 0 and y(t
b
)y(t
a
) = 0.
340 Chapter 9 Variational Calculus
Let the translation vector (r
1
, r
2
) R
2
be given and (u, v) = (x + r
1
, y + r
2
) the
closed curve (x, y) shifted by (r
1
, r
2
). Then
_
t
b
t
a
(x y y x)dt =
_
t
b
t
a
(u v v u)dt.
Proof. Apparently u = x, v = y and hence
_
t
b
t
a
(u v +v u)dt =
_
t
b
t
a
(x y y x)dt +r
1
_
t
b
t
a
ydt r
2
_
t
b
t
a
xdt.
The curve is closed, i.e. x(t
a
) = x(t
b
), y(t
a
) = y(t
b
), whence
_
t
b
t
a
ydt = y(t
b
) y(t
a
) = 0 and
_
t
b
t
a
xdt = x(t
b
) x(t
a
) = 0,
and hence the assertion.
Leibnizs sector formula is meant for closed curves. Therefore we have extended
the curves under consideration (which will be traversed in counterclockwise direction)
by the interval (, ). The curve integral in Formula (9.10) is homogeneous w.r.t.
( x, y) and hence independent of the parameterization.
For our further treatment we choose the parameterization w.r.t. the arc length. Since
the total length of the closed curve is given by + 2, we have to consider piecewise
continuously differentiable functions (x, y) (RCS
(1)
[0, + 2])
2
with x
2
+ y
2
= 1
on the interval [0, + 2].
On the subinterval [, + 2] the desired curve runs along the x-axis and has the
parameterization y = 0 and t x(t) := t .
Apparently
_
+2
0
(x y y x)dt =
_

0
(x y y x)dt
holds, since the integral vanishes on [, + 2].
If the restriction set is shifted parallel to the x-axis by R, then due to Lem-
ma 9.8.3 for u := x and v := y +
_
+2
0
(u v v u)dt =
_
+2
0
(x y y x)dt.
On [, + 2] we have v = and hence
_
+2

(u v v u)dt =
_
+2

udt
= (u( + 2) u()) = ( +) = 2.
Section 9.8 Stability Considerations for Variational Problems 341
Therefore the integrals
_
+2
0
u v v udt
and
_

0
u v v udt
differ for all (u, v) only by the constant (independent of u and v) 2.
Therefore we obtain: if a minimal solution of f on the unshifted restriction set is
shifted by (0, ) then it is a minimal solution of f on the shifted restriction set S.
9.8.2 Dido problem
We will now deal with the Dido problem using a parametric supplement. We are
looking for a curve, which together with the shore interval [P
1
, P
2
] forms a closed
curve and encloses a maximal area. Let be the length of the curve.
We want to determine a smooth (resp. piecewise smooth) parametric curve
(x, y) : [0, ] R
2
with preassigned start (, 0) and end point (, 0), which maximizes
f(x, y) :=
_

0
(x y xy)dt, (9.11)
where the parameterization is done over the length of the curve, i.e. for all t [0, ]
we have
x
2
(t) + y
2
(t) = 1. (9.12)
Let 0 < <

2
. The restriction set is given by
S := (x, y) C
(1)
[0, ]
2
[ x
2
+ y
2
= 1, x(0) = , x() = , y(0) = y() = 0.
We now apply the approach of pointwise minimization (see Theorem 9.1.7) to the
supplemented functional f +, with the supplement
(x, y) =
_

0
(r( x
2
+ y
2
) x y

x y) dt, (9.13)
for , and r C
(1)
[0, ].
As soon as , and r are specied, the supplement is constant on S.
For the corresponding EulerLagrange equation we obtain
d
dt
(2r x y) = y (9.14)
d
dt
(2r y +x) = x. (9.15)
342 Chapter 9 Variational Calculus
Together with Equation (9.12) we obtain for r =

2
and the boundary condition
= 0 the parameterized circle as an admissible extremal
x(t) = r sin
_
t
r
_
y(t) = r
_
1 cos
_
t
r
__
. (9.16)
But the function f + is not convex. We shall try to treat this variational problem
by use of successive minimization for a quadratic supplement. The minimization will
essentially be reduced to the determination of the deepest point of a parabola. For
a suitable supplement we choose a function C
(1)
(0, ] and a r R
0
, leading to
a quadratically supplemented functional

f

f(x) =
_

0
yx xy +r( x
2
+ y
2
) +(t)(x x +y y) +
1
2
(t)(x
2
+y
2
)dt (9.17)
(see Section 9.3). For the supplemented Lagrangian

L we then obtain

L(p, q, t) = p
1
q
2
p
2
q
1
+r(q
2
1
+q
2
2
) +(t)(p
1
q
1
+p
2
q
2
) +
1
2
(t)(p
2
1
+p
2
2
). (9.18)
Minimization w.r.t. q = (q
1
, q
2
) for xed p (successive minimization) of the convex
quadratic function
q r(q
2
1
+q
2
2
) + ((t)p
1
p
2
)q
1
+ (p
1
+(t)p
2
)q
2
, (9.19)
setting the partial derivatives w.r.t. q
1
and q
2
to zero, leads to the equations
(t)p
1
p
2
+ 2rq
1
= 0 i.e. q
1
=
p
2
(t)p
1
2r
. (9.20)
and
p
1
+ 2rq
2
+(t)p
2
= 0 i.e. q
2
=
p
1
+(t)p
2
2r
. (9.21)
Insertion into Equation (9.18) yields
(p) :=
((t)p
1
p
2
)
2
(p
1
+(t)p
2
)
2
4r
+
1
2
(t)(p
2
1
+p
2
2
)
=
(
2
(t) + 1)(p
2
1
+p
2
2
)
4r
+
1
2
(t)(p
2
1
+p
2
2
)
=
1
2
_
(t)
(
2
(t) + 1)
2r
_
(p
2
1
+p
2
2
).
If now is chosen such that the following ODE
=

2
+ 1
2r
(9.22)
Section 9.8 Stability Considerations for Variational Problems 343
is satised on all of (0, ), then is constant equal to zero and every p R
2
is a min-
imal solution of . The Differential Equation (9.22) (ODE with separated variables)
has a general solution of the form
(t) = tan
_
t C
2r
_
,
where C is a constant.
We can try to choose C R, r R
0
in a suitable way and put C :=

2
. However,
if we choose r :=

2
, then
(t) = tan
_
1
2
_
t
r

__
(9.23)
is a solution of Equation (9.22) on the open interval (0, ) but has a singularity at
t = , i.e. for the original problem the convexication fails (at least in this manner).
We now consider the situation for r >

2
. Using the formulae
s tan
_
s
2
_
=
sin s
1 + cos s
=
1 cos s
sin s
,
and x(t) = p
1
, y(t) = p
2
, x = q
1
and y = q
2
we obtain with Equations (9.20)
and (9.21) by direct insertion the following candidate for an optimal solution:
x

(t) = r sin
_
t C
r
_
and y

(t) = r
_
1 + cos
_
t C
r
__
.
If we succeed in choosing the constants C and r in such a way that (x

, y

) is inside
the restriction set and is dened on the whole interval [0, ], then we have found the
optimal solution.
We want to make the domain of denition for as large as possible, and in order to
achieve this we choose its center for C :=

2
. For the boundary condition w.r.t. x we
then obtain the equations
r sin
_


2r
_
= and r sin
_

2r
_
= .
For 0 < <

2
, there is always a unique r() >

2
, to satisfy these conditions, since
for r =

2
we have r sin
_

2r
_
= 0 and for r >

2
the mapping
r r sin
_

2r
_
=: (r)
is strictly increasing on [

2
, ). In order to compute the limit for r , we intro-
duce the new variable R :=
1
r
> 0. Using the rule of de LHospital we obtain
lim
R0
sin(
R
2
)
R
= lim
R0

2
cos(
R
2
)
1
=

2
.
344 Chapter 9 Variational Calculus
Apparently lim
0
r() =

2
, due to the strict monotonicity and continuity of .
Let () := y

(0) = r()(1 + cos(


2
r()
)).
According to the principle of pointwise minimization (x

, y

) is a minimal solution
of the variational functional f on the restriction set, shifted by (0, ())

S = (x, y)C
(1)
[0, ]
2
[ x
2
+ y
2
= 1, x(0) = , x() = , y(0) = y() = ().
Since shifting of the restriction set leads to optimal solutions (see introductory remark
to Leibnizs formula) (x

, y

) (0, ()) is a solution of the problem for > 0.


Furthermore
()
0


2
(1 + cos()) = 0
holds. Altogether we observe that the solutions for > 0 converge for 0 to
the Solution (9.16) in C
1
[0, ]
2
: in order to see this, let (
n
) be a sequence tending to
zero with
n
(0, ) for all n N and let (x
n
, y
n
) be solutions for
n
, then using
the notation r
n
:= r(
n
) we have
x
n
(t) = r
n
sin
_
t /2
r
n
_
and y
n
(t) = r
n
_
1 + cos
_
t/2
r
n
__
(
n
).
[x
n
(t) x(t)[ =

_

2
r
n
_
sin
_
t
r
n
_


2
_
sin
t
r
n
sin
t
/2
_
+r
n
_
cos

2r
n
+ 1
_
sin
_
t
r
n
_
r
n
sin

2r
n
cos
t
r
n

_

2
r
n
_

+

2

sin
t
r
n
sin
t
/2

+r
n

cos

2r
n
+ 1

+r
n

sin

2r
n

.
Since [sin
t
r
n
sin
t
/2
[ [
1
r
n

1
/2
[ it follows that
max
t[0,]
[x
n
(t) x(t)[
n
0.
Correspondingly we obtain
max
t[0,]
[ x
n
(t) x(t)[

1
r
n

1
/2

cos

2r
n
+ 1

sin

2r
n

0,
and in a similar manner
[y
n
(t) y(t)[ 2

2
r
n

+r
n

sin

2r
n

+

2
2

1
r
n

1
/2

r
n
cos

2r
n
+ 1

+(
n
)
0,
Section 9.8 Stability Considerations for Variational Problems 345
as well as
max
t[0,]
[ y
n
(t) y(t)[

cos

2r
n
+ 1

sin

2r
n

1
r
n

1
/2

0.
According to stability Theorem 9.8.2 it follows that (9.16) is a solution of the Dido
problem.
Remark. The above stated solution (x, y) of the Dido problem is, because of the neg-
ative orientation of the curve a minimal solution (with negative area). The boundary
conditions of the Dido problem do not determine a unique solution (contrary to the
approximating problems for > 0, the solutions of which also have a negative orien-
tation), since xing a point on the circle does not determine its position. Interchanging
e.g. x and y also leads to a solution of the Dido problem (rotated circle by 90), since
the area is the same.
If one requires in addition y 0, then the opposite circle with parametric represen-
tation x(t) = r sin(
t
r
) and y(t) = r(1cos(
t
r
)) is a (maximal)-solution, if we choose
the supplement
(x, y) =
_

0
r( x
2
+ y
2
) x y

x y dt. (9.24)
For the corresponding EulerLagrange equations we then obtain
d
dt
(2r x +y) = y (9.25)
d
dt
(2r y x) = x. (9.26)
9.8.3 Global Optimal Paths
For many variational problems the end points of the curve to be determined play a
particular role. Here we often encounter singularities. Let for V R
2m
open and
, R
m
the restriction set be dened as
S := x C
(1)
[a, b]
n
[ x(a) = , x(b) = , (x(t), x(t)) V for all t [a, b].
On each subinterval of the interval (a, b) the curves we are looking for are mostly
well-behaved.
In order to avoid difculties (singularities) in the end points, we will now introduce
the following notion of optimality.
Denition 9.8.4. A x

S is called a global optimal path, if for each [0,


ba
2
)
x

is a solution of the following variational problems


Minimize
_
b
a+
L(x(t), x(t), t)dt
346 Chapter 9 Variational Calculus
on the restriction set S

, dened by
S

:= x C
(1)
[a, b]
n
[ x(a +) = x

(a +), x(b ) = x

(b ),
(x(t), x(t)) V t [a +, b ].
Then the following stability theorem holds:
Theorem 9.8.5. Let L : V [a, b] be continuous. Then every global optimal path is
a minimal solution of the variational problem
Minimize f(x) =
_
b
a
L(x(t), x(t), t)dt on S.
Proof. Let (
n
)
nN
be a sequence tending to zero in (0,
ba
2
) and x S.
Let a
n
= x

(a +
n
), b
n
= x

(b
n
), u
n
(t) = (t a)
b
n
v
n
(b)
ba
and v
n
(t) =
(t b +
n
)
a
n
a
ab+
n
.
Since x(a) = x

(a) and x(b) = x

(b) the function sequences


(u
n
)
nN
, (v
n
)
nN
, ( u
n
)
nN
and ( v
n
)
nN
converge uniformly to 0.
For an n
0
N we have by Lemma 9.3.1
x
n
:= x +u
n
+v
n
S

n
.
This means limS

n
S. By Theorem 5.3.20 the assertion follows.
9.8.4 General Stability Theorems
So far we have in our stability considerations only varied the restriction set. We will
now consider varying the variational functional. We obtain the rst general stability
assertion in this context for monotone convergence (see Theorem 5.2.3).
Theorem 9.8.6. Let V R
m
R
m
, L : V [a, b] R continuous and for all n
N L
n
: V [a, b] R continuous, where (L
n
)
nN
converges pointwise monotonically
to L. Let
W := x C
(1)
[a, b]
m
[ (x(t), x(t)) V t [a, b],
f : W R, x f(x) :=
_
b
a
L(x(t), x(t), t) dt,
and
f
n
: W R, x f
n
(x) :=
_
b
a
L
n
(x(t), x(t), t) dt
Section 9.8 Stability Considerations for Variational Problems 347
be dened. For a set S W and the sequence (S
n
W)
nN
let lim
n
S
n
S be
satised.
Then each point of accumulation of the sequence of minimal solutions of the vari-
ational problems minimize f
n
on S
n
, which is in S, is a minimal solution of the
variational problem minimize f on S.
Proof. Let x W. The sequence of continuous functions L
n
(x(), x(), )
nN
con-
verges pointwise monotonically on [a, b] to the continuous function L(x(), x(), ).
The corresponding functionals are by Theorem 9.8.1 also continuous. According to
the theorem of Dini for real-valued functions on [a, b] the convergence is uniform.
Therefore f
n
(x) f(x). The convergence of the values of the functionals is also
monotone. By the theorem of Dini in metric spaces (see Theorem 5.2.2) this conver-
gence is continuous. The Stability Theorem 5.3.20 then implies the assertion.
We also have the
Theorem 9.8.7. Let U R
2m+1
, and let the sequence of continuous functions (L
n
:
U R)
nN
converge continuously to the continuous function L : U R. Let
R := x C
(1)
[a, b]
m
[ (x(t), x(t), t) U for all t [a, b].
Then the sequence of functionals
f
n
(x) =
_
b
a
L
n
(x(t), x(t), t) dt
converges continuously to
f(x) =
_
b
a
L(x(t), x(t), t) dt
on R.
Let S R and (S
n
)
nN
be a sequence of subsets of the restriction set of R with
lim
n
S
n
S. (9.27)
Let for each n N x
n
be a minimal solution of f
n
on S
n
.
Then each point of accumulation of the sequence (x
n
)
nN
, lying in S, is a minimal
solution of f on S.
Proof. Let x
n
x be convergent w.r.t. ||
C
1
.
We show that the sequence
n
:= L
n
(x
n
(), x
n
(), ) : [a, b] R converges
uniformly to := L(x(), x(), ) : [a, b] R. As a composition of continuous
functions is continuous. It sufces to verify continuous convergence of (
n
)
nN
,
since this implies uniform convergence.
348 Chapter 9 Variational Calculus
Let t
n
[a, b] be a to t [a, b] convergent sequence, then
(x
n
(t
n
), x
n
(t
n
), t
n
) (x(t), x(t), t).
Since the functions L
n
: U R converge continuously to L : U R, we have
L
n
(x
n
(t
n
), x
n
(t
n
), t
n
)
n
L(x(t), x(t), t).
Uniform convergence of (
n
)
nN
to implies continuous convergence of the se-
quence of functionals (f
n
)
nN
to f.
The remaining part of the assertion follows from Stability Theorem 5.3.20.
Remark 9.8.8. Let R be open and convex. If for every t [a, b] the subsets U
t
:=
(p, q) R
2n
[ (p, q, t) U are convex, L : U
t
R is convex and the sequence of
functions (L
n
(, , t) : U
t
R)
nN
is convex, then continuous convergence already
follows from pointwise convergence of the functionals
f
n
(x) :=
_
b
a
L
n
(x(t), x(t), t) dt
to
f(x) :=
_
b
a
L(x(t), x(t), t) dt
(see Remark 5.3.16).
More generally:
Remark 9.8.9. A pointwise convergent sequence (L
n
: U R)
nN
converges con-
tinuously to L : U R, if U is open and convex and for every n N the Lagrangian
L
n
is componentwise convex. The convexity in the respective component can be
replaced by requiring concavity (see section on convex operators in the chapter on
stability).
By Theorem 9.8.7 and Remark 9.8.9 we obtain
Theorem 9.8.10. Let U R
2n+1
open and convex. Let the sequence of continuous
functions (L
n
: U R)
nN
be componentwise convex and pointwise convergent to
L : U R.
Let
R := x C
(1)
[a, b]
m
[ (x(t), x(t), t) U for all t [a, b].
The sequence of functionals
f
n
(x) =
_
b
a
L
n
(x(t), x(t), t) dt
Section 9.8 Stability Considerations for Variational Problems 349
converges continuously to
f(x) =
_
b
a
L(x(t), x(t), t) dt
on R.
Let S R and (S
n
)
nN
be a sequence of subsets of the restriction set R with
lim
n
S
n
S.
Let for every n N x
n
be a minimal solution of f
n
on S
n
.
Then each point of accumulation of the sequence (x
n
)
nN
, which is in S, is a
minimal solution of f on S.
As an example for the above stability theorem we consider
9.8.5 Dido problem with Two-dimensional Quadratic Supplement
We treat the problem: minimize the functional (which represents the 2-fold of the
negative Leibniz formula)
f(x) =
_
2
0
x
1
x
2
+ x
1
x
2
dt (9.28)
on
S = x = (x
1
, x
2
) C
(1)
[0, 2]
2
[ | x| = 1, x
1
(0) = x
1
(2) = 1,
x
2
(0) = x
2
(2) = 0.
As supplement we choose a sum of a parametric and a (two-dimensional) linear sup-
plement, i.e. for a positive C
(1)
[0, 2] and C
(1)
[0, 2]
2
let
(x) =
_
2
0
_
1
2
| x|
2

T
x x
T

_
dt, (9.29)
which is equal to the constant 2
_
2
0
(t)dt
1
(2) +
1
(0) on S.
According to the approach of pointwise minimization we have to minimize the
function below for xed t [0, 2]
(p, q)
t
(p, q) := p
1
q
2
+p
2
q
1
+
1
2
(t)(q
2
1
+q
2
2
)
T
(t)q p
T

(t) (9.30)
on R
4
. A necessary condition is that the partial derivatives vanish. This results in the
equations
q
2

1
(t) = 0; q
1

2
(t) = 0 (9.31)
p
2
+(t)q
1

1
(t) = 0; p
1
+(t)q
2

2
(t) = 0. (9.32)
350 Chapter 9 Variational Calculus
Just as Queen Dido, we expect the unit circle as the solution with the parameterization
x

1
(t) = cos(t) and x

2
(t) = sin(t)
and we obtain the following equations for specication of and

1
(t) = cos(t),

2
(t) = sin(t) (9.33)
sin(t) (t) sin(t) =
1
(t), cos(t) +(t) cos(t) =
2
(t). (9.34)
For 2 and (t) = (sin(t), cos(t)) these equations are satised. But the function
f + is not convex and the Conditions (9.31) and (9.32) do not constitute sufcient
optimality conditions.
In order to obtain those, we will now employ the approach of convexication by a
quadratic supplement.
For C =
_
0 1
1 0
_
and a R
>0
we will now (for xed t [0, 2]) try to convexify
the function

t
(p, q) = p
T
Cq +
1
2
|q|
2
using a quadratic supplement.
In other words, we attempt to nd a continuously differentiable matrix function
Q : [0, 2] R
2
R
2
such that for all t [0, 2] the function
R
4
(p, q) (p, q) = p
1
q
2
+q
1
p
2
+
1
2
|q|
2
+q
T
Qp +
1
2
p
T

Qp,
is convex, where p = (p
1
, p
2
) and q = (q
1
, q
2
).
The quadratic supplement function E : R
2
R
2
[0, 2] R we choose to be of
the form (compare Section 9.3)
(p, q, t) E(p, q, t) = q
T
Q(t)p +
1
2
p
T

Q(t)p,
where Q = I with a C
(1)
[0, 2] and I is the 2 2-unit matrix.
Then

Q = I and for all t [0, 2]

t
(p, q) +E(p, q, t) = p
T
Cq +
1
2
q
T
q +q
T
Q(t)p +
1
2
p
T

Q(t)p.
Section 9.8 Stability Considerations for Variational Problems 351
This function is convex on R
2
R
2
, if and only if for all (p, q) R
4
the Hessian
matrix
H
t
(p, q) =
_

Q(t) C +Q
C
T
+Q I
_
is positive semi-denite.
According to Lemma 9.3.2 this is the case, if and only if
M
t
:=

Q(t)
1

(C +Q(t))
T
(C +Q(t))
is positive semi-denite. We have
M
t
=
_
(t) 0
0 (t)
_

_
(t) 1
1 (t)
__
(t) 1
1 (t)
_
=
_
(t) 0
0 (t)
_

_
1 +
2
(t) 0
0 1 +
2
(t)
_
=
_
(t)
1

(1 +
2
(t)) 0
0 (t)
1

(1 +
2
(t))
_
.
This matrix is positive semi-denite, if
(t)
1

(1 +
2
(t)) 0
holds.
The differential equation of Riccati-type
=
1

(1 +
2
)
has for > 2 on all of [0, 2] the solution

(t) = tan(
t

).
But for = 2 we observe that
2
(t) = tan(
t
2
) is only dened on the open interval
(0, 2).
For a complete solution of the isoperimetric problemwe need the following stability
consideration:
Let (
n
)
nN
be a sequence in (2, ), converging to 2. Based on the above consid-
erations we will establish that for every n N the variational problem
Minimize f
n
(x) =
_
2
0
(x
1
x
2
+x
2
x
1
+
1
2

n
( x
2
1
+ x
2
2
))dt (P
n
)
on
S
n
=
_
xC[0, 2] C
(1)
(0, 2)

| x| = 1, x(0) = (1, 0),


x(2) =
_
cos
_
4

n
_
, sin
_
4

n
___
352 Chapter 9 Variational Calculus
has a minimal solution
t x
n
(t) =
_
cos
_
2t

n
_
, sin
_
2t

n
__
.
At rst we observe that x
n
is an extremal of (P
n
), since for the EulerLagrange
equations we have

n
x
1
+ x
2
= x
2
(9.35)

n
x
2
x
1
= x
1
. (9.36)
The extremal is a solution, because the problem (P
n
) has an equivalent convex varia-
tional problem w.r.t. the supplement potential
F
n
(t, p) = p
T
Q
n
p,
where
Q
n
(t) =
_

n
(t) 0
0
n
(t)
_
,
and
n
(t) = tan(
t

n
).
Moreover, for each x S the function
t y
n
(t) = x(t) +
t
2
_
1 + cos
_
4

n
_
, sin
_
4

n
__
is an element of S
n
.
Apparently lim
n
y
n
= x in C
1
[0, 2]
2
and hence lim
n
S
n
S. Let
f
0
(x) :=
_
2
0
(x
1
x
2
+x
2
x
1
+ x
2
1
+ x
2
2
)dt.
Apparently f and f
0
only differ by a constant on S. The functions f
n
and f
0
are
by Theorem 9.8.1 continuous. The corresponding Lagrangians are componentwise
convex and pointwise convergent. The stability Theorem 9.8.10 yields that x(t) is
solution of the original problem, since the sequence (x
n
) converges in C
1
[0, 2]
2
to x.
9.8.6 Stability in OrliczSobolev Spaces
Denition 9.8.11. Let T R
r
be an open set and the Lebesgue measure and let
(T) < , let be a Young function. Then the OrliczSobolev space W
m
L

()
consists of all functions u in L

(), whose weak derivatives (in the distributional


sense) D

u also belong to L

(). The corresponding norm is given by


|u|
m,
:= max
0[[m
|D

u|
()
.
Section 9.8 Stability Considerations for Variational Problems 353
W.r.t. this norm W
m
L

() is a Banach space (see [2]).


Theorem 9.8.12. Let T R
r
be an open set and the Lebesgue measure and let
(T) < , let be a Young function, satisfying the

2
-condition. Let J be set of
all mappings
L : T R R
n
R
0
satisfying the following properties:
(a) for all (w, z) R R
n
the mapping t L(t, w, z) : T R
0
is measurable
(b) for all t T the mapping (w, z) L(t, w, z) : R R
n
R
0
is convex
(c) there is a M R
0
such that for all (t, w, z) T R R
n
L(t, w, z) M
_
1 +(w) +
n

i=1
(z
i
)
_
holds.
Then the set
F :=
_
f

LJ : f : W
1
L

() R
0
, u f(u) =
_
T
L(t, u(t), u(t))d(t)
_
is equicontinuous.
Proof. Due to (b) F is a family of convex functions. For each f F and each
u W
1
L

()
0 f(u)
_
T
M(1+(u) +
n

i=1
(D
i
u)d = M
_
(T) +f

(u) +
n

i=1
f

(D
i
u)
_
holds. If u is in the W
1
L

()-ball with radius R, then |u|


()
and |D
i
u|
()
R
for i = 1, . . . , n. The modular f

is because of Theorem 6.3.10 bounded on bounded


subsets of L

(), i.e. there is a K R such that


f(u) M((T) + (n + 1)K).
Therefore F is uniformly bounded on every ball of W
1
L

() and according to The-


orem 5.3.8 equicontinuous.
Existence results for variational problems in OrliczSobolev spaces can be found
in [8].
354 Chapter 9 Variational Calculus
9.9 Parameter-free Approximation of Time Series Data by
Monotone Functions
In this section we treat a problem that occurs in the analysis of Time Series: we deter-
mine a parameter-free approximation of given data by a smooth monotone function in
order to detect a monotone trend function from the data or to eliminate it in order to
facilitate the analysis of cyclic behavior of difference data (Fourier analysis). In the
discrete case, this type of approximation is known as monotone regression (see [13],
p. 28 f.).
In addition to the data themselves, our approximation also takes derivatives into
account, employing the mechanism of variational calculus.
9.9.1 Projection onto the Positive Cone in Sobolev Space
Minimize
_
b
a
(v x)
2
+ ( v x)
2
dt on S,
where S := v RCS
1
[a, b] [ v 0.
Linear Supplement. Let F(t, p) = (t) p, then
d
dt
F(, v()) = F
t
+F
p
v = v + v.
Using the transversality conditions for free endpoints: (a) = (b) = 0, we obtain
that the supplement is constant on S
_
b
a
v + vdt = [(t)v(t)]
b
a
= 0.
Let

L(p, q) := L(p, q) p q =
1
2
(x p)
2
p +
1
2
( x q)
2
q. Pointwise
minimization of

L w.r.t. p and q is broken down into two separate parts:
1. min
1
2
(x p)
2
p [ p R with the result: = p x
2. min
1
2
( x q)
2
q [ q R
0
.
In order to perform the minimization in 2. we consider the function
(q) :=
1
2
( x q)
2
q.
Section 9.9 Parameter-free Approximation of Time Series Data 355
Setting the derivative equal to zero, we obtain for c := x +

q
= q x = q c = 0.
For c 0 we obtain a solution of the parabola .
For c < 0 we have q = 0 because of the monotonicity of the parabola to the right
of the global minimum.
For c 0 we obtain the linear inhomogeneous system of linear differential equa-
tions
_

v
_
=
_
0 1
1 0
__

v
_
+
_
x
x
_
.
For c(t) = x(t) + (t) < 0, this inequality holds, because of the continuity of c, on
an interval I, i.e. v(t) = 0 on I, hence. v(t) = there. We obtain (t) = x(t) on
I, i.e.
(t) (t
1
) =
_
t
t
1
( x())d = (t t
1
)
_
t
t
1
x()d.
Algorithm (Shooting Procedure)
Choose = v(a), notation: c(t) = x(t) +(t), note: (a) = 0.
Start: c(a) = x(a).
If x(a) 0 then set t
1
= a, goto 1.
If x(a) < 0 then set t
0
= a, goto 2.
1. Solve initial value problem
_

v
_
=
_
0 1
1 0
__

v
_
+
_
x
x
_
with the initial values (t
1
), v(t
1
), for which we have the following explicit
solutions (see below):
(t) = (v(t
1
) x(t
1
)) sinh(t t
1
) +(t
1
) cosh(t t
1
)
v(t) = x(t) + (v(t
1
) x(t
1
)) cosh(t t
1
) +(t
1
) sinh(t t
1
).
Let t
0
be the rst root with change of sign of c(t) such that t
0
< b, goto 2.
2. (t) = v(t
0
)(t t
0
)
_
t
t
0
x()d + (t
0
). Let t
1
be the rst root with change
of sign of c(t) such that t
1
< b, goto 1.
For given = v(a), this algorithm yields a pair of functions (

, v

). Our aim is, to


determine a such that

(b) = 0. In other words: let () :=

(b), then we
have to solve the (1-D)-equation () = 0.
356 Chapter 9 Variational Calculus
Solution of Differential Equation in 2. Let U(t) a fundamental system then the
solution of the (inhomogeneous) system is given by
V (t) = U(t)
_
t
t
1
U
1
()b()d +U(t)D,
where D = U
1
(t
1
)V (t
1
). In our case we have the fundamental system
U(t) =
_
cosh t sinh t
sinh t cosh t
_
,
and hence
U
1
(t) =
_
cosh t sinh t
sinh t cosh t
_
,
which yields (using corresponding addition theorems)
U(t)U
1
() =
_
cosh(t ) sinh(t )
sinh(t ) cosh(t )
_
.
We obtain
V (t) =
_
t
t
1
_
cosh(t ) sinh(t )
sinh(t ) cosh(t )
__
x()
x()
_
d
+
_
cosh(t t
1
) sinh(t t
1
)
sinh(t t
1
) cosh(t t
1
)
__
(t
1
)
v(t
1
)
_
i.e.
(t) =
_
t
t
1
x() sinh(t ) x() cosh(t )d
+(t
1
) cosh(t t
1
) +v(t
1
) sinh(t t
1
)
v(t) =
_
t
t
1
x() cosh(t ) x() sinh(t )d
+(t
1
) sinh(t t
1
) +v(t
1
) cosh(t t
1
).
The integrals can be readily solved using the product rule
(t) = [x() sinh(t )]
t
t
1
+(t
1
) cosh(t t
1
) +v(t
1
) sinh(t t
1
)
v(t) = [x() cosh(t )]
t
t
1
+(t
1
) sinh(t t
1
) +v(t
1
) cosh(t t
1
).
We nally obtain the following explicit solutions
(t) = (v(t
1
) x(t
1
)) sinh(t t
1
) +(t
1
) cosh(t t
1
)
v(t) = x(t) + (v(t
1
) x(t
1
)) cosh(t t
1
) +(t
1
) sinh(t t
1
).
Section 9.9 Parameter-free Approximation of Time Series Data 357
Existence. If we minimize the strongly convex functional f(v) :=
_
b
a
(v x)
2
+
( v x)
2
dt on the closed and convex subset S of the Sobolev space W
1
2,1
[a, b], where
S := v W
1
2,1
[a, b] [ v 0, then, according to Theorem 3.13.5 f has a unique
minimal solution in S. But the proof of Theorem 9.9.3 carries over to the above
problem, yielding that both v and are in C
1
[a, b].
9.9.2 Regularization of Tikhonov-type
In practice it turns out that tting the derivative of v to the derivative x of the data has
undesired effects. Instead we consider the following problem:
Minimize f

(v) :=
1
2
_
b
a
(v x)
2
+ (1 +) v
2
dt
on the closed and convex subset S of the Sobolev space W
1
2,1
[a, b], where
S := v W
1
2,1
[a, b] [ v 0.
Let

L(p, q) := L(p, q) pq =
1
2
(xp)
2
p+
1
2
(1+)q
2
q. Pointwise
minimization of

L w.r.t. p and q is broken down into two separate parts:
1. min
1
2
(x p)
2
p [ p R with the result: = (p x)
2. min
1+
2
q
2
q [ q R
0
.
In order to perform the minimization in 2. we consider the function
(q) :=
1 +
2
q
2
q.
Setting the derivative equal to zero, we obtain for c :=

q
= (1 +)q = (1 +)q c = 0.
For c 0 we obtain a minimal solution of the parabola .
For c < 0 we have q = 0 because of the monotonicity of the parabola to the right
of the global minimum.
For c 0 we obtain the inhomogeneous system of linear differential equations
_

v
_
=
_
0
1
1+
0
__

v
_
+
_
x
0
_
.
For c(t) = (t) < 0, this inequality holds, because of the continuity of c, on an
interval I, i.e. v(t) = 0 on I, and we obtain v(t) = there. Hence (t) = (x(t))
on I, i.e.
(t) (t
1
) =
_
t
t
1
( x())d = (t t
1
)
_
t
t
1
x()d.
358 Chapter 9 Variational Calculus
Algorithm (Shooting Procedure)
Start: Choose = v(a) ,= x(a), notation: c(t) = (t), note: c(a) = (a) = 0.
Put t
1
= a, if > x(a) goto 1, else goto 2.
1. Solve initial value problem
_

v
_
=
_
0
1
1+
0
__

v
_
+
_
x
0
_
with the initial values (t
1
), v(t
1
), for which we have the following solutions:
(t) =
_
t
t
1
x() cosh (t )d
+(t
1
) cosh (t t
1
) +v(t
1
) sinh (t t
1
)
v(t) =

_
t
t
1
x() sinh (t )d
+
1

(t
1
) sinh (t t
1
) +v(t
1
) cosh (t t
1
)),
where :=
_

1+
(see below).
Let t
0
be the rst root with change of sign of c(t) such that t
0
< b, goto 2.
2. (t) = v(t
0
)(tt
0
)
_
t
t
0
x()d +(t
0
). Let t
1
be the rst root with change
of sign of c(t) such that t
1
< b, goto 1.
For given = v(a), this algorithm yields a pair of functions (

, v

). Our aim is, to


determine a such that

(b) = 0. In other words: let () :=

(b), then we
have to solve the 1-D equation () = 0 (see discussion below).
Solution of the Differential Equation in 2. Let U(t) be a fundamental system then
the solution of the (inhomogeneous) system is given by
V (t) = U(t)
_
t
t
1
U
1
()b()d +U(t)D,
where D = U
1
(t
1
)V (t
1
). For the eigenvalues of
_
0
1
1+
0
_
we obtain
1,2
:=
_

1+
resulting in the following fundamental system:
U(t) =
_
cosh t sinh t
sinh t cosh t
_
Section 9.9 Parameter-free Approximation of Time Series Data 359
and hence
U
1
(t) =
1

_
cosh t sinh t
sinh t cosh t
_
which yields (using corresponding addition theorems)
U(t)U
1
() =
_
cosh (t ) sinh (t )
1

sinh (t ) cosh (t )
_
.
We obtain
V (t) =
_
t
t
1
_
cosh (t ) sinh (t )
1

sinh (t ) cosh (t )
__
x()
0
_
d
+
_
cosh (t t
1
) sinh (t t
1
)
1

sinh (t t
1
) cosh (t t
1
)
__
(t
1
)
v(t
1
)
_
i.e.
(t) =
_
t
t
1
x() cosh (t )d
+(t
1
) cosh (t t
1
) +v(t
1
) sinh (t t
1
)
v(t) =

_
t
t
1
x() sinh (t )d
+
1

(t
1
) sinh (t t
1
) +v(t
1
) cosh (t t
1
)).
Discussion of Existence, Convergence, and Smoothness. We minimize the convex
functional
f

(v) :=
1
2
_
b
a
(v x)
2
+ (1 +) v
2
dt
on the closed and convex subset S of the Sobolev space W
1
2,1
[a, b], where S := v
W
1
2,1
[a, b] [ v 0. For f

we have the following representation:


f

(v) :=
1
2
_
b
a
((v
2
+ v
2
2vx +x
2
) + v
2
)dt.
The convex functional
f(v) :=
1
2
_
b
a
(v
2
+ v
2
2vx +x
2
)dt
is apparently uniformly convex on W
1
2,1
[a, b], since
f
_
u +v
2
_
=
1
2
(f(u) +f(v))
1
4
|u v|
2
W
1
2,1
[a,b]
.
360 Chapter 9 Variational Calculus
Let further g(v) :=
1
2
_
b
a
v
2
dt, then apparently f

= f + g. By the Regularization
Theorem of Tikhonov-type 8.7.2 we obtain the following statement: Let now
n
be a
positive sequence tending to zero and f
n
:=
n
f +g. Let nally v
n
be the (uniquely
determined) minimal solution of f
n
on S, then the sequence (v
n
) converges to the
(uniquely determined) minimal solutions of f on M(g, S), where M(g, S) apparently
consists of all constant functions (more precisely: of all those functions which are
constant a.e.). On this set f assumes the form
f(v) =
1
2
_
b
a
(v
2
2vx +x
2
)dt =
1
2
_
b
a
(v x)
2
dt.
The minimal solution of f on M(g, S) is apparently the mean value
=
1
b a
_
b
a
x(t)dt.
We now turn our attention to solving the (1-D) equation κ(λ) = 0. We observe for t > a

  (1/ω) sinh ω(t-a) = ∫_a^t cosh ω(t-τ) dτ,

hence for λ = v(a) and t₁ = a:

(a) If λ > (1+ε) max x(t), then η(t) > 0 for all t > a, since

  η(t) = ε ∫_a^t ( λ - x(τ) ) cosh ω(t-τ) dτ,

and ω²/ε = 1/(1+ε). We conclude κ(λ) = η_λ(b) > 0.

(b) If λ < min x(t), then η(t) < 0, since

  η(t) = ε ∫_a^t ( λ - x(τ) ) dτ

for all t > a, hence κ(λ) = η_λ(b) < 0.

The decision whether 1. or 2. is taken in the above algorithm can be based on the equation η′ = ε(v - x), in particular: η′(a) = ε(λ - x(a)).

In order to establish the continuity of the function κ we consider the following initial value problem for the non-linear ODE system

  η′ = ε(v - x)
  v′ = (1/(1+ε)) · (η + |η|)/2

for η(a) = 0 and v(a) = λ, which for non-negative η corresponds to the above linear system, whereas for negative η the second equation reads v′ = 0. The right-hand side apparently satisfies a global Lipschitz condition. Due to the theorem on continuous dependence on initial values (see [59]) the solution exists on the whole interval [a, b], both v and η belong to C¹[a, b] and depend continuously on the initial conditions; in particular η(b) depends continuously on λ. Since η_λ(b) changes sign between λ > (1+ε) max x(t) and λ < min x(t), there is a λ₀ such that for v(a) = λ₀ we have η(b) = 0, where λ₀ can be determined e.g. via the bisection method.
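The bisection just described is straightforward to implement. The following sketch is ours, not the book's: it integrates the non-linear system with a classical RK4 scheme and brackets λ₀ using the bounds from (a) and (b); the data function x and all numerical parameters are illustrative assumptions.

```python
import numpy as np

def kappa(lam, x, a, b, eps, n=2000):
    """eta(b) for the IVP  eta' = eps*(v - x(t)),
    v' = max(eta, 0)/(1 + eps),  eta(a) = 0, v(a) = lam  (classical RK4)."""
    h = (b - a) / n

    def rhs(t, eta, v):
        return eps * (v - x(t)), max(eta, 0.0) / (1.0 + eps)

    t, eta, v = a, 0.0, lam
    for _ in range(n):
        k1 = rhs(t, eta, v)
        k2 = rhs(t + h/2, eta + h/2 * k1[0], v + h/2 * k1[1])
        k3 = rhs(t + h/2, eta + h/2 * k2[0], v + h/2 * k2[1])
        k4 = rhs(t + h, eta + h * k3[0], v + h * k3[1])
        eta += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        v += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        t += h
    return eta

def shoot(x, a, b, eps, tol=1e-10):
    """Bisection for kappa(lambda) = 0; the bracket follows (a) and (b)."""
    ts = np.linspace(a, b, 1001)
    lo = min(x(t) for t in ts) - 1.0                 # here kappa < 0 by (b)
    hi = (1.0 + eps) * max(x(t) for t in ts) + 1.0   # here kappa > 0 by (a)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if kappa(mid, x, a, b, eps) <= 0 else (lo, mid)
    return 0.5 * (lo + hi)

# illustrative data: a linear trend plus an oscillation
x = lambda t: 0.3 * t + np.sin(2.0 * np.pi * t)
print("v(a) =", shoot(x, 0.0, 1.0, eps=0.1))
```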
Remark 9.9.1. The above smoothness result is essentially a special case of Theorem 9.9.3.

The question remains how a positive trend in the data x can be detected. We define:

Definition 9.9.2. x does not contain a positive trend if the monotone regression, i.e. the minimal solution of h(v) := ½ ∫_a^b (v - x)² dt on S, is the constant function identical to the mean value ξ̄.

Apparently, in this case the minimal solution of f_ε(v) = ½ ∫_a^b ( ε(v - x)² + (1+ε) v′² ) dt on S is also equal to ξ̄ for all ε > 0. In other words: the existence of a positive trend can be detected for each positive ε. The corresponding solutions can also be obtained via appropriate numerical methods applied to the above non-linear ODE system; a discrete sketch follows below.
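On sampled data the monotone regression of Definition 9.9.2 can be computed directly with the classical pool-adjacent-violators algorithm; the sketch below is our illustration (not the book's method) and declares a positive trend whenever the isotonic fit deviates from the constant mean.

```python
import numpy as np

def monotone_regression(y):
    """Pool-adjacent-violators: least-squares fit of a non-decreasing
    sequence to y, the discrete analogue of minimizing h on S."""
    merged = []                              # blocks of [mean, size]
    for value in y:
        merged.append([float(value), 1])
        # pool while the block means violate monotonicity
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, s2 = merged.pop()
            m1, s1 = merged.pop()
            merged.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in merged])

def has_positive_trend(y, tol=1e-8):
    """True iff the isotonic fit is not the constant mean value."""
    return np.max(np.abs(monotone_regression(y) - np.mean(y))) > tol

t = np.linspace(0.0, 1.0, 50)
print(has_positive_trend(0.3 * t + np.sin(2.0 * np.pi * t)))  # expected: True
```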
An Alternative Approach

We minimize the convex functional

  g_ε(v) := ½ ∫_a^b ( (1+ε)(v - x)² + ε v′² ) dt

on the closed and convex subset S of the Sobolev space W¹₂,₁[a, b], where S := {v ∈ W¹₂,₁[a, b] | v′ ≥ 0}, under the assumption that h(v) := ½ ∫_a^b (v - x)² dt has a minimal solution on S, the monotone regression. For g_ε we have the following representation:

  g_ε(v) = ½ ∫_a^b ( ε(v² + v′² - 2vx + x²) + (v - x)² ) dt.

As was pointed out above, the convex functional

  f(v) := ½ ∫_a^b (v² + v′² - 2vx + x²) dt

is uniformly convex on W¹₂,₁[a, b]. Let (εₙ) be a positive sequence tending to zero; then due to Theorem 8.7.2 the sequence (xₙ) of minimal solutions of gₙ := g_{εₙ} converges to the monotone regression.
9.9.3 A Robust Variant

We can modify the functional f_ε in the spirit of robust statistics by introducing a differentiable and strictly convex Young function Φ:

  f_ε(v) := ∫_a^b ( εΦ(v - x) + ½ v′² ) dt.

Using the same linear supplement as above we obtain the following initial value problem:

  η′ = εΦ′(v - x)
  v′ = (η + |η|)/2

for η(a) = 0 and v(a) = λ. If Φ is twice continuously differentiable with bounded second derivative, then the right-hand side satisfies a global Lipschitz condition. In particular this results in a continuous dependence on initial conditions.

Let λ > max x(t). Since v′ ≥ 0 we have v ≥ λ, hence η(t) = ε ∫_a^t Φ′(v(τ) - x(τ)) dτ > 0 for all a < t ≤ b, in particular η(b) > 0.

If λ < min x(t) =: m, then η′(a) = εΦ′(λ - x(a)) < 0. Hence there is a non-empty interval (a, r) such that η(t) < 0 on (a, r). Let r₀ be the largest such r and suppose r₀ < b. Then η(t) < 0 on (a, r₀), thus v′ = 0 on (a, r₀), i.e. v(t) = λ on [a, r₀). But then η′(t) = εΦ′(λ - x(t)) ≤ εΦ′(λ - m) < 0 for all t ∈ [a, r₀), hence η(r₀) < 0 and there is a δ > 0 such that η(r₀ + δ) < 0, a contradiction.

We conclude: η(b) < 0 for λ < min x(t). Due to the continuous dependence on initial values the intermediate value theorem again guarantees the existence of a λ_ε such that η(b) = 0.
Now we consider the convergence question for ε → 0. Since λ_ε ∈ [min x(t), max x(t)], there is a convergent subsequence λₙ := λ_{εₙ} → λ₀. Let yₙ := (vₙ, ηₙ)ᵀ be the sequence of solutions obtained for each εₙ; then according to the theorem (Folgerung B.2 in [59])

  ‖yₙ(t) - y₀(t)‖ ≤ |λₙ - λ₀| e^{L(t-a)} + (εₙ/L) sup_{[a,b]} |Φ′(v₀(t) - x(t))| (e^{L(t-a)} - 1),

where y₀ = (v₀, η₀)ᵀ is the solution of

  η′ = 0
  v′ = (η + |η|)/2

for the initial conditions η(a) = 0 and v(a) = λ₀. Apparently η₀(t) = 0 for all t ∈ [a, b], hence v₀′ = 0 on [a, b] and thus v₀(t) = λ₀. Hence (vₙ, ηₙ)ᵀ converges to (λ₀, 0)ᵀ in (C[a, b])².

But vₙ → v₀ also in C¹[a, b]: we have ηₙ(t) = εₙ ∫_a^t Φ′(vₙ(τ) - x(τ)) dτ and hence

  |vₙ′(t) - v₀′(t)| ≤ εₙ ∫_a^t |Φ′(vₙ(τ) - x(τ))| dτ ≤ εₙ ∫_a^b |Φ′(vₙ(τ) - x(τ))| dτ ≤ εₙ C,

since |Φ′(vₙ - x)| is bounded.

Now, our theorem on second stage solutions 5.6.2 yields that v₀ is a minimal solution of

  f(v) := ∫_a^b Φ(v(τ) - x(τ)) dτ on M(g, S),

with g(v) := ½ ∫_a^b v′² dt. Apparently M(g, S) consists of all constant functions. Since Φ is strictly convex, the strictly convex function ξ ↦ ∫_a^b Φ(ξ - x(τ)) dτ has a uniquely determined minimal solution ξ₀. Hence λ₀ = ξ₀, and this holds for every convergent subsequence; thus λ_ε → ξ₀ and v_ε → v₀ ≡ ξ₀ in C¹[a, b].
We have thus proved the following existence and convergence theorem:
Theorem 9.9.3. Let Φ be a strictly convex and twice continuously differentiable Young function with bounded second derivative. Then the problem

  Minimize f_ε(v) := ∫_a^b ( εΦ(v - x) + ½ v′² ) dt on S := {v ∈ C¹[a, b] | v′ ≥ 0}

has a unique solution v_ε ∈ S for each ε > 0.

For ε → 0 we obtain v_ε → v₀ ≡ ξ₀ in C¹[a, b], where ξ₀ is the uniquely determined minimal solution of the function ξ ↦ ∫_a^b Φ(ξ - x(τ)) dτ.

Remark 9.9.4. The proof above also shows that the corresponding η_ε of the linear supplement belongs to C¹[a, b].
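As a concrete instance of Theorem 9.9.3 one can take Φ(s) = √(1+s²) - 1, a strictly convex, twice continuously differentiable Young function with bounded second derivative. The sketch below is ours: it repeats the shooting idea with the robust right-hand side; integrator, data, and bracket are illustrative assumptions.

```python
import numpy as np

# Phi(s) = sqrt(1 + s^2) - 1:  Phi''(s) = (1 + s^2)**(-1.5) is bounded
phi_prime = lambda s: s / np.sqrt(1.0 + s * s)

def kappa_robust(lam, x, a, b, eps, n=4000):
    """eta(b) for  eta' = eps*Phi'(v - x(t)),  v' = max(eta, 0),
    with eta(a) = 0 and v(a) = lam (explicit Euler for brevity)."""
    h = (b - a) / n
    eta, v = 0.0, lam
    for k in range(n):
        t = a + k * h
        eta, v = eta + h * eps * phi_prime(v - x(t)), v + h * max(eta, 0.0)
    return eta

# bracket from the text: eta(b) < 0 below min x(t), eta(b) > 0 above max x(t)
x = lambda t: 0.3 * t + np.sin(2.0 * np.pi * t)
lo, hi = -1.0, 1.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if kappa_robust(mid, x, 0.0, 1.0, 0.1) <= 0 else (lo, mid)
print("lambda_eps =", 0.5 * (lo + hi))
```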
9.10 Optimal Control Problems

In contrast to variational problems we cannot assume that the controls u are in C(T). For convex-concave Lagrangians we obtain the following assertion about the equicontinuity of the corresponding functionals:

Theorem 9.10.1. Let (T, Σ, μ) be a finite measure space, where T is a compact set. Let J be a family of mappings L: ℝⁿ × ℝᵐ → ℝ, satisfying the following properties:

(a) L(p, ·): ℝᵐ → ℝ is convex for all p ∈ ℝⁿ,
(b) L(·, r): ℝⁿ → ℝ is concave for all r ∈ ℝᵐ,
(c) L(p, r) ≤ M( 1 + Σ_{i=1}^m Φ(rᵢ) ) for all p ∈ ℝⁿ, r ∈ ℝᵐ,

where Φ is a Young function satisfying the Δ₂-condition. Let V be an open convex subset of (C(T))ⁿ and U an open convex subset of (L_Φ(μ))ᵐ; then the family F of all functions f: V × U → ℝ with

  f(x, u) := ∫ L(x, u) dμ with L ∈ J

is equicontinuous, provided that F is pointwise bounded from below.

Proof. Let U₀ ⊂ U be open and bounded. Let x ∈ V and u ∈ U₀; then (see Theorem 6.3.10) there is a constant K such that

  f(x, u) = ∫_T L(x, u) dμ ≤ M ∫_T ( 1 + Σ_{i=1}^m Φ(uᵢ) ) dμ
          ≤ Mμ(T) + M Σ_{i=1}^m ∫_T Φ(uᵢ) dμ
          = Mμ(T) + M Σ_{i=1}^m f^Φ(uᵢ) ≤ Mμ(T) + mK.

In particular f(x, ·) is continuous for all x ∈ V. Moreover, the family F is pointwise bounded. On the other hand, L(·, r): ℝⁿ → ℝ is concave and hence continuous according to Theorem 5.3.11 for all r ∈ ℝᵐ. Let now xₙ → x in (C(T))ⁿ; then, as in Theorem 9.8.1, L(xₙ, u) → L(x, u) uniformly, hence ∫_T L(xₙ, u) dμ → ∫_T L(x, u) dμ, i.e. f(·, u) is continuous on V for all u ∈ U. With Theorem 5.4.9 the assertion follows. □
For convex-concave Lagrangians pointwise convergence already implies pointwise convergence of the corresponding functionals, using the theorem of Banach–Steinhaus:

Theorem 9.10.2. Let (T, Σ, μ) be a finite measure space, where T is a compact set, and let Φ be a Young function satisfying the Δ₂-condition. Let the sequence (L_k)_{k∈ℕ} of mappings L_k: ℝⁿ × ℝᵐ → ℝ have the following properties:

(a) L_k(p, ·): ℝᵐ → ℝ is convex for all p ∈ ℝⁿ,
(b) L_k(·, r): ℝⁿ → ℝ is concave for all r ∈ ℝᵐ,
(c) L_k(p, r) ≤ M( 1 + Σ_{i=1}^m Φ(rᵢ) ) for all p ∈ ℝⁿ, r ∈ ℝᵐ,
(d) L_k(p, r) →_k L₀(p, r) for all (p, r) ∈ ℝⁿ × ℝᵐ.

Let V be an open subset of (C(T))ⁿ and U an open subset of (L_Φ(μ))ᵐ; then the sequence (f_k: V × U → ℝ)_{k∈ℕ} with f_k(x, u) := ∫_T L_k(x, u) dμ converges pointwise to f₀(x, u) := ∫_T L₀(x, u) dμ.
Proof. Let u ∈ (L_Φ(μ))ᵐ be a step function, i.e. u := Σ_{i=1}^s aᵢ χ_{Tᵢ} with ∪_{i=1}^s Tᵢ = T and Tᵢ ∩ Tⱼ = ∅ for i ≠ j; then

  f_k(x, u) = Σ_{i=1}^s ∫_{Tᵢ} L_k(x(t), aᵢ) dμ(t).

Let g_{k,i} := ∫_{Tᵢ} L_k(x(t), aᵢ) dμ(t); then

  g_{k,i} →_k gᵢ = ∫_{Tᵢ} L₀(x(t), aᵢ) dμ(t)

holds, since L_k(·, aᵢ) is concave and hence, according to Theorem 5.3.6, L_k(·, aᵢ) →_k L₀(·, aᵢ) continuously convergent, and therefore (L_k(x(·), aᵢ)) is also uniformly convergent on Tᵢ.

In all we obtain f_k(x, u) → f₀(x, u). Using the theorem of Banach–Steinhaus 5.3.17 the assertion follows. □
9.10.1 Minimal Time Problem as a Linear L¹-approximation Problem

A minimal time problem is obtained if the right end point of the time interval can be freely chosen, and we look for the shortest time α₀ such that via an admissible control the given point c ∈ ℝⁿ with c ≠ 0 is reached from the starting point.

If the function f(x) = ‖x(b) - c‖² is viewed in dependence of b, we look for the smallest b such that a minimal solution with minimal value 0 exists. In particular we encounter optimal controls which are of bang-bang type. This is specified more precisely in Theorem 9.10.5.

An important class of such problems can be treated by methods of linear L¹-approximation. Let RS[0, α] denote the set of all piecewise continuous functions such that the right-sided limit exists. Let K_α := (RCS⁽¹⁾[0, α])ⁿ, X_α := (RS[0, α])ᵐ and

  Q_α := { u ∈ X_α : |uᵢ(t)| ≤ 1, i ∈ {1, ..., m}, t ∈ [0, α] }.   (9.37)

Let A ∈ (RS[0, ∞))^{n×n}, B ∈ (RS[0, ∞))^{n×m}. Let further

  R_α := { u ∈ X_α : ∃x ∈ K_α ∀t ∈ [0, α]: x′(t) = A(t)x(t) + B(t)u(t), x(0) = 0, x(α) = c }.   (9.38)

The minimal time problem (MT) is now posed in the following form:

  Minimize α under the condition Q_α ∩ R_α ≠ ∅.   (9.39)

If the vector space X_α is equipped with the norm

  ‖u‖_α = sup{ |uᵢ(t)| : i ∈ {1, ..., m}, t ∈ [0, α] },   (9.40)

then Q_α is the unit ball in the normed space (X_α, ‖·‖_α). In particular for those α with Q_α ∩ R_α ≠ ∅ we have

  w(α) := inf{ ‖u‖_α : u ∈ R_α } ≤ 1,   (9.41)

and for α with Q_α ∩ R_α = ∅ we obtain ‖u‖_α > 1 for all u ∈ R_α and hence w(α) ≥ 1.

We will now attempt to prove that for the minimal time α₀

  w(α₀) = 1   (9.42)

holds. A sufficient criterion will be supplied by Theorem 9.10.6.

Let H be a fundamental matrix of the ODE in (9.38), where H(0) is the n-dimensional unit matrix. Then Condition (9.38), using Y(t) := H(α)H⁻¹(t)B(t), can be expressed in the form of n linear equations

  ∫₀^α Yᵢ(t)u(t) dt = cᵢ for i ∈ {1, ..., n},   (9.43)

where Yᵢ is the i-th row of Y.
Using the above considerations we arrive at the following problem: find a pair (α, u*) with the following properties: u* is an element of minimal norm in

  S_α := { u ∈ X_α : u satisfies Equation (9.43) },   (9.44)

and

  ‖u*‖_α = 1.   (9.45)

In order to draw the connection to the above mentioned approximation in the mean, we will reformulate for each α the optimization problem given by Equations (9.43) and (9.44). Since c ≠ 0, there is a c_{i₀} ≠ 0. In the sequel let w.l.o.g. i₀ = n. Then Equation (9.43) can be rewritten using the functions

  Z := Yₙ/cₙ and Zᵢ := Yᵢ - (cᵢ/cₙ)Yₙ for i ∈ {1, ..., n-1}   (9.46)

as

  ∫₀^α Zᵢ(t)u(t) dt = 0, i ∈ {1, ..., n-1}, ∫₀^α Z(t)u(t) dt = 1.   (9.47)

In addition to the problem

  Minimize ‖u‖_α under the Conditions (9.47)   (D1)

we consider

  Maximize φ_α(u) := ∫₀^α Z(t)u(t) dt   (D2)

under the conditions

  ∫₀^α Zᵢ(t)u(t) dt = 0 for i ∈ {1, ..., n-1} and ‖u‖_α ≤ 1.   (9.48)

Remark 9.10.3. One can easily see that problems (D1) and (D2) are equivalent in the following sense: if W is the maximal value of (D2), then 1/W is the minimal value of (D1), and u₀ is a solution of (D2) if and only if u₀/W is a solution of (D1).

We will now establish that problem (D2) can be viewed as a dual problem of the following non-restricted problem in ℝ^{n-1}:

  Minimize ψ_α(β₁, ..., β_{n-1}) := ∫₀^α ‖ Zᵀ(t) - Σ_{i=1}^{n-1} βᵢ Zᵢᵀ(t) ‖₁ dt   (P1)

on ℝ^{n-1}, where ‖·‖₁ denotes the norm Σ_{i=1}^m |ζᵢ| in ℝᵐ.
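After discretization, (P1) becomes a finite-dimensional non-smooth convex minimization that standard tools handle; the following sketch is our illustration (the callables Z and Zis are assumed inputs, and the derivative-free method is one choice among many).

```python
import numpy as np
from scipy.optimize import minimize

def solve_P1(Z, Zis, alpha, n=2000):
    """Minimize psi(beta) = int_0^alpha ||Z(t)^T - sum_i beta_i Zi(t)^T||_1 dt
    by midpoint quadrature; Z and each Zis[i] map t to a vector in R^m."""
    w = alpha / n
    ts = (np.arange(n) + 0.5) * w
    Zt = np.array([Z(t) for t in ts])                     # shape (n, m)
    Zi = np.array([[Zi_(t) for Zi_ in Zis] for t in ts])  # shape (n, n-1, m)

    def psi(beta):
        resid = Zt - np.tensordot(Zi, beta, axes=([1], [0]))
        return w * np.sum(np.abs(resid))

    res = minimize(psi, np.zeros(len(Zis)), method="Nelder-Mead")
    return res.x, psi(res.x)

# data of Example 9.10.7 below: Z(t) = (alpha0 - t,), Z1(t) = (1,), alpha0 = 2
beta, value = solve_P1(lambda t: np.array([2.0 - t]),
                       [lambda t: np.array([1.0])], alpha=2.0)
print(beta, value)  # expected: beta near 1 = alpha0/2, value near 1
```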
Apparently we have for all ζ, η ∈ ℝᵐ

  | Σ_{i=1}^m ζᵢ ηᵢ | ≤ ( max_{1≤i≤m} |ηᵢ| ) Σ_{i=1}^m |ζᵢ|.   (9.49)

Remark 9.10.4. (P1) is a problem of approximation in the mean, treated in Chapter 1 for m = 1.
The following theorem holds:

Theorem 9.10.5. The problems (P1) and (D2) are weakly dual and (P1) is solvable. If β* is a solution of (P1) and if each component of h = (h₁, ..., h_m)ᵀ := Zᵀ - Σ_{i=1}^{n-1} βᵢ* Zᵢᵀ has only finitely many zeros in [0, α], then a solution u* of (D2) can be constructed in the following way:

Let i ∈ {1, ..., m} be fixed, and let t₁ < ... < t_k be the zeros with change of sign of the i-th component hᵢ of h. Let t₀ := 0, t_{k+1} := α and

  σᵢ := 1, if hᵢ is non-negative on [0, t₁); σᵢ := -1, otherwise.

Then the following componentwise defined function

  uᵢ*(t) = σᵢ (-1)^{j-1} for t ∈ [t_{j-1}, tⱼ), j ∈ {1, ..., k+1}   (9.50)

is a maximal solution of (D2), and φ_α(u*) = ψ_α(β*) holds, i.e. the problem (D2) is dual to (P1).
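Numerically, the construction in the theorem amounts to locating the sign changes of each hᵢ on a grid and alternating the control there; a minimal sketch (ours, assuming a grid fine enough to separate the finitely many zeros):

```python
import numpy as np

def bang_bang_control(h_i, alpha, n=4000):
    """Componentwise construction (9.50): start with sigma_i = +/-1 according
    to the sign of h_i on [0, t_1) and flip at every sign change of h_i."""
    ts = np.linspace(0.0, alpha, n)
    hv = np.array([h_i(t) for t in ts])
    sign = 1.0 if hv[0] >= 0.0 else -1.0       # sigma_i
    u = np.empty(n)
    for k in range(n):
        if k > 0 and hv[k - 1] * hv[k] < 0.0:  # sign change between nodes
            sign = -sign
        u[k] = sign
    return ts, u

# h(t) = alpha0/2 - t from Example 9.10.7 below: one switch at t = 1
ts, u = bang_bang_control(lambda t: 1.0 - t, alpha=2.0)
```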
Proof. Let β = (β₁, ..., β_{n-1}) ∈ ℝ^{n-1} and let u satisfy Equation (9.48). With Equation (9.49) we obtain

  φ_α(u) = ∫₀^α Z(t)u(t) dt = ∫₀^α ( Z(t) - Σ_{i=1}^{n-1} βᵢ Zᵢ(t) ) u(t) dt
         ≤ ∫₀^α | ( Z(t) - Σ_{i=1}^{n-1} βᵢ Zᵢ(t) ) u(t) | dt
         ≤ ∫₀^α ‖ Zᵀ(t) - Σ_{i=1}^{n-1} βᵢ Zᵢᵀ(t) ‖₁ ‖u‖_α dt
         ≤ ∫₀^α ‖ Zᵀ(t) - Σ_{i=1}^{n-1} βᵢ Zᵢᵀ(t) ‖₁ dt = ψ_α(β),   (9.51)

and hence weak duality has been established. According to the theorem of Weierstrass, (P1) has a minimal solution β*. Due to the Duality Theorem of Linear Approximation (see Theorem 3.12.4), problem (D2), extended to the space ((L_∞[0, α])ᵐ, ‖·‖_α), has a solution ũ with ‖ũ‖_α = 1 and ∫₀^α Zᵢ(t)ũ(t) dt = 0 for all i ∈ {1, ..., n-1}. Apparently ((L_∞[0, α])ᵐ, ‖·‖_α) is the dual space of ((L₁[0, α])ᵐ, ∫₀^α ‖x(t)‖₁ dt) (see Theorem 7.6.3). For ũ the inequality corresponding to (9.51) is satisfied as an equality. In particular

  φ_α(ũ) = ∫₀^α hᵀ(t)ũ(t) dt = ∫₀^α ( Σ_{i=1}^m |hᵢ(t)| ) dt
         = ∫₀^α ‖ Zᵀ(t) - Σ_{i=1}^{n-1} βᵢ* Zᵢᵀ(t) ‖₁ dt = ψ_α(β*).   (9.52)

From the above equality it follows that ũ = u* a.e., for

  ∫₀^α h(t)ᵀ ũ(t) dt = ∫₀^α ( Σ_{i=1}^m |hᵢ(t)| ) dt = ∫₀^α h(t)ᵀ u*(t) dt.

For assume there is T₀ ⊂ [0, α] with μ(T₀) > 0 and ũ(t) ≠ u*(t) for all t ∈ T₀. Then there is an i₀ with

  h_{i₀}(t) ũ_{i₀}(t) < |h_{i₀}(t)| = h_{i₀}(t) u*_{i₀}(t)

for all t ∈ T₀ \ N_{i₀}, where N_{i₀} denotes the zeros of h_{i₀}. Hence

  ∫_{T₀} h_{i₀}(t) ũ_{i₀}(t) dt < ∫_{T₀} |h_{i₀}(t)| dt,

leading to a contradiction, since |h(t)ᵀ ũ(t)| ≤ Σ_{i=1}^m |hᵢ(t)| for all t ∈ [0, α].

Therefore u* satisfies Condition (9.48). Since u* ∈ (RS[0, α])ᵐ, u* is a solution of (D2), for which φ_α(u*) = ψ_α(β*) holds, i.e. (D2) is dual to (P1). □
We obtain the following sufficient criterion for minimal time solutions:

Theorem 9.10.6. Let α₀ be such that a minimal solution β* of (P1) has the value ψ_{α₀}(β*) = 1 and each component of h := Zᵀ - Σ_{i=1}^{n-1} βᵢ* Zᵢᵀ has only finitely many zeros. Then α₀ solves the minimal time problem, and the function u* defined by (9.50) is a time-optimal control. By

  x*(t) := ∫₀^t Y(s) u*(s) ds for all t ∈ [0, α₀]   (9.53)

a time-optimal state-function is defined.

Proof. Let α < α₀. Then

  r := ψ_α(β*) = ∫₀^α ‖h(s)‖₁ ds < ∫₀^{α₀} ‖h(s)‖₁ ds = 1

holds. Let β_α* be a minimal solution of ψ_α; then r_α := ψ_α(β_α*) ≤ r. Let u₀ = u*|_{[0,α]} be a solution of (D2); then according to Theorem 9.10.5 we obtain

  φ_α(u₀) = ψ_α(β_α*) = r_α < 1.

By Remark 9.10.3 we recall: u₀ is a solution of (D2) with value r_α if and only if (1/r_α)u₀ is a solution of (D1) with value 1/r_α. More explicitly: for all u ∈ R_α

  1 < 1/r_α = ‖ (1/r_α) u₀ ‖_α ≤ ‖u‖_α,

but then u ∉ Q_α, i.e. Q_α ∩ R_α = ∅, and hence α < α₀ cannot be a minimal solution of (MT). □
Example 9.10.7. We consider the control of a vehicle on rails. We want to arrive at the point c = (1, 0)ᵀ in the shortest possible time. The ODE system in this situation is given by

  x₁′ = x₂ and x₂′ = u,   (9.54)

i.e.

  A = [ 0 1 ]   and   B = [ 0 ]
      [ 0 0 ]             [ 1 ].

A fundamental matrix H with H(0) = E is immediately obtained by

  H(t) = [ 1 t ]
         [ 0 1 ].

For the inverse,

  H⁻¹(t) = [ 1 -t ]
           [ 0  1 ]

holds. This leads to

  Y(t) = [ 1 α₀ ] [ 1 -t ] [ 0 ] = [ 1 α₀-t ] [ 0 ] = [ α₀-t ]
         [ 0 1  ] [ 0  1 ] [ 1 ]   [ 0 1    ] [ 1 ]   [ 1    ],

i.e.

  Y₁(t) = α₀ - t and Y₂(t) = 1.   (9.55)

Using (9.45) and (9.46) we have Z = Y₁ and Z₁ = Y₂. For problem (P1) we obtain in this case: minimize ψ_{α₀} on ℝ, where

  ψ_{α₀}(β) := ∫₀^{α₀} | α₀ - t - β | dt.

According to the characterization theorem of convex optimization it is necessary and sufficient that for all h ∈ ℝ

  0 ≤ ψ_{α₀}′(β, h) = -h ∫₀^{α₀} sign(α₀ - t - β) dt

holds. For β = α₀/2 we have ∫₀^{α₀} sign(α₀/2 - t) dt = 0. Hence β = α₀/2 is a minimal solution of (P1) with value W = α₀²/4. From the requirement W = 1 the minimal time α₀ = 2 follows.

The function α₀ - t - α₀/2 = α₀/2 - t changes sign at t = 1. By (9.50) the control

  u*(t) = 1 for t ∈ [0, 1), u*(t) = -1 for t ∈ [1, 2]   (9.56)

is time-optimal. Using (9.53), (9.55), and (9.56) it follows that

  x*(t) := ∫₀^t (2 - s, 1)ᵀ u*(s) ds.

Due to Theorem 9.10.6 the pair (x*, u*) is the solution to our problem. □
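The example can be confirmed by direct integration: with the control (9.56), the system (9.54) must reach c = (1, 0)ᵀ at time α₀ = 2. A minimal check (our illustration):

```python
def integrate_rail(n=20000):
    """x1' = x2, x2' = u with u = +1 on [0,1) and u = -1 on [1,2] (Euler)."""
    h = 2.0 / n
    x1 = x2 = 0.0
    for k in range(n):
        t = k * h
        u = 1.0 if t < 1.0 else -1.0
        x1, x2 = x1 + h * x2, x2 + h * u
    return x1, x2

print(integrate_rail())  # approximately (1.0, 0.0)
```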
Bibliography

[1] Achieser, N. I.: Vorlesungen über Approximationstheorie, Akademie Verlag, Berlin, 1953.
[2] Adams, R. A.: Sobolev Spaces, Academic Press, New York, 1975.
[3] Akimovich, B. A.: On the uniform convexity and uniform smoothness of Orlicz spaces. Teoria Functii Functional Anal. & Prilozen. 15 (1972), 114–120.
[4] Asplund, E., Rockafellar, R. T.: Gradients of Convex Functions, Trans. Amer. Math. Soc. 139 (1969), 443–467.
[5] Attouch, H.: Variational Convergence for Functions and Operators. Applicable Mathematics Series, Pitman Advanced Publishing Program, Boston, London, Melbourne, 1984.
[6] Bellmann, R. E.: Dynamic Programming, Princeton University Press, Princeton, N. J., 1957.
[7] Bennet, C., Sharpley, R.: Interpolation of Operators, Academic Press, Vol. 129 in Pure and Appl. Math., 1988.
[8] Bildhauer, M.: Convex Variational Problems, Lecture Notes in Mathematics 1818, Springer, Heidelberg, N. Y., 2003.
[9] Birnbaum, Z. W., Orlicz, W.: Über die Verallgemeinerung des Begriffes der zueinander konjugierten Potenzen, Studia Math. 3 (1931), 1–64.
[10] Bliss, G. A.: The Problem of Lagrange in the Calculus of Variations, American Journal of Mathematics, Vol. LII (1930), 673–744.
[11] Blum, E., Oettli, W.: Mathematische Optimierung, Springer, Berlin, Heidelberg, New York, 1975.
[12] Bolza, O.: Vorlesungen über Variationsrechnung, Teubner, Leipzig u. Berlin, 1909.
[13] Borg, I., Lingoes, J.: Multidimensional Similarity Structure Analysis. Springer, Berlin, Heidelberg, New York, 1987.
[14] Boyd, D. W.: The Hilbert Transform on rearrangement-invariant Spaces, Canad. J. of Math. 19 (1967), 599–616.
[15] Browder, F.: Problèmes Nonlinéaires, Univ. of Montreal Press, 1966.
[16] Browder, F. E.: Nonlinear operators and non-linear equations of evolution in Banach Spaces, Proc. Symp. Nonlinear Funct. Analysis, Chicago, Amer. Math. Soc. 81, pp. 890–892, 1975.
[17] Carathéodory, C.: Variationsrechnung und partielle Differentialgleichungen erster Ordnung. Teubner, Leipzig u. Berlin, 1935.
[18] Carathéodory, C.: Variationsrechnung und partielle Differentialgleichungen erster Ordnung: Variationsrechnung. Herausgegeben, kommentiert und mit Erweiterungen zur Steuerungs- u. Dualitätstheorie versehen von R. Klötzler, B. G. Teubner, Stuttgart, Leipzig, 1994.
[19] Cesari, L.: Optimization Theory and Applications, Springer, New York, Heidelberg, Berlin, 1983.
[20] Chen, S.: Geometry of Orlicz Spaces, Dissertationes Math. 356, 1996.
[21] Chen, S., Huiying, S.: Reflexive Orlicz Spaces have uniformly normal structure, Studia Math. 109(2) (1994), 197–208.
[22] Cheney, E. E.: Introduction to Approximation Theory, McGraw-Hill, 1966.
[23] Choquet, G.: Lectures on Analysis I, II, Amsterdam, New York, 1969.
[24] Day, M. M.: Normed Linear Spaces (3rd Edn.), Ergebnisse der Mathematik u. ihrer Grenzgebiete, Bd. 21, Springer, 1973.
[25] Descloux, I.: Approximation in L^p and Tschebyscheff approximation. SIAM J. Appl. Math. 11 (1963), 1017–1026.
[26] Diaz, J. B., Metcalf, F. T.: On the structure of the set of subsequential limit points of successive approximations, Bull. Amer. Math. Soc. 73 (1967), 516–519.
[27] Dominguez, T., Hudzik, H., Lopez, G., Mastylo, M., Sims, B.: Complete Characterization of Kadec–Klee Properties in Orlicz Spaces, Houston Journal of Mathematics 29(4) (2003), 1027–1044.
[28] Dunford, N., Schwartz, J. T.: Linear Operators, Part I, Wiley Interscience, 1958.
[29] Fan, K., Glicksberg, I.: Some geometric properties of the spheres in a normed linear space, Duke Math. J. 25 (1958), 553–568.
[30] Figiel, T.: On moduli of convexity and smoothness, Studia Math. 56 (1976), 121–155.
[31] Fleming, W. H., Rishel, R. W.: Deterministic and Stochastic Optimal Control, Springer, Berlin, Heidelberg, New York, 1975.
[32] Garrigos, G., Hernandez, E., Martell, J.: Wavelets, Orlicz Spaces, and greedy Bases, submitted Jan. 2007.
[33] Giaquinta, M., Hildebrand, S.: Calculus of Variations I and II, Springer, Berlin, Heidelberg, New York, 1996.
[34] Gustavson, S. A.: Nonlinear systems in semi-infinite programming, in: Numerical Solution of Systems of Nonlinear Algebraic Equations, Eds. G. D. Byren and C. A. Hall, Academic Press, N. Y., London, 1973.
[35] Harms, D.: Optimierung von Variationsfunktionalen, Dissertation, Kiel, 1983.
[36] Hartmann, Ph.: Ordinary Differential Equations, Wiley and Sons, New York, 1964.
[37] Hebden, M. D.: A bound on the difference between the Chebyshev Norm and the Hölder Norms of a function, SIAM J. of Numerical Analysis 8 (1971), 271–277.
[38] Hestenes, M. R.: Calculus of Variations and Optimal Control Theory, J. Wiley and Sons, New York, 1966.
[39] Heuser, H.: Funktionalanalysis, B. G. Teubner, Stuttgart, 1975.
[40] Hewitt, E., Stromberg, K.: Real and Abstract Analysis. Springer, 1965.
[41] Hirzebruch, F., Scharlau, W.: Einführung in die Funktionalanalysis; BI Bd. 296, Mannheim, 1971.
[42] Holmes, R. B.: A Course on Optimization and Best Approximation; Lecture Notes in Math. 257, Springer, 1972.
[43] Hudzik, H., Maligandra, L.: Amemiya norm equals Orlicz norm in general, Indag. Math. 11 (2000), 573–585.
[44] Ioffe, A. D., Tichomirov, V. M.: Theorie der Extremalaufgaben, Deutscher Verlag der Wissenschaften, Berlin, 1979.
[45] James, R. C.: Weakly compact sets, Trans. Amer. Math. Soc. 113 (1964), 129–140.
[46] Kaminska, A.: On uniform rotundity of Orlicz spaces, Indag. Math. A85 (1982), 27–36.
[47] Karlovitz, L.: Construction of nearest points in the L^p, p even, and L^∞ norms, J. Approx. Theory 3 (1970), 125–127.
[48] Klötzler, R., Pickenhain, S.: Pontrjagin's Maximum Principle for multidimensional control problems, Optimal Control (Freiburg 1991), 21–30, Internat. Ser. Numer. Math. 111, Birkhäuser, Basel, 1993.
[49] Köthe, G.: Topologische lineare Räume I. Springer, Berlin, Göttingen, Heidelberg, 1966.
[50] Kosmol, P.: Flache Konvexität und schwache Differenzierbarkeit, Manuscripta Mathematica 8 (1973), 267–270.
[51] Kosmol, P.: Über Approximation stetiger Funktionen in Orlicz-Räumen, Journ. of Approx. Theory 8 (1973), 67–83.
[52] Kosmol, P.: Flache Konvexität von Orliczräumen, Colloq. Math. XXXI, pp. 249–252, 1974.
[53] Kosmol, P.: Nichtexpansive Abbildungen bei Optimierungsverfahren, First Symposium on Operations Research, University of Heidelberg, Sept. 1–3, 1976, in Operations Research Verfahren / Methods of Operations Research XXV, 88–92, R. Henn et al. (Ed.), Verlag Anton Hain, Meisenheim, 1976.
[54] Kosmol, P.: Optimierung konvexer Funktionen mit Stabilitätsbetrachtungen, Dissertationes Mathematicae CXL, pp. 1–38, 1976.
[55] Kosmol, P.: Regularization of Optimization Problems and Operator Equations, Lecture Notes in Economics and Mathematical Systems 117, Optimization and Operations Research, Oberwolfach 1975, pp. 161–170, Springer, 1976.
[56] Kosmol, P., Wriedt, M.: Starke Lösbarkeit von Optimierungsaufgaben, Math. Nachrichten 83 (1978), 191–195.
[57] Kosmol, P.: Bemerkungen zur Brachistochrone, Abh. Math. Sem. Univ. Hamburg 54 (1984), 91–94.
[58] Kosmol, P.: Ein Algorithmus für Variationsungleichungen und lineare Optimierungsaufgaben, Deutsch-französisches Treffen zur Optimierungstheorie, Hamburg, 1986.
[59] Kosmol, P.: Optimierung und Approximation. de Gruyter Lehrbuch, Berlin, New York, 1991; 2. überarbeitete und erweiterte Auflage 2010.
[60] Kosmol, P.: Methoden zur numerischen Behandlung nichtlinearer Gleichungen und Optimierungsaufgaben. B. G. Teubner Studienbücher, Stuttgart, zweite Auflage, 1993.
[61] Kosmol, P.: An Elementary Approach to Variational Problems, Proceedings of the 12th Baikal International Conference on Optimization Methods and their Applications, Section 2. Optimal Control, Irkutsk, Baikal, pp. 202–208, June 24–July 1, 2001.
[62] Kosmol, P.: Projection Methods for Linear Optimization, J. of Contemporary Mathematical Analysis (National Academy of Sciences of Armenia) XXXVI(6) (2001), 49–56.
[63] Kosmol, P., Pavon, M.: Solving optimal control problems by means of general Lagrange functionals, Automatica 37 (2001), 907–913.
[64] Kosmol, P.: Über Anwendungen des matrixfreien Newtonverfahrens, Berichtsreihe des Mathematischen Seminars der Universität Kiel (Kosmol 2005B), 2005.
[65] Kosmol, P., Müller-Wichards, D.: Homotopy Methods for Optimization in Orlicz Space, Proceedings of the 12th Baikal Int. Conf. on Optimization Methods and their Applications, Vol. 1, pp. 224–230, June 24–July 1, 2001.
[66] Kosmol, P., Müller-Wichards, D.: Homotopic Methods for Semi-infinite Optimization, J. of Contemporary Mathematical Analysis (National Academy of Sciences of Armenia) XXXVI(5) (2001), 35–51.
[67] Kosmol, P., Müller-Wichards, D.: Pointwise Minimization of supplemented Variational Problems, Colloquium Mathematicum 101(1) (2004), 15–49.
[68] Kosmol, P., Müller-Wichards, D.: Stability for Families of Nonlinear Equations, Isvestia NAN Armenii. Matematika 41(1) (2006), 49–58.
[69] Kosmol, P., Müller-Wichards, D.: Optimierung in Orlicz-Räumen, Manuscript, University of Kiel, 2008.
[70] Kosmol, P., Müller-Wichards, D.: Strong Solvability in Orlicz-Spaces, J. of Contemporary Mathematical Analysis (National Academy of Sciences of Armenia) 44(5) (2009), 271–304; Isvestia NAN Armenii. Matematika, No. 5 (2009), 28–67.
[71] Krabs, W.: Optimierung und Approximation, Teubner, Stuttgart, 1972.
[72] Krasnosielskii, M. A., Ruticki, Ya. B.: Convex Functions and Orlicz Spaces. P. Noordhoff, Groningen, 1961.
[73] Kripke, B.: Best Approximation with respect to nearby norms, Numer. Math. 6 (1964), 103–105.
[74] Krotov, W. F., Gurman, W. J.: Methoden und Aufgaben der optimalen Steuerung (russ.), Moskau, Nauka, 1973.
[75] Kurc, W.: Strictly and Uniformly Monotone Musielak–Orlicz Spaces and Applications to Best Approximation, J. Approx. Theory 69 (1992), 173–187.
[76] Levitin, E. S., Polyak, B. T.: Convergence of minimizing sequences in conditional extremum problems, Doklady 5 (1966/1968), 764–767.
[77] Lindenstrauss, J.: On the modulus of smoothness and divergent series in Banach spaces. Mich. Math. J. 10 (1963), 241–252.
[78] Lovaglia, A. R.: Locally uniform convex Banach spaces, Trans. Amer. Math. Soc. XVIII (1955), 225–238.
[79] Luenberger, D. G.: Optimization by vector space methods, Wiley and Sons, NY, 1969.
[80] Luxemburg, W. A. J.: Banach function Spaces, Doctoral Thesis, Delft, 1955.
[81] Luxemburg, W. A., Zaanen, A.: Conjugate Spaces of Orlicz Spaces, Indag. Math. XVIII (1956), 217–228.
[82] Luxemburg, W. A. J., Zaanen, A. C.: Riesz spaces I, Amsterdam, London, 1971.
[83] Maleev, R. P., Troyanski, S. L.: On the moduli of convexity and smoothness in Orlicz spaces. Studia Math. 54 (1975), 131–141.
[84] Meyer, Y.: Wavelets and operators. Cambridge Studies in Advanced Mathematics 37, 1992.
[85] Milnes, H. W.: Convexity of Orlicz Spaces, Pacific J. Math. VII (1957), 1451–1483.
[86] Müller-Wichards, D.: Über die Konvergenz von Optimierungsmethoden in Orlicz-Räumen. Diss. Univ. Kiel, 1976.
[87] Müller-Wichards, D.: Regularisierung von verallgemeinerten Polya-Algorithmen. Operations Research Verfahren, Verlag Anton Hain, Meisenheim, XXIV, pp. 119–136, 1978.
[88] Musielak, J.: Orlicz Spaces and Modular Spaces, Lecture Notes in Mathematics 1034, 1983.
[89] Natanson, L. P.: Theorie der Funktionen einer reellen Veränderlichen; Akademie Verlag, Berlin, 1961.
[90] Peetre, I.: Approximation of Norms, J. Approx. Theory 3 (1970), 243–260.
[91] Phelps, R. R.: Convex Functions, Monotone Operators and Differentiability, Lecture Notes in Mathematics, Vol. 1364, Springer, 1989.
[92] Rao, M. M.: Linear functionals on Orlicz Spaces, Nieuw. Arch. Wisk. 12 (1964), 77–98.
[93] Rao, M. M.: Smoothness of Orlicz spaces, Indagationes Mathematicae 27 (1965), 671–689.
[94] Rao, M. M.: Linear functionals on Orlicz Spaces: general theory, Pac. J. Math. 25 (1968), 553–585.
[95] Rao, M. M., Ren, Z. D.: Theory of Orlicz Spaces, Pure and Applied Mathematics, Marcel Dekker, New York, 1991.
[96] Rao, M. M., Ren, Z. D.: Applications of Orlicz Spaces, Pure and Applied Mathematics, Marcel Dekker, New York, 2002.
[97] Reid, W. T.: A Matrix Differential Equation of Riccati Type, American Journal of Mathematics LXVIII (1946).
[98] Rice, J. R.: Approximation of Functions. Vol. I and II, Addison Wesley, 1964 and 1969.
[99] Rockafellar, R. T.: Characterization of the subdifferentials of convex functions, Pac. J. Math. 17 (1966), 497–510.
[100] Rockafellar, R. T.: Convex Analysis, Princeton, 1970.
[101] Roubicek, T.: Relaxation in Optimization Theory and Variational Calculus. De Gruyter Series in Nonlinear Analysis and Applications 4, 1997.
[102] Soardi, P.: Wavelet bases in rearrangement invariant function spaces. Proc. Amer. Math. Soc. 125(12) (1997), 3669–3673.
[103] Temlyakov, M. M.: Greedy algorithms in Banach spaces, Advances in Computational Mathematics 14 (2001), 277–292.
[104] Temlyakov, M. M.: Convergence of some greedy algorithms in Banach spaces, The Journal of Fourier Analysis and Applications 8(5) (2002), 489–505.
[105] Temlyakov, M. M.: Nonlinear Methods of Approximation, Foundations of Computational Mathematics 3 (2003), 33–107.
[106] Tsenov, I. V.: Some questions in the theory of functions. Mat. Sbornik 28 (1951), 473–478 (Russian).
[107] Turett, B.: Fenchel–Orlicz spaces. Diss. Math. 181, 1980.
[108] Tychonoff, A. N.: Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl. 4 (1963), 1035–1038.
[109] Vainberg, M. M.: Variational Method and Method of monotone Operators in the Theory of non-linear Equations. Wiley and Sons, 1973.
[110] Völler, J.: Gleichgradige Stetigkeit von Folgen monotoner Operatoren, Diplomarbeit am Mathematischen Seminar der Universität Kiel, 1978.
[111] Werner, D.: Funktionalanalysis, Springer, 6. Auflage, 2007.
[112] Werner, H.: Vorlesungen über Approximationstheorie. Lect. Notes in Math. 14, Springer, 1966.
[113] Wloka, J.: Funktionalanalysis und Anwendungen, W. de Gruyter, Berlin, 1971.
[114] Zaanen, A. C.: Integration, North Holland, p. 372, 1967.
[115] Zang, I.: A Smoothing-Out Technique for Min-Max-Optimization, Mathematical Programming, pp. 61–77, 1980.
[116] Zeidan, V.: Sufficient Conditions for the Generalized Problem of Bolza, Transactions of the AMS 275 (1983), 561–586.
[117] Zeidan, V.: Extended Jacobi Sufficient Conditions for Optimal Control, SIAM Control and Optimization 22 (1984), 294–301.
[118] Zeidan, V.: First and Second Order Sufficient Conditions for Optimal Control and Calculus of Variations, Applied Math. and Optimiz. 11 (1984), 209–226.
[119] Zeidler, E.: Nonlinear Functional Analysis and its Applications III, Springer, Berlin, Heidelberg, New York, 1985.
List of Symbols

C⁰   polar of the set C
C⁰⁰   bipolar of the set C
X*   dual space of X
xₙ ⇀ x   weak convergence of the sequence (xₙ) to x in the space X
S(X)   unit sphere in X
U(X)   unit ball in X
K(x₀, r)   open ball with center x₀ and radius r
conv(A)   convex hull of the set A
E_p(S)   extreme points of the set S
Epi(f)   epigraph of the function f
Dom(f)   set where the function f is finite
S_f(r)   level set of the function f at level r
∂f(x₀)   subgradient of the function f at x₀
M(f, K)   set of minimal solutions of the function f on the set K
(f, K)   minimize the function f on the set K
(g₁, g₂, K)   minimize the function g₁ on M(g₂, K)
f*   convex conjugate of the function f
f**   biconjugate of the function f
g⁺   concave conjugate of the function g
χ_A   characteristic function of the set A
x|_A   function x restricted to the set A
Φ   Young function
Ψ   conjugate of Φ
f^Φ   modular
(T, Σ, μ)   measure space
L_Φ(μ)   Orlicz space
M_Φ(μ)   subspace of L_Φ(μ): closure of the step functions
C_Φ(μ)   Orlicz class: Dom(f^Φ)
H_Φ(μ)   subspace of L_Φ(μ) contained in C_Φ(μ)
ℓ_Φ   Orlicz sequence space
m_Φ   subspace of ℓ_Φ: closure of the step functions
c_Φ   Orlicz class: Dom(f^Φ) in ℓ_Φ
h_Φ   subspace of ℓ_Φ contained in c_Φ
ℓ_∞   space of bounded sequences
‖·‖_(Φ)   Luxemburg norm
‖·‖_Φ   Orlicz norm
Δ₂-condition   growth condition for Φ
Δ₂^∞-condition   growth condition for Φ in a neighborhood of ∞
Δ₂⁰-condition   growth condition for Φ in a neighborhood of 0
∇-condition   convexity condition for Φ
∇^∞-condition   convexity condition for Φ in a neighborhood of ∞
∇⁰-condition   convexity condition for Φ in a neighborhood of 0
ρ_X   module of smoothness for uniformly differentiable spaces X
δ_X   module of convexity for uniformly convex spaces X
δ̃_X   module of convexity for uniformly convex spaces X
δ_{x,x*}   module of convexity for locally uniformly convex functions
RS[a, b]   space of piecewise continuous functions on [a, b]
RCS¹[a, b]ⁿ   space of piecewise continuously differentiable functions on [a, b] with values in ℝⁿ
C¹[a, b]ⁿ   space of continuously differentiable functions on [a, b] with values in ℝⁿ
C(T)   space of continuous functions on T
W^m L_Φ(μ)   Orlicz–Sobolev space
Index

A
A-differentiable, 247, 250
A-gradient, 251
accessory variational problem, 334
algebraic
  interior point, 72
  open, 72
algorithm
  closedness, 129
  greedy
    GA, 300
    WGA, 299
  Polya, 40
Amemiya
  formula, 238
  norm, 237
Anderson
  theorem, 265
approximation
  C(T), 9
  Chebyshev, 10, 34, 41, 60
    Haar subspace, 12
  L¹, 41
  L^p, 34
  linear, 7
  Luxemburg norm, 8, 26, 29
  modular, 8, 29
  strict, 21, 34, 37
approximatively compact, 262
Armijo rule, 121
Ascoli
  formula, 108, 259

B
B-convex, 247, 251
Banach–Steinhaus
  theorem, 141, 364
basis
  boundedly complete, 203
  Schauder, 236
  unconditional, 236
Bernstein
  error estimate, 18
  theorem, 18
BFGS method, 127
biconjugate function, 103
bipolar, 248
bipolar theorem, 249
Borel's σ-algebra, 9
bounded
  convex function, 256
  level set, 8
  linearly, 73
brachistochrone, 312
Brouwer
  fixed point theorem, 167
Browder and Minty
  theorem, 169

C
C_Φ(μ) Orlicz class, 193
C^(1)[a, b] norm, 337
Carathéodory
  equivalent problem, x, 309
  minimale, 325
  theorem, 11, 75
characteristic function
  Luxemburg norm, 222
  Orlicz norm, 222
characterization
  best linear approximation, 109
characterization theorem
  Chebyshev approximation, 59
  convex optimization, 10, 85
  linear approximation, 28
  modular approximation, 8
Chebyshev
  alternant, 14, 16
  approximation, 15, 36, 41, 60, 166
    discrete, 20
    in C(T), 10
  polynomial, 16
  theorem, 15
closedness
  algorithmic, 129
component-wise convex mapping, 152
concave conjugate function, 105
cone condition, 51
conjugate function, 102
conjugate point, 323
continuity
  convex function, 138
  convex function in ℝⁿ, 88
  convex functions, 86
  derivative, 91
  derivative of convex functions, 90
  Gâteaux derivative, 93, 267
  Luxemburg norm
    on L_∞, 42
  modular
    on L_∞, 42
  variational functional, 337
continuous convergence, 134, 347
  inheritance, 135
  pointwise convergence
    convex functions, 141
convergence
  approximate solutions, 41
  continuous, 134
  epigraphs, 131
  greedy algorithm
    in Orlicz spaces, 302, 305
    uniformly differentiable spaces, 302
  greedy algorithm in E-space, 302
  in measure, 220
  in modular, 195, 208
  Kuratowski, 59
  Q-superlinear, 116
  quadratic, 116
  Ritz method, 298
  topological, 130
convergence estimate, 45, 49
  L¹-norm
    on L_Φ(μ), 57
  maximum norm
    discrete, 49
    on C(T), 54
convex
  body, 74
  combination, 72
  cone, 147
  function, 76
  hull, 75
  operator, 147
  set, 72
convex function in ℝⁿ
  continuity, 88
  differentiability, 88
convex optimization
  linear constraints, 174
convex-concave
  Lagrangian, 363
convexification
  Lagrangian, 322, 330, 343
convexity
  strict, 276
convexity module, 253
criterion of Kolmogoroff, 10
critical points, 21

D
Δ₂-condition, 197
Δ₂⁰-condition, 198
Δ₂^∞-condition, 197
∇-convex, 288
∇⁰-convex, 288
∇^∞-convex, 288
de la Vallée-Poussin
  set, 58
  theorem I, 12
  theorem II, 14
demi-continuity, 92
derivative
  convex function, 269
  Gâteaux derivative, 93
descent function, 123
descent property, 120
dictionary, 299
Dido problem
  parametric treatment, 339
difference quotient
  monotonicity, 80
differentiability
  convex function in ℝⁿ, 88
  Fréchet, 87, 89
  Gâteaux, 82, 88
  partial, 88
Dini
  theorem, 133
directional derivative
  left-sided, 80
  maximum norm, 133
  right-sided, 80
discrete L_∞-approximation, 20
discrete Chebyshev approximation, 20
dual convex cone, 111
dual space
  of ℓ_Φ, 233
  of L_Φ(μ), 232
  of M_Φ(μ), 227
  of M_Ψ(μ), 231
duality
  modular, 218
duality theorem
  linear approximation theory, 108
Dubois–Reymond form, 319

E
E-space, 262, 294
Eidelheit
  theorem, 97
entropy function, 162
epigraph, 76
  convergence, 131
equation system
  linear, 12
  non-linear, 9
equicontinuity, 135
  in Banach spaces, 139
equivalence
  Orlicz and Luxemburg norm, 227
equivalent convex problem, 316
equivalent variational problem, 352
Euler–Lagrange equation, 313, 341
existence proof
  principle, 109
extremal, 314, 342
extremaloid, 314
  admissible, 314
extreme point
  approximation, 21
  convex set, 74
  universal, 21
extreme set
  convex set, 96

F
f^Φ modular, 188
Fan and Glicksberg, 262
Fejér contraction, 170
Fenchel
  theorem, 106
Fenchel–Moreau
  theorem, 103, 178
flat convexity, 75
flat point, 75
formula of Ascoli, 108
Fréchet derivative, 87
Fréchet differentiability, 87, 93, 258
  Δ₂-condition, 274
  Luxemburg norm, 267, 274
  modular, 267
  Orlicz norm, 278
  reflexivity, 274
Frobenius norm, 119
function
  convex, 76
  piecewise continuous, 310
  piecewise continuously differentiable, 310
functional
  Minkowski, 79
  positive homogeneous, 78
  subadditive, 78
  sublinear, 78
fundamental theorem, 321
  Jacobi–Weierstrass, 325

G
Gâteaux derivative, 83
  continuity, 267
    Luxemburg norm, 267
    modular, 267
  demi-continuity, 93
  hemi-continuity, 91
  norm-bounded, 92
Gâteaux differentiability, 82
global convergence, 120
global optimal path, 345
gradient method, 120
  generalized, 120
Gram's matrix, 27, 32
greedy algorithms, 299

H
H_Φ(μ) subspace, 193
Hölder's inequality, 215
Haar subspace, 15
Hahn–Banach
  theorem, 95
half-space, 95
hemi-continuity, 91, 169
  Gâteaux derivative, 91
Hessian, 26
  regularity, 26
hull
  convex, 75
hyperplane, 95

I
interpolation problem, 12
isometrically isomorphic, 204
isoperimetric problem, 339

J
Jackson
  alternative, 16, 19
  theorem, 16
  uniqueness theorem, 17
Jacobi equation
  canonical form, 322
Jacobian, 28
  regularity, 28
Jensen's
  inequality, 77
  integral inequality, 220

K
K_(Φ)(x, r) open ball in L_Φ(μ), 211
Kadec–Klee property, 263, 301
Karlovitz
  method, 29
Kolmogoroff
  criterion, 10
Krein condition, 64
Krein–Milman
  theorem, 99
Kuratowski convergence, 59, 131
  level sets, 68, 146

L
L_Φ(μ), 7
L_∞(μ), 188
L_p(μ), 188
L_Φ(μ) Orlicz space, 185
L_∞(μ)-norm
  Polya algorithm, 41
ℓ_p norm, 27
ℓ_Φ sequence space, 185
ℓ_∞ sequence space, 192, 195, 202–204, 274
L¹-approximation, 365
L²-approximation
  weighted, 29
L_∞-approximation, 17
  discrete, 20
  in C[a, b], 12
  linear, 28
L_p-approximation, 161
L¹-norm
  Polya algorithm, 44
Lagrange
  duality, 114
  multiplier, 112
Lagrangian
  componentwise convex, 352
  pointwise minimization, 310
  supplemented, 342
left-sided directional derivative, 80
Legendre–Riccati
  condition, 321
  equation, 322
Leibniz's sector formula, 339
lemma
  Shmulian, 263
  Zaanen, 199
Lindenstrauss
  theorem, 282
Lindenstrauss–Tsafriri
  theorem, 203
linear supplement, 313
linearly bounded, 73
Lipschitz continuity
  convex function, 258
  uniform, 150
local uniform convexity, 294
  Luxemburg norm, 278
  Orlicz norm, 277
locally uniformly convex
  function, 254
  inheritance, 260
  norm, 255
lower semi-continuity, 86
lower semi-continuous convergence, 129, 159
lower semi-equicontinuous, 130
LP problem, 173
Lusin
  theorem, 235
Luxemburg norm, 7, 17, 36, 166, 185
  approximation, 8
  differentiability, 24
  equivalence, 190
  Fréchet differentiability, 274
  local uniform convexity, 278, 279
  monotonicity, 186
  strict convexity, 24
  uniform convexity, 289

M
M_Φ(μ)
  closure of step functions, 193
mapping
  C-convex, 149
maximum norm, 10, 35, 59
  directional derivative, 133
Mazur
  theorem, 95
Mazur–Schauder
  theorem, 110
measure space, 7, 185
  σ-finite, 196, 224
  essentially not purely atomic, 274
minimal solution, 8
  characterization, 8
minimal time problem, 365
Minkowski
  functional, 7, 79, 95, 185
  inequality, 79
  theorem, 74, 97
modified gradient method, 33
modular, 7, 188
  approximation, 8
  bounded, 212
  continuity, 42
  duality, 218
  lower semi-continuous, 220
modular approximation
  Newton method, 117
module of smoothness, 282
monotone
  operators, 166
monotone convergence, 130
  stability theorem, 10, 132
monotone regression, 354, 361
monotonicity
  difference quotient, 8, 80
  Gâteaux derivative, 84
  Luxemburg norm, 186
  mapping, 84
  subdifferential, 100, 294
Moreau–Pschenitschnii
  theorem, 100, 176

N
natural cone, 148
necessary conditions
  variational calculus, 333
necessary optimality condition
  Jacobi, 335
Newton method, 115
  damped, 122
  matrix-free, 128
Newton-like, 116
non-expansive, 171
non-linear equation, 115
non-linear equation system, 9
normal cone, 148

O
operator
  convex, 147
optimal path, 336
optimality condition
  necessary, 85
  sufficient, 85
optimization
  semi-infinite, 64
order, 148
Orlicz norm, 215, 236
  Fréchet differentiability, 278
  local uniform convexity, 277
  uniform convexity, 283
Orlicz space, 185
  approximation, 7
  completeness, 187
  separability, 234
Orlicz–Sobolev space, 352
OS method, 126

P
Phelps
  theorem, 93
pointwise convergence
  Luxemburg norms, 36, 39, 43
  Young functions, 36, 39
pointwise minimization, 341
  Lagrangian, 310
polar, 247
Polya algorithm, 34
  algorithmic scheme, 40
  discrete Chebyshev approximation, 35
  L_∞(μ)-norm, 41
  L¹-norm, 44
  numerical execution, 40
  regularization, 166
  robust regularization, 49
  strict approximation, 37
Pontryagin's maximum principle, x, 309
Powell–Wolfe rule, 122
principle
  existence proofs, 109
  pointwise minimization, 312
  uniform boundedness, 137
projection, 173
proper
  convex function, 103

Q
quadratic supplement, 349
quasi-non-expansive, 170
quotient rule, 323

R
Radon–Nikodym
  theorem, 229
reflexivity, 109
  of ℓ_Φ, 234
  of L_Φ(μ), 234
regular extremaloid, 317
regularity
  Jacobian matrix, 65
regularization
  Polya algorithm
    robust, 49
  robust, 47, 165
  Tikhonov, 158, 293, 294, 357
representation theorem, 227
Riccati's differential equation, 351
Riccati's equation, 321
right-sided directional derivative, 10, 80
  geometry, 81
  sublinearity, 81
Ritz
  method, 298
    convergence, 298
robust statistics, 9
robustness, 45

S
secant equation, 118
secant method
  angle check, 128
  minimization, 126
  rapid convergence, 118
secant methods, 118
second stage
  differentiation w.r.t. parameter, 160
  stability
    epsilon-solutions, 164
self-conjugate, 322
semi-infinite optimization, 64, 147
  Lagrange method, 64
semi-norm, 78
separability, 234
separable, 202
separation theorem
  Eidelheit, 97, 106
  Mazur, 95
  strict, 11, 98
set
  convex, 72
Shmulian
  lemma, 263
Singer
  theorem, 109
SJN method, 124
Slater condition, 146
smoothness module, 305
smoothness of solutions, 316
Sobolev norm, 329
stability
  continuous convergence, 134
  epigraph-convergence, 145
  of the conjugate Young functions, 184
  pointwise convergence
    convex functions, 141
stability principle, 35, 42
stability theorem
  convex optimization, 35, 142
  convex optimization in ℝⁿ, 143
  monotone convergence, 10, 132
  variational problems, 338
step functions, 7
step size rule, 120
stop criterion, 41
  norm of gradient, 45, 128
strict approximation, 21, 23
strict convexity
  dual space, 277
strong convexity, 325
strong Legendre–Clebsch condition, 317
strong local minima, 325, 330
  theorem, 331
strong minimum, 252, 255
  characterization, 260
strong solvability, 252, 255, 262, 329
subdifferential, 99
subgradient, 99
  characterization theorem, 101
  existence, 99
subgradient inequality, 99
sublinear functional, 78
sufficient condition
  solution of variational problem, 311
supplement method, 309
supplement potential, 311
  linear, 311
  quadratic, 321, 330
supplemented Lagrangian, 312
support functional, 248
supremum
  lower semi-continuous functions, 86
system of equations
  for coefficients, 26, 28
  non-linear, 27

T
Temlyakov, 300
theorem
  Anderson, 265
  Banach–Steinhaus, 42
    convex functions, 141
  Bernstein, 18
  Brouwer, 167
  Browder and Minty, 169
  Carathéodory, 11, 75
  Chebyshev, 15
  de la Vallée-Poussin I, 12, 58
  de la Vallée-Poussin II, 14
  Dini, 133
  Eberlein–Shmulian, 262
  Eidelheit, 97
  Fenchel, 106
  Fenchel–Moreau, 103
  Hahn–Banach, 95
  Jackson, 16
  James, 263
  Krein–Milman, 99
  Lagrange duality, 114, 248
  Lagrange multiplier, 113
  Lindenstrauss, 282
  Lindenstrauss–Tsafriri, 203
  Lusin, 235
  Mazur, 95
  Mazur–Schauder, 110
  Minkowski, 74, 91, 97
  monotone convergence, 8
  Moreau–Pschenitschnii, 100
  Phelps, 93
  Radon–Nikodym, 229
  Singer, 109
  Turett, 204
  uniform boundedness
    component-wise convex mappings, 152
    convex functions, 140
    convex operators, 149
Tikhonov
  regularization, 294, 357
Tikhonov regularization, 293
translation invariance
  Leibniz's formula, 339
transversality condition, 354
Turett
  theorem, 204
two-norm theorem, 188
two-stage optimization, 157

U
uniform
  convexity, 281
  differentiability, 281
uniform boundedness, 137
  convex functions, 140
  convex operators, 149
uniform convexity, 253
  Luxemburg norm, 289
  Orlicz norm, 283
uniform differentiability, 305
  Luxemburg norm, 285
uniform Lipschitz continuity, 150
uniform strong convexity
  Lagrangian, 328
  variational functional, 329
uniformly convex, 359
uniformly normal, 148
uniqueness theorem
  Jackson, 17
update formula, 118
  Broyden, 119
upper semi-continuity, 86
upper semi-continuously convergent, 129

V
variational inequality, 166, 172
variational problem, 310, 337

W
weak Chebyshev greedy algorithm, 299
weak local minimal solution, 321
weakly bounded, 140
weakly closed, 98, 249
weakly lower semi-continuous, 104
Weierstrass–Erdmann condition, 314
weighted L²-approximation, 29

Y
Young function, 7, 39, 175
  conjugate, 178
  ∇-convex, 288
  definite, 17
  duality, 183
    differentiability and strict convexity, 181
  finiteness of the conjugate, 184
  generalized inverse, 176, 180
  stability of the conjugates, 184
  subdifferential, 175
Young's equality, 102, 104
Young's inequality, 102, 178

Z
Zaanen
  lemma, 199
zeros
  change of sign, 14
  difference function, 15
  Haar subspace, 13