SPRINGER BRIEFS IN MATHEMATICS

Peter Carr · Qiji Jim Zhu

Convex Duality
and Financial
Mathematics

SpringerBriefs in Mathematics

Series Editors
Nicola Bellomo
Michele Benzi
Palle Jorgensen
Tatsien Li
Roderick Melnik
Otmar Scherzer
Benjamin Steinberg
Lothar Reichel
Yuri Tschinkel
George Yin
Ping Zhang

SpringerBriefs in Mathematics showcases expositions in all areas of mathematics
and applied mathematics. Manuscripts presenting new results or a single new result
in a classical field, new field, or an emerging topic, applications, or bridges between
new results and already published works, are encouraged. The series is intended for
mathematicians and applied mathematicians.

More information about this series at http://www.springer.com/series/10030


Peter Carr • Qiji Jim Zhu

Convex Duality and Financial Mathematics

Peter Carr
Department of Finance and Risk Engineering
Tandon School of Engineering
New York University
New York, NY, USA

Qiji Jim Zhu
Department of Mathematics
Western Michigan University
Kalamazoo, MI, USA

ISSN 2191-8198    ISSN 2191-8201 (electronic)
SpringerBriefs in Mathematics
ISBN 978-3-319-92491-5    ISBN 978-3-319-92492-2 (eBook)
https://doi.org/10.1007/978-3-319-92492-2

Library of Congress Control Number: 2018946786

Mathematics Subject Classification: 26B25, 49N15, 52A41, 60J60, 90C25, 91B16, 91B25, 91B26,
91B30, 91G10, 91G20

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Carol and Olivia
To Lilly and Charles.
And in memory of Jonathan Borwein
(1951–2016) with respect.
Preface

Convex duality plays an essential role in many important financial problems.


For example, it arises both in the minimization of convex risk measures and in
the maximization of concave utility functions. Convex duality and its generalizations
also appear when an optimization problem is not immediately apparent, for
instance in implementing dynamic hedging of contingent claims. Recognizing the
role of convex duality in financial problems is crucial for several reasons. First,
considering the primal and dual problem together gives the financial modeler the
option to tackle the more accessible problem first. Usually, knowledge of the
solution of one helps in solving the other. Moreover, the solution to the dual problem
can usually be given a financial interpretation. As a result, the dual problem often
illuminates an alternative perspective, which is not easily achieved by examining the
primal problem in isolation. When flipping from the primal to the dual, a surprise
insight typically awaits, irrespective of past experience. Finally, as an added benefit,
the primal and the dual can often be paired together to provide better numerical
solutions than when either side is considered in isolation.
The goal of this book is to provide a concise introduction to this growing research
field. Our target audience is graduate students and researchers in related areas. We
begin in Chapter 1 with a quick introduction of convex duality and related tools.
We emphasize the relationship between convex duality and the Lagrange multiplier
rule for constrained optimization problems. We then give a quick overview of the
intrinsic duality relationship in several diverse financial problems.
In Chapter 2, we consider the simplest possible financial market model. In
particular, we consider a one-period economy with a finite number of possible states.
Using this simple financial market model, we showcase convex duality in a number
of important financial problems. We begin with the Markowitz portfolio theory,
which involves a particularly simple convex programming problem: optimizing a
quadratic function with linear constraints. Duality plays two important roles in
Markowitz portfolio theory. First, while the primal problem may involve hundreds
or even thousands of variables representing the risky assets potentially included in
the portfolio, the dual problem has only two variables related to the two constraints
on the initial endowment and the expected return. In fact, the key observation of
Markowitz is that one can evaluate the performance of a portfolio in the dual space
using the variance-expected return pair. Second, the duality relationship between the
primal Markowitz portfolio problem and its dual helps us to understand that the set
of optimal portfolios is an affine set, which leads to the important two-fund theorem.
The core methodology of optimizing a quadratic function with linear constraints was
also used in the capital asset pricing model, which leads to the widely used Sharpe
ratio. Duality also plays a crucial role in this problem.
Next, we consider portfolio optimization from the perspective of maximizing
expected utility. There has been a very long history of using utility functions in
economics. In financial problems, utility functions are increasing concave functions
of wealth. The concavity of the utility function captures the risk aversion of an
investor. Arrow and Pratt introduced widely used measures of the level of risk
aversion. It turns out that there is a precise way of using generalized convexity to
characterize Pratt–Arrow risk aversion. This application illustrates the relevance of
generalized convexity in dealing with financial problems. It is even more interesting
to consider the dual of the expected utility maximization problem. It turns out
that in the absence of arbitrage, solutions to the dual problem are in essence
the equivalent martingale measures (also called risk-neutral probabilities), which
are widely used in pricing financial derivatives. Considering the expected utility
maximization problem along with its dual leads us to rediscover the fundamental
theorem of asset pricing. An added benefit of this alternative approach is that
martingale measures can be related to the risk aversion of agents in the market.
The last application that we cover in Chapter 2 concerns the dual representation
of coherent risk measures. Coherent risk measures are motivated by the common
regulatory practice of assigning to each position in a risky asset an appropriate
amount of cash reserves. Hence, they are widely used to analyze risks. Mathemat-
ically, a coherent risk measure is characterized by a sublinear function: a convex
function with positive homogeneity. It is well known that the dual of a sublinear
function is an indicator function. Thus, using dual representation, a coherent risk
measure is just the support function of a closed convex set. Financially, we can view
the generating set of a coherent risk measure as the probabilities assigned to risky
scenarios in a stress test. Duality also generates numerical methods for calculating
some important coherent risk measures such as the conditional value at risk.
We expand our discussion to a more general multiperiod financial market model
in Chapter 3. This more general setting allows us to model dynamic trading. The
added complexity in dealing with a multiperiod model mainly involves capturing
the increase in information using an information structure. After laying out the
multiperiod financial market model, we show that the fundamental theorem of asset
pricing also arises in a multiperiod financial market model. After that we also dis-
cuss two new topics: super (sub) hedging and conic finance. In general, the absence
of arbitrage leads to multiple (usually infinitely many) pricing martingale measures
in an incomplete market. Thus, the no arbitrage principle usually determines a price
range for a contingent claim with upper and lower bounds, which are given by the
supremum and the infimum of the expectation of the payoff under the martingale
measures, respectively. If a market price falls outside of these bounds, then an
arbitrage opportunity occurs. It turns out that the dual solution to the optimization
problem of finding the upper or lower no arbitrage bounds provides a trading
strategy that one can use to take advantage of such an arbitrage opportunity. Conic
finance is used to describe financial markets for which the absolute value of the
price depends on whether one is buying or selling. In other words, conic finance
describes realistic financial markets with a strictly positive bid-ask spread. In such
a model, the cash flows that can be achieved from implementing acceptable trading
strategies form a convex cone. This observation provides the rationale for the name
conic finance. Despite the added complication of dealing with a conic constraint,
we show that most of the duality relationships that are observed under zero bid–ask
spread still prevail when the spread is positive.
We then move to continuous-time financial models in Chapter 4. The most
noteworthy duality relationship developed in this chapter is the observation that the
classical Black-Scholes formula for pricing a contingent claim with a convex payoff
is, in fact, a Fenchel-Legendre transform. We show that the function describing
cash borrowings while delta hedging a short position in a contingent claim is just
the Fenchel conjugate of the contingent claim pricing function. The flip side is that
the contingent claim pricing function can itself be viewed as a Fenchel conjugate
of the function describing these cash borrowings. This provides a new perspective
on the convex function linking the price of the contingent claim to the underlying
spot price. With the availability of many tradable contingent claims such as those
embedded in ETFs, the ability to dynamically hedge a contingent claim with other
contingent claims is increasingly becoming a financial reality. Interestingly, when
using contingent claims as hedging instruments, one discovers a similar duality
relationship between the contingent claim pricing function and the cash borrowings
function in terms of generalized convexity. Many useful applications are also
discussed in this chapter. We examine the convexity and generalized convexity of
the Bachelier and Black-Scholes option pricing formulae with respect to volatility
as well. Generalizations of these properties might be useful in dealing with financial
products related to volatility and be a potentially fruitful future research direction.
The material in this book grew out of slides used to teach a joint doctoral
seminar at New York University’s Courant Institute in the fall of 2015. Part of the
materials has also been used previously for graduate topic courses on optimization
and modeling at Western Michigan University. We thank our colleagues at both
NYU and WMU for providing us with supportive research environments. Professor
Robert Kohn helped to arrange for us to become neighbors, which facilitated our
collaboration in no small part. Conversations with Professors Marco Avellaneda,
Jonathan Goodman, and Fang-Hua Lin have been most helpful. We are also indebted
to the participants of these courses for many stimulating discussions. In particular,
we thank Monty Essid, Tom Li, Matthew Foreman, Sanjay Karanth, Jay Treiman,
Mehdi Vazifadan, and Guolin Yu whose detailed comments on various parts of our
lecture notes have been incorporated into the text.

New York, NY, USA Peter Carr


Kalamazoo, MI, USA Qiji Jim Zhu
April, 2017
Contents

1 Convex Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Convex Sets and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Convex Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Subdifferential and Lagrange Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Nonemptiness of Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Role in Convex Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 The Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 The Fenchel–Young Inequality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.3 Graphic Illustration and Generalizations . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Convex Duality Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.1 Rockafellar Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.2 Fenchel Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.3 Lagrange Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.4 Generalized Fenchel–Young Inequality . . . . . . . . . . . . . . . . . . . . . . . 23
1.5 Generalized Convexity, Conjugacy and Duality . . . . . . . . . . . . . . . . . . . . . . . 28
2 Financial Models in One Period Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.1 Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.1.1 Markowitz Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1.2 Capital Asset Pricing Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.3 Sharpe Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2.1 Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2.2 Measuring Risk Aversion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.3 Growth Optimal Portfolio Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2.4 Efficiency Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53


2.3 Fundamental Theorem of Asset Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


2.3.1 Fundamental Theorem of Asset Pricing . . . . . . . . . . . . . . . . . . . . . . . 57
2.3.2 Pricing Contingent Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.3.3 Complete Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.3.4 Use Linear Programming Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4 Risk Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.4.1 Coherent Risk Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.4.2 Equivalent Characterization of Coherent Risk Measures . . . . . 69
2.4.3 Good Deal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.4.4 Several Commonly Used Risk Measures . . . . . . . . . . . . . . . . . . . . . . 77
3 Finite Period Financial Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.1.1 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.1.2 A General Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.2 Arbitrage and Admissible Trading Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3 Fundamental Theorem of Asset Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.3.1 Fundamental Theorem of Asset Pricing . . . . . . . . . . . . . . . . . . . . . . . 89
3.3.2 Relationship Between Dual of Portfolio Utility
Maximization, Lagrange Multiplier and Martingale
Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.3.3 Pricing Contingent Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.4 Complete Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4 Hedging and Super Hedging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4.1 Super- and Sub-hedging Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4.2 Towards a Complete Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.4.3 Incomplete Market Arise from Complete Markets . . . . . . . . . . . . 97
3.5 Conic Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.5.1 Modeling Financial Markets with an Ask-Bid Spread . . . . . . . . 99
3.5.2 Characterization of No Arbitrage by Utility Optimization . . . 101
3.5.3 Dual Characterization of No Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . 102
3.5.4 Pricing and Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4 Continuous Financial Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1 Continuous Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1.1 Brownian Motion and Martingale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.1.2 The Itô Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.1.3 Girsanov Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2 Bachelier and Black–Scholes Formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2.1 Pricing Contingent Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2.3 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3 Duality and Delta Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.1 Delta Hedging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

4.3.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


4.3.3 Time Reversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Generalized Duality and Hedging with Contingent Claims . . . . . . . . . . . 128
4.4.1 Preservation of Generalized Convexity in the Value
Function of a Contingent Claim. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.2 Determining the Hedging Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.4.3 Hedging with p-Multiple ETF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.4.4 Reducing the Volatility of the Hedging Process . . . . . . . . . . . . . . . 138
4.4.5 The Volatility Trade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Chapter 1
Convex Duality

Abstract We present a concise description of the convex duality theory in this
chapter. The goal is to lay a foundation for later application in various financial
problems rather than to be comprehensive. We emphasize the role of the subdifferential
of the value function of a convex programming problem: it is both the set of
Lagrange multipliers and the set of solutions to the dual problem. These relationships
provide much convenience in financial applications. We also discuss generalized
convexity, conjugacy, and duality.

1.1 Convex Sets and Functions

1.1.1 Definitions

Definition 1.1.1 (Convex Sets and Functions) Let X be a Banach space. We say
that a subset C of X is a convex set if, for any x, y ∈ C and any λ ∈ [0, 1],
λx + (1 − λ)y ∈ C. We say an extended-valued function f : X → R ∪ {+∞} is a
convex function if its domain, dom f := {x ∈ X | f (x) < ∞}, is convex and for
any x, y ∈ dom f and any λ ∈ [0, 1], one has

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

We call f : X → [−∞, +∞) a concave function if −f is convex.


In some sense convex functions are the simplest functions next to linear
functions. Convex sets and functions are intrinsically related. For example, it is
easy to verify that C is a convex set if and only if ιC (x) := 0 if x ∈ C and
ιC (x) := +∞ otherwise, its indicator function, is a convex function. On the other
hand, if f is a convex function, then the epigraph of f , epi f := {(x, r) | f (x) ≤ r}
and f −1 ((−∞, a]) := {x | f (x) ∈ (−∞, a]}, a ∈ R are convex sets. In fact,
we can check that the convexity of epi f characterizes that of f . This geometric
characterization is very useful in many situations. For instance, it is easy to see that
the intersection of a class of convex sets is convex. Now let fα be a class of convex
functions; we can see that

epi supα fα = ∩α epi fα

and, thus, supα fα is convex. In particular, the support function of a set C ⊂ X
defined on the dual space X∗ by

σC (x ∗ ) = σ (C; x ∗ ) := sup{⟨x, x ∗ ⟩ | x ∈ C}                    (1.1.1)

is always convex. Note that allowing the extended value +∞ in the definition of
convex function is important in establishing those relations.
An important property of convex functions related to applications in economics
and finance is the Jensen inequality.
Proposition 1.1.2 (Jensen’s Inequality) Let f be a convex function. Then, for any
random variable X on a finite probability space,

f (E[X]) ≤ E[f (X)],

where E[X] stands for the expectation of X.


When X has only finite states this result directly follows from the definition. The
general result can be proven by approximation.
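To make Jensen's inequality concrete, the short Python sketch below checks it on an assumed three-state probability space with the convex function f (x) = e^x; the states and probabilities are chosen only for illustration.

import numpy as np

# Three-state check of Jensen's inequality with the convex function f(x) = e^x;
# the states and probabilities are assumed for illustration.
f = np.exp
states = np.array([-1.0, 0.5, 2.0])       # values of X
probs = np.array([0.2, 0.5, 0.3])         # probabilities of the three states
lhs = f(np.dot(probs, states))            # f(E[X])
rhs = np.dot(probs, f(states))            # E[f(X)]
assert lhs <= rhs
print(lhs, rhs)                           # approximately 1.92 <= 3.11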
A special kind of convex set, the convex cone, is very useful.
Definition 1.1.3 Let X be a finite dimensional Banach space. We say K ⊂ X is a
convex cone if for any x, y ∈ K and any α, β ≥ 0, αx + βy ∈ K. Moreover, we say
K is pointed if K ∩ (−K) = {0}.
A pointed convex cone K induces a partial order ≤K by defining x ≤K y if and
only if y−x ∈ K. We can easily check that ≤K is reflexive (x ≤K x), antisymmetric
(x ≤K y and y ≤K x implies x = y), and transitive (x ≤K y and y ≤K z implies
that x ≤K z). The definition of convexity can easily be extended to mappings whose
image space has such a partial order.
Definition 1.1.4 (Convex Mappings) Let X and Y be two Banach spaces. Assume
that Y has a partial order ≤K generated by the pointed convex cone K ⊂ Y . We say
that a mapping f : X → Y is K-convex provided that, for any x, y ∈ dom f and
any λ ∈ [0, 1], one has

f (λx + (1 − λ)y) ≤K λf (x) + (1 − λ)f (y).



1.1.2 Convex Programming

We will often encounter various forms of the general convex programming problems
below in financial applications in subsequent chapters. Let X, Y , and Z be finite
dimensional Banach spaces. Assume that Y has a partial order ≤K generated by the
pointed convex cone K. We will use X∗ , Y ∗ , and Z ∗ to denote the dual spaces of
X, Y , and Z, respectively, and denote the polar cone of K by

K + := {y ∗ ∈ Y ∗ : ⟨y ∗ , y⟩ ≥ 0 for all y ∈ K}.

Consider the following class of constrained optimization problems

P (y, z) Minimize f (x) (1.1.2)


Subject to g(x) ≤K y,
h(x) = z,
x ∈ C,

where C is a closed set, f : X → R is lower semicontinuous, g : X → Y is lower


semicontinuous with respect to ≤K , and h : X → Z is continuous. We will use
v(y, z) to represent the optimal value function

v(y, z) := inf{f (x) : g(x) ≤K y, h(x) = z, x ∈ C},

which may take values ±∞ (in infeasible or unbounded below cases), and S(y, z)
the (possibly empty) solution set of problem P (y, z).
A concrete example is

Minimize f (x) (1.1.3)


Subject to gm (x) ≤ ym , m = 1, 2, . . . , M,
hl (x) = zl , l = 1, 2, . . . , L
x ∈ C ⊂ RN ,

where C is a closed subset, f, gm : RN → R are lower semicontinuous, and hl : RN → R
are continuous. Defining the vector-valued functions g = (g1 , g2 , . . . , gM ) and
h = (h1 , h2 , . . . , hL ), problem (1.1.3) becomes problem (1.1.2) with ≤K = ≤RM+ ,
where RM+ := {x ∈ RM | x ≥ 0} is the positive orthant in RM . Besides Euclidean
spaces, for applications in this book we will often need to consider the Banach space
of random variables.
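As a concrete illustration of problem (1.1.3) and of the optimal value function, the Python sketch below solves an assumed two-variable instance (a quadratic objective with a single linear inequality constraint and no equality constraints, so the value function depends on y alone) for a range of right-hand sides y. The solver choice and tolerances are assumptions; analytically v(y) = (1 − y)²/2 for y ≤ 1, so the computed values are convex in y and the difference quotient at 0 approximates a subgradient of v.

import numpy as np
from scipy.optimize import minimize

# Assumed instance of problem (1.1.3):
#   minimize f(x) = x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= y.
# Analytically v(y) = (1 - y)^2 / 2 for y <= 1.
def v(y):
    # scipy's "ineq" convention is fun(x) >= 0, i.e. x1 + x2 >= 1 - y
    cons = {"type": "ineq", "fun": lambda x, y=y: x[0] + x[1] - (1.0 - y)}
    res = minimize(lambda x: x[0] ** 2 + x[1] ** 2, x0=[1.0, 1.0], constraints=[cons])
    return res.fun

ys = np.linspace(-0.5, 0.5, 11)
vals = np.array([v(y) for y in ys])
# midpoint test of convexity of the value function on the uniform grid
assert all(vals[i + 1] <= 0.5 * (vals[i] + vals[i + 2]) + 1e-5 for i in range(len(vals) - 2))
print((v(0.01) - v(-0.01)) / 0.02)   # approximately -1, a subgradient of v at 0 (lambda = 1)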
It turns out that the optimal value function of a convex programming problem is
convex.

Proposition 1.1.5 (Convexity of Optimal Value Function) Suppose that in the
constrained optimization problem (1.1.2), function f is convex, mapping g is ≤K
convex, and mapping h is affine, and set C is convex. Then the optimal value function
v is convex.
Proof Consider (y i , zi ), i = 1, 2 in the domain of v and an arbitrary ε > 0. We can
find xεi feasible to the constraint of problem P (y i , zi ) such that

f (xεi ) < v(y i , zi ) + ε, i = 1, 2. (1.1.4)

Now for any λ ∈ [0, 1], we have

f (λxε1 + (1 − λ)xε2 ) ≤ λf (xε1 ) + (1 − λ)f (xε2 )                    (1.1.5)
                       < λv(y 1 , z1 ) + (1 − λ)v(y 2 , z2 ) + ε.

It is easy to check that λxε1 + (1 − λ)xε2 is feasible for problem P (λ(y 1 , z1 ) + (1 −


λ)(y 2 , z2 )). Thus, v(λ(y 1 , z1 ) + (1 − λ)(y 2 , z2 )) ≤ f (λxε1 + (1 − λ)xε2 ). Combining
with inequality (1.1.5) and letting ε → 0 we arrive at

v(λ(y 1 , z1 ) + (1 − λ)(y 2 , z2 )) ≤ λv(y 1 , z1 ) + (1 − λ)v(y 2 , z2 ),

that is to say v is convex. 

This is a very potent result that can help us to recognize the convexity of many
other functions. For example, let C be a convex set; then dC , the distance function
to C defined by dC (z) := inf{‖z − c‖ : c ∈ C} is a convex function because we can
rewrite it as the optimal value of the following special case of problem (1.1.2)

dC (z) = inf{‖x‖ : x + c = z, c ∈ C}.

While the value function of a convex programming problem is always convex, it
is not necessarily smooth even if all the data involved are smooth. The following is
an example:

v(y) = inf{x : x² ≤ y} = −√y if y ≥ 0, and +∞ if y < 0.
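A direct numerical look at this example (a minimal sketch, using the formula above) shows the loss of smoothness: the difference quotients of v at y = 0 are unbounded.

import numpy as np

# v(y) = inf{ x : x^2 <= y } = -sqrt(y) for y >= 0; the one-sided difference
# quotients at y = 0 are unbounded, so v is not differentiable there.
ys = np.array([1e-6, 1e-4, 1e-2, 1.0])
vals = -np.sqrt(ys)                 # v on an assumed grid of feasible y
print(vals / ys)                    # (v(y) - v(0)) / y = -1/sqrt(y): -1000, -100, -10, -1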

1.2 Subdifferential and Lagrange Multiplier

Many naturally arising nonsmooth convex functions lead to the definition of


subdifferential as a replacement for the nonexisting derivative.

1.2.1 Definition

Definition 1.2.1 (Subdifferential) Let X be a finite dimensional Banach space.


The subdifferential of a lower semicontinuous function φ : X → R ∪ {+∞} at
x ∈ dom φ is defined by

∂φ(x) = {x ∗ ∈ X∗ : φ(y) − φ(x) ≥ ⟨x ∗ , y − x⟩ ∀y ∈ X}.

We define the domain of the subdifferential of φ by

dom ∂φ = {x ∈ X | ∂φ(x) ≠ ∅}.

An element of ∂φ(x) is called a subgradient of φ at x.


Definition 1.2.2 (Normal Cone) For a closed convex set C ⊂ X, we define the
normal cone of C at x̄ ∈ C by N(C; x̄) = ∂ιC (x̄).
Sometimes we will also use the notation NC (x̄) = N (C; x̄). A useful characteriza-
tion of the normal cone is x ∗ ∈ N(C; x) if and only if, for all y ∈ C, ⟨x ∗ , y − x⟩ ≤ 0.
It is easy to verify that if f has a continuous derivative at x then ∂f (x) = {f ′ (x)}.
At a nondifferentiable point a convex function’s subdifferential is usually a set. Here
are a few examples.
Example 1.2.3 We can easily verify that
• ∂| · |(0) = [−1, 1].
• ∂(·)+ (0) = [0, +∞).
• ∂(·)− (0) = (−∞, 0].
In general, if ‖ · ‖ is the Euclidean norm on RN , ∂‖ · ‖(0) = B1 (0), where B1 (0) is
the closed unit ball of RN .
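The first of these subdifferentials can be verified on a grid; the Python sketch below (an illustrative check with an assumed grid) confirms that a slope s satisfies the subgradient inequality for | · | at 0 exactly when s ∈ [−1, 1].

import numpy as np

# s is a subgradient of |.| at 0 iff |y| - |0| >= s * (y - 0) for all y;
# the grid below is an assumption used to test a few candidate slopes.
ys = np.linspace(-2.0, 2.0, 401)

def is_subgradient(s):
    return np.all(np.abs(ys) >= s * ys)

print([s for s in (-1.5, -1.0, 0.0, 0.3, 1.0, 1.5) if is_subgradient(s)])
# -> [-1.0, 0.0, 0.3, 1.0]; slopes outside [-1, 1] fail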

1.2.2 Nonemptiness of Subdifferential

A natural and important question is when we can ensure that the subdifferential
is nonempty. The following Fenchel-Rockafellar theorem provides a basic form of
sufficient conditions.
Theorem 1.2.4 (Fenchel-Rockafellar Theorem on Nonemptiness of Subdiffer-
ential) Let f : X → R ∪ {+∞} be a convex function. Suppose that x̄ ∈
int(dom f ), the interior of dom f . Then the subdifferential ∂f (x̄) is nonempty.
Proof We observe that (x̄, f (x̄)) is a boundary point of the closed set epi f which
has a nonempty interior. Thus by the Hahn–Banach extension theorem there exists
a supporting hyperplane of epi f at (x̄, f (x̄)) whose normal vector is (0, 0) ≠
(x ∗ , r) ∈ X∗ × R. Now, for any x ∈ dom f and u ≥ f (x), we have

r(u − f (x̄)) + ⟨x ∗ , x − x̄⟩ ≥ 0. (1.2.1)

Since u ≥ f (x) is arbitrary, r ≥ 0. Moreover, if r = 0, then x̄ ∈ int dom f would


also imply x ∗ = 0, which yields a contradiction. Thus, r > 0. Letting u = f (x)
in (1.2.1) we see that −x ∗ /r ∈ ∂f (x̄). 

Remark 1.2.5 (Constraint Qualification: Relative Interior) The Fenchel-Rockafellar


Theorem is a fundamental result that we will use often in the sequel. Condition
x̄ ∈ int(dom f ) is a sufficient condition that can be improved. Notice that we
don’t need to worry about points at which f = ∞. Thus, we need only check the
condition of Theorem 1.2.4 on span(dom f ), the span of dom f . Hence, the condition
x̄ ∈ int(dom f ) can be relaxed to x̄ ∈ ri(dom f ) with f lower semicontinuous,
where ri signifies the relative interior, i.e. the interior relative to span(dom f ).
Remark 1.2.6 (Constraint Qualification: Polyhedral Problem) Recall that a set is
polyhedral if it is the intersection of finitely many closed half-spaces. A function
is polyhedral if its epigraph is a polyhedral set. For a polyhedral function its
subdifferential is nonempty at any point of its domain (see, e.g., [7]). This sufficient
condition is very useful in dealing with linear programming problems.
The conclusion ∂f (x̄) ≠ ∅ can be stated alternatively as: there exists a linear
functional x ∗ such that f − x ∗ attains its minimum at x̄. This is a very useful
perspective on the use of variational arguments—deriving results by observing a
certain auxiliary function attains a minimum or maximum.

1.2.3 Calculus

For more complicated convex functions we need the help of a convenient calculus
for calculating or estimating their subdifferentials. It turns out that the key for
developing such a calculus is to combine a decoupling mechanism with the existence
of subgradients. We summarize this idea in the following lemma.
Lemma 1.2.7 (Decoupling Lemma) Let X and Y be Banach spaces. Let the
functions f : X → R and g : Y → R be convex and let A : X → Y be a linear
transform. Suppose that f , g, and A satisfy the condition

0 ∈ ri[dom g − A dom f ]. (1.2.2)

Then there is a y ∗ ∈ Y ∗ such that for any x ∈ X and y ∈ Y ,

p ≤ [f (x) − ⟨y ∗ , Ax⟩] + [g(y) + ⟨y ∗ , y⟩], (1.2.3)



where p = infx∈X {f (x) + g(Ax)}.


Proof Define an optimal value function v : Y → [−∞, +∞] by

v(u) = inf_{x∈X} {f (x) + g(Ax + u)}
     = inf_{x∈X} {f (x) + g(y) : y − Ax = u}. (1.2.4)

Proposition 1.1.5 implies that v is convex. Moreover, it is easy to check that
dom v = dom g − A dom f so that by Theorem 1.2.4 and Remark 1.2.5 the
constraint qualification condition (1.2.2) ensures that ∂v(0) ≠ ∅. Let −y ∗ ∈ ∂v(0).
By definition we have

v(0) = p ≤ v(y−Ax) + ⟨y ∗ , y−Ax⟩ ≤ f (x) + g(y) + ⟨y ∗ , y − Ax⟩. (1.2.5)

We apply the decoupling lemma (Lemma 1.2.7) to establish a sandwich
theorem.
Theorem 1.2.8 (Sandwich Theorem) Let f : X → R ∪ {+∞} and g : Y →
R ∪ {+∞} be convex functions and let A : X → Y be a linear map. Suppose that
f ≥ −g ◦ A and f , g, and A satisfy condition (1.2.2). Then there is an affine
function α : X → R of the form α(x) = ⟨A∗ y ∗ , x⟩ + r satisfying f ≥ α ≥ −g ◦ A.
Moreover, for any x̄ satisfying f (x̄) = −g ◦ A(x̄), we have −y ∗ ∈ ∂g(Ax̄).
Proof By Lemma 1.2.7 there exists y ∗ ∈ Y ∗ such that for any x ∈ X and y ∈ Y ,

0 ≤ p ≤ [f (x) − ⟨y ∗ , Ax⟩] + [g(y) + ⟨y ∗ , y⟩]. (1.2.6)

For any z ∈ X setting y = Az in (1.2.6) we have

f (x) − ⟨A∗ y ∗ , x⟩ ≥ −g(Az) − ⟨A∗ y ∗ , z⟩. (1.2.7)

Thus,

a := inf_{x∈X} [f (x) − ⟨A∗ y ∗ , x⟩] ≥ b := sup_{z∈X} [−g(Az) − ⟨A∗ y ∗ , z⟩].

Picking any r ∈ [b, a], α(x) := ⟨A∗ y ∗ , x⟩ + r is an affine function that separates f


and −g ◦ A. Finally, when f (x̄) = −g ◦ A(x̄), it follows from (1.2.6) that −y ∗ ∈
∂g(Ax̄). 

We now use the tools established above to deduce calculus rules for convex
functions. We start with a sum rule playing a role similar to the sum rule for
derivatives in calculus.
Theorem 1.2.9 (Convex Subdifferential Sum Rule) Let f : X → R ∪ {+∞}
and g : Y → R ∪ {+∞} be convex functions and let A : X → Y be a linear map.
Then at any point x in X, we have the sum rule

∂(f + g ◦ A)(x) ⊃ ∂f (x) + A∗ ∂g(Ax), (1.2.8)

with equality if condition (1.2.2) holds.


Proof Inclusion (1.2.8) is easy and left to the reader as an exercise. We prove the
reverse inclusion under condition (1.2.2). Suppose x ∗ ∈ ∂(f + g ◦ A)(x̄). Since
shifting by a constant does not change the subdifferential of a convex function, we
may assume without loss of generality that

x → f (x) + g(Ax) − ⟨x ∗ , x⟩

attains its minimum 0 at x = x̄. By the sandwich theorem there exists an affine
function α(x) := ⟨A∗ y ∗ , x⟩ + r with −y ∗ ∈ ∂g(Ax̄) such that

f (x) − ⟨x ∗ , x⟩ ≥ α(x) ≥ −g(Ax).

Clearly equality is attained at x = x̄. It is now an easy matter to check that x ∗ +


A∗ y ∗ ∈ ∂f (x̄). 

Note that when A is the identity mapping and both f and g are differentiable
Theorem 1.2.9 recovers sum rules in calculus. The geometrical interpretation of this
is that one can find a hyperplane in X × R that separates the epigraph of f and
hypograph of −g, i.e. {(x, r) : −g(x) ≥ r}.
By applying the subdifferential sum rule to the indicator functions of two convex
sets we have parallel results for the normal cones to the intersection of convex sets.
Theorem 1.2.10 (Normals to an Intersection) Let C1 and C2 be two convex
subsets of X and let x ∈ C1 ∩ C2 . Suppose that C1 ∩ int C2 ≠ ∅. Then

N(C1 ∩ C2 ; x) = N(C1 ; x) + N (C2 ; x).

Proof Apply the subdifferential sum rule to the indicator functions of C1


and C2 . 

The condition (1.2.2) is often referred to as a constraint qualification. Without it


the equality in the convex subdifferential sum rule may not hold.

1.2.4 Role in Convex Programming

The subdifferential plays important roles in convex programming. First, for
unconstrained convex minimization problems we have Fermat's rule:
Proposition 1.2.11 (Fermat’s Rule) Let X be a Banach space and let f : X →
R ∪ {+∞} be a proper convex function. Then the point x̄ ∈ X is a (global)
minimizer of f if and only if the condition 0 ∈ ∂f (x̄) holds.
Proof We only need to observe that x̄ ∈ X is a minimizer of f if and only if

f (x) − f (x̄) ≥ 0 = ⟨0, x − x̄⟩,

which by definition is equivalent to 0 ∈ ∂f (x̄). 

Alternatively put, minimizers of f correspond exactly to “zeroes” of ∂f .


Consider the constrained convex optimization problem of

CP minimize f (x) (1.2.9)

subject to x ∈ C ⊂ X,

where C is a closed convex subset of X and f : X → R ∪ {+∞} is a convex lower


semicontinuous function. Combining Fermat's rule with the subdifferential sum
rule we derive a characterization for solutions to CP.
Theorem 1.2.12 (Pshenichnii–Rockafellar Conditions) Let C be a closed convex
subset of RN and let f : X → R ∪ {+∞} be a convex function. Suppose that 0 ∈
ri[dom f − C] and f is bounded from below on C. Then x̄ is a solution of CP if and
only if it satisfies

0 ∈ ∂f (x̄) + N(C; x̄).

Proof Apply the convex subdifferential sum rule of Theorem 1.2.9 to f + ιC at x̄.


Finally we turn to the relationship between subdifferential of optimal value


functions in convex programming and Lagrange multipliers. We shall see from
the two versions of the Lagrange multiplier rule given below that the subdifferential of
the optimal value function completely characterizes the set of Lagrange multipliers
(denoted λ in these theorems).
Theorem 1.2.13 (Lagrange Multiplier Without Existence of Optimal Solution)
Let v(y, z) be the optimal value function of the constrained optimization problem
P (y, z). Then −λ ∈ ∂v(0, 0) if and only if
(i) (nonnegativity) λ ∈ K + × Z ∗ ; and


(ii) (unconstrained optimum) for any x ∈ C,

f (x) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0).

Proof The “only if” part. Suppose that −λ ∈ ∂v(0, 0). It is easy to see that v(y, 0)
is non-increasing with respect to the partial order ≤K . Thus, for any y ∈ K,

0 ≥ v(y, 0) − v(0, 0) ≥ ⟨−λ, (y, 0)⟩

so that λ ∈ K + × Z ∗ verifying (i). Conclusion (ii) follows from the fact that for all
x ∈ C,

f (x) + ⟨λ, (g(x), h(x))⟩ ≥ v(g(x), h(x)) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0). (1.2.10)

The “if” part. Suppose λ satisfies conditions (i) and (ii). Then we have, for any
x ∈ C, g(x) ≤K y and h(x) = z,

f (x) + ⟨λ, (y, z)⟩ ≥ f (x) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0). (1.2.11)

Taking the infimum of the leftmost term under the constraints x ∈ C, g(x) ≤K y
and h(x) = z, we arrive at

v(y, z) + ⟨λ, (y, z)⟩ ≥ v(0, 0). (1.2.12)

Therefore, −λ ∈ ∂v(0, 0). 

If we denote by Λ(y, z) the multipliers satisfying (i) and (ii) of Theorem 1.2.13,
then we may write the useful set equality

Λ(0, 0) = −∂v(0, 0).

The next corollary is now immediate.


Corollary 1.2.14 (Lagrange Multiplier Without Existence of Optimal Solution)
Let v(y, z) be the optimal value function of the constrained optimization problem
P (y, z). Then −λ ∈ ∂v(0, 0) if and only if
(i) (nonnegativity) λ ∈ K + × Z ∗ ;
(ii) (unconstrained optimum) for any x ∈ C, satisfying g(x) ≤K y and h(x) = z,

f (x) + ⟨λ, (y, z)⟩ ≥ v(0, 0).



When an optimal solution for the problem P (0, 0) exists, we can also derive a
so-called complementary slackness condition.
Theorem 1.2.15 (Lagrange Multiplier when Optimal Solution Exists) Let
v(y, z) be the optimal value function of the constrained optimization problem
P (y, z). Then the pair (x̄, λ) satisfies −λ ∈ ∂v(0, 0) and x̄ ∈ S(0, 0) if and only
if
(i) (nonnegativity) λ ∈ K + × Z ∗ ;
(ii) (unconstrained optimum) the function

x → f (x) + ⟨λ, (g(x), h(x))⟩

attains its minimum over C at x̄;


(iii) (complementary slackness) ⟨λ, (g(x̄), h(x̄))⟩ = 0.
Proof The “only if” part. Suppose that x̄ ∈ S(0, 0) and −λ ∈ ∂v(0, 0). As in the
proof of Theorem 1.2.13 we can show that λ ∈ K + × Z ∗ . By the definition of the
subdifferential and the fact that v(g(x̄), h(x̄)) = v(0, 0), we have

0 = v(g(x̄), h(x̄)) − v(0, 0) ≥ ⟨−λ, (g(x̄), h(x̄))⟩ ≥ 0,

so that the complementary slackness condition ⟨λ, (g(x̄), h(x̄))⟩ = 0 holds.


Observing that v(0, 0) = f (x̄) + ⟨λ, (g(x̄), h(x̄))⟩, the strengthened unconstrained
optimal condition follows directly from that of Theorem 1.2.13.
The “if” part. Let λ and x̄ satisfy conditions (i), (ii), and (iii). Then, for any x ∈ C
satisfying g(x) ≤K 0 and h(x) = 0,

f (x) ≥ f (x) + ⟨λ, (g(x), h(x))⟩                                    (1.2.13)
      ≥ f (x̄) + ⟨λ, (g(x̄), h(x̄))⟩ = f (x̄).

That is to say x̄ ∈ S(0, 0).


Moreover, for any g(x) ≤K y, h(x) = z, f (x) + ⟨λ, (y, z)⟩ ≥ f (x) +
⟨λ, (g(x), h(x))⟩. Since v(0, 0) = f (x̄), by (1.2.13) we have

f (x) + ⟨λ, (y, z)⟩ ≥ f (x̄) = v(0, 0). (1.2.14)

Taking the infimum on the left-hand side of (1.2.14) yields

v(y, z) + ⟨λ, (y, z)⟩ ≥ v(0, 0),

which is to say, −λ ∈ ∂v(0, 0). 

We can deduce from Theorems 1.2.13 and 1.2.15 that ∂v(0, 0) completely
characterizes the set of Lagrange multipliers.
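A hand-sized illustration of Theorem 1.2.15 is sketched below in Python; the one-dimensional problem data are assumptions chosen so that one constraint is active and the other is not, which makes the complementary slackness condition visible.

import numpy as np

# Assumed one-dimensional problem:
#   minimize f(x) = x^2  subject to  g1(x) = 1 - x <= 0,  g2(x) = x - 5 <= 0,
# with solution xbar = 1 and multipliers lam = (2, 0).
f = lambda x: x ** 2
g = lambda x: np.array([1.0 - x, x - 5.0])
xbar, lam = 1.0, np.array([2.0, 0.0])

# (i) nonnegativity and (iii) complementary slackness: <lam, g(xbar)> = 0
assert np.all(lam >= 0) and abs(lam @ g(xbar)) < 1e-12
# (ii) the Lagrangian x -> f(x) + <lam, g(x)> is minimized at xbar (checked on a grid)
xs = np.linspace(-3.0, 8.0, 1001)
lagrangian = xs ** 2 + lam[0] * (1.0 - xs) + lam[1] * (xs - 5.0)
assert np.min(lagrangian) >= f(xbar) + lam @ g(xbar) - 1e-12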

1.3 Fenchel Conjugate

Obtaining Lagrange multipliers by using the convex subdifferential is closely related


to convex duality theory based on the concept of conjugate functions introduced by
Fenchel.

1.3.1 The Fenchel Conjugate

The Fenchel conjugate of a function (not necessarily convex) f : X → [−∞, +∞]


is the function f ∗ : X∗ → [−∞, +∞] defined by

f ∗ (x ∗ ) := sup_{x∈X} {⟨x ∗ , x⟩ − f (x)}. (1.3.1)

The operation f → f ∗ is also called a Fenchel–Legendre transform. The function


f ∗ is convex and if the domain of f is nonempty then f ∗ never takes the value
−∞. Clearly the conjugate operation is order-reversing : for functions f, g : X →
[−∞, +∞], the inequality f ≥ g implies f ∗ ≤ g ∗ .

1.3.2 The Fenchel–Young Inequality

This is an elementary but important result that relates conjugate operation with the
subgradient.
Proposition 1.3.1 (Fenchel–Young Inequality) Let f : X → R ∪ {+∞} be a
convex function. Suppose that x ∗ ∈ X∗ and x ∈ dom f . Then

f (x) + f ∗ (x ∗ ) ≥ ⟨x ∗ , x⟩. (1.3.2)

Equality holds if and only if x ∗ ∈ ∂f (x).


Proof The inequality (1.3.2) follows directly from the definition. We have the
equality

f (x) + f ∗ (x ∗ ) = ⟨x ∗ , x⟩,

if and only if, for any y ∈ X,

f (x) + ⟨x ∗ , y⟩ − f (y) ≤ ⟨x ∗ , x⟩.

That is

f (y) − f (x) ≥ ⟨x ∗ , y − x⟩,

or x ∗ ∈ ∂f (x). 
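The inequality and its equality case can be checked numerically for the classical conjugate pair f (x) = |x|^p /p and f ∗ (y) = |y|^q /q with 1/p + 1/q = 1 (see Table 1.1 below); the grid and exponents in the Python sketch are assumptions for illustration.

import numpy as np

# Conjugate pair f(x) = |x|^p / p and f*(y) = |y|^q / q with 1/p + 1/q = 1
# (assumed exponents p = 3, q = 3/2).
p, q = 3.0, 1.5
f = lambda x: np.abs(x) ** p / p
fstar = lambda y: np.abs(y) ** q / q

xs = np.linspace(-2.0, 2.0, 201)
ys = np.linspace(-4.0, 4.0, 201)
X, Y = np.meshgrid(xs, ys)
assert np.all(f(X) + fstar(Y) >= X * Y - 1e-12)        # the Fenchel-Young inequality

ygrad = np.sign(xs) * np.abs(xs) ** (p - 1)            # y = f'(x), a subgradient of f at x
assert np.allclose(f(xs) + fstar(ygrad), xs * ygrad)   # equality exactly on the subdifferential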

Remark 1.3.2 When f is differentiable, taking derivative with respect to x in the


Fenchel equality we have x ∗ = f ′ (x). Then the Fenchel–Legendre transform has
the following explicit form as a function of x

f ∗ (f ′ (x)) = ⟨x, f ′ (x)⟩ − f (x).

In Chapter 4 we will see that when f is the price of a contingent claim as a


function of a forward price x, the Fenchel–Legendre transform is related to the
delta hedging. Its derivative is also relevant when we deal with dynamical hedging.
We can directly verify the following representation of the derivative of the Fenchel–
Legendre transform

Dx f ∗ (f ′ (x)) = Dx ⟨x, f ′ (x)⟩ − f ′ (x) = [Dx , f ′ (x)I ]x,

where Dx is the differential operator with respect to x, I is the identity operator, and
[A, B] = AB − BA represents the commutator of operator A and B. Symmetrically
we also have

Dx ∗ f ((f ∗ )′ (x ∗ )) = [Dx ∗ , (f ∗ )′ (x ∗ )I ]x ∗ .
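The explicit form f ∗ (f ′ (x)) = ⟨x, f ′ (x)⟩ − f (x) is easy to test numerically. The sketch below uses f (x) = e^x, whose conjugate y log y − y appears in Table 1.1; the grid is an assumption.

import numpy as np

# f(x) = e^x has conjugate f*(y) = y log y - y (Table 1.1); check
# f*(f'(x)) = x f'(x) - f(x) on an assumed grid.
xs = np.linspace(-2.0, 2.0, 9)
fprime = np.exp(xs)                          # f'(x) = e^x
lhs = fprime * np.log(fprime) - fprime       # f*(f'(x)) using the table entry
rhs = xs * fprime - np.exp(xs)               # <x, f'(x)> - f(x)
assert np.allclose(lhs, rhs)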

We can consider the conjugate of f ∗ called the biconjugate of f and denoted


f ∗∗ . This is a function on X∗∗ . When X is a reflexive Banach space, i.e. X = X∗∗ it
follows from the Fenchel–Young inequality (1.3.2) that f ∗∗ ≤ f . The function f ∗∗
is the largest among all the convex functions dominated by f and is called the convex
hull of f . Many important convex functions f on X = RN are equal to their biconjugate
f ∗∗ . Such functions thus occur as natural pairs, f = g ∗ and f ∗ = g, where both
f and g are lsc convex functions. Table 1.1 shows some elegant examples on R.
Checking the calculations in Table 1.1 is a good exercise to get familiar with the
concept of conjugate functions.
Note that the first four functions in Table 1.1 are special cases of indicator
functions on R. A more general result is:
Example 1.3.3 (Conjugate of Indicator Function) Let C be a closed convex set in
the reflexive Banach space X. Then ι∗C = σC and σC∗ = ιC . The first four lines of
Table 1.1 describe four different indicator functions and their conjugate functions.

Example 1.3.4 (Conjugate of Transform) Next let us assume that h is a lower


semicontinuous function. Then the effect of some simple transforms on conjugacy
is summarized in Table 1.2.

Table 1.1 Conjugate pairs of convex functions f and g on R

f (x) = g ∗ (x)            dom f      g(y) = f ∗ (y)                      dom g
0                          R          0                                   {0}
0                          R+         0                                   −R+
0                          [−1, 1]    |y|                                 R
0                          [0, 1]     y^+                                 R
|x|^p /p, p > 1            R          |y|^q /q (1/p + 1/q = 1)            R
|x|^p /p, p > 1            R+         |y^+|^q /q (1/p + 1/q = 1)          R
−x^p /p, 0 < p < 1         R+         −(−y)^q /q (1/p + 1/q = 1)          −int R+
− log x                    int R+     −1 − log(−y)                        −int R+
e^x                        R          y log y − y (y > 0), 0 (y = 0)      R+

Table 1.2 Transformed conjugates

f (x)                 f ∗ (y)
h(ax) (a ≠ 0)         h∗ (y/a)
h(x + b)              h∗ (y) − by
ah(x) (a > 0)         ah∗ (y/a)
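The first rule in Table 1.2 can be checked by approximating the supremum in (1.3.1) on a finite grid; the Python sketch below uses the self-conjugate function h(x) = x²/2 and assumed grid parameters.

import numpy as np

# Discrete check of the rule (h(a.))*(y) = h*(y/a) with h(x) = x^2 / 2,
# which is its own conjugate; the grid and test points are assumptions.
a = 3.0
xs = np.linspace(-50.0, 50.0, 200001)

def conj(fvals, y):
    # discrete approximation of sup_x { x*y - f(x) } over the grid
    return np.max(xs * y - fvals)

g = (a * xs) ** 2 / 2.0                       # g(x) = h(a x)
for y in (-2.0, 0.5, 4.0):
    assert abs(conj(g, y) - (y / a) ** 2 / 2.0) < 1e-3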

Combining the Fenchel–Young inequality and the sandwich theorem we can show
that f ∗∗ = f for any convex lower semicontinuous (lsc) function f .
Theorem 1.3.5 (Biconjugate) Let X be a finite dimensional Banach space. Then
f ∗∗ ≤ f in dom f and equality holds at any point x ∈ int dom f .

Proof It is easy to check f ∗∗ ≤ f and we leave it as an exercise. For any x̄ ∈


int dom f , ∂f (x̄) ≠ ∅. Let x ∗ ∈ ∂f (x̄). By the Fenchel–Young inequality we have

f (x̄) = ⟨x ∗ , x̄⟩ − f ∗ (x ∗ ) ≤ sup_{y ∗} [⟨y ∗ , x̄⟩ − f ∗ (y ∗ )] = f ∗∗ (x̄) ≤ f (x̄).

1.3.3 Graphic Illustration and Generalizations


For an increasing function φ with φ(0) = 0, f (x) = ∫_0^x φ(s) ds is convex and
f ∗ (x ∗ ) = ∫_0^{x ∗} φ −1 (t) dt. The graphs in Figure 1.1 illustrate the Fenchel–Young inequality
graphically. The additional areas enclosed by the graph of φ −1 , s = x and t = x ∗

Fig. 1.1 Fenchel–Young inequality

Fig. 1.2 Fenchel–Young equality
or that of φ, s = x and t = x ∗ beyond the area of the rectangle [0, x] × [0, x ∗ ]
generate the additional area that leads to a strict inequality. We also see that equality
holds when x ∗ = φ(x) = f ′ (x) and x = φ −1 (x ∗ ) = (f ∗ )′ (x ∗ ) in Figure 1.2.

1.4 Convex Duality Theory

Using the Fenchel–Young inequality for each constrained optimization problem we


can write its companion dual problem. There are several different but equivalent
perspectives.

1.4.1 Rockafellar Duality

We start with the Rockafellar formulation of bi-conjugate. It is very general and—as


we shall see—other perspectives can easily be written as its special cases.
Consider a two-variable function F (x, y) on X × Y where X, Y are Banach
spaces. Treating y as a parameter, consider the parameterized optimization problem

v(y) = inf_x F (x, y). (1.4.1)

Our associated primal optimization problem¹ is

p = v(0) = inf_{x∈X} F (x, 0) (1.4.2)

and the dual problem is

d = v ∗∗ (0) = sup_{y ∗ ∈Y ∗} −F ∗ (0, −y ∗ ). (1.4.3)

Since v dominates v ∗∗ as the Fenchel–Young inequality establishes, we have

v(0) = p ≥ d = v ∗∗ (0).

This is called weak duality and the non-negative number p − d = v(0) − v ∗∗ (0) is
called the duality gap—which we aspire to be small or zero.
Let F (x, (y, z)) := f (x) + ιepi(g) (x, y) + ιgraph(h) (x, z). Then problem P (y, z)
in (1.1.2) becomes problem (1.4.1) with parameters (y, z). On the other hand, we
can rewrite (1.4.1) as

v(y) = inf_x {F (x, u) : u = y}

which is problem P (0, y) with x = (x, u), C = X × Y , f (x, u) = F (x, u),


h(x, u) = u and g(x, u) = 0. So where we start is a matter of taste and
predisposition.
Theorem 1.4.1 (Duality and Lagrange Multipliers) The following are
equivalent:
(i) the primal problem has a Lagrange multiplier λ.
(ii) there is no duality gap, i.e. d = p is finite and the dual problem has solution −λ.

¹ The use of the term “primal” is much more recent than the term “dual” and was suggested by
George Dantzig’s father Tobias when linear programming was being developed in the 1940s.

Proof If the primal problem has a Lagrange multiplier λ, then −λ ∈ ∂v(0). By the
Fenchel–Young equality

v(0) + v ∗ (−λ) = ⟨−λ, 0⟩ = 0.

Direct calculation yields

v ∗ (−λ) = sup_y {⟨−λ, y⟩ − v(y)}
         = sup_{y,x} {⟨−λ, y⟩ − F (x, y)} = F ∗ (0, −λ).

Since

− F ∗ (0, −λ) ≤ v ∗∗ (0) ≤ v(0) = −v ∗ (−λ) = −F ∗ (0, −λ), (1.4.4)

λ is a solution to the dual problem and p = v(0) = v ∗∗ (0) = d.


On the other hand, if v ∗∗ (0) = v(0) and λ is a solution to the dual problem, then
all the quantities in (1.4.4) are equal. In particular,

v(0) + v ∗ (−λ) = 0.

This implies that −λ ∈ ∂v(0), so that λ is a Lagrange multiplier of the primal


problem. 
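Linear programming, being polyhedral, is the simplest setting in which the equivalent conditions of Theorem 1.4.1 hold. The Python sketch below solves an assumed toy linear program and its dual with scipy.optimize.linprog, exhibiting a zero duality gap and reading off the Lagrange multiplier as the dual solution.

import numpy as np
from scipy.optimize import linprog

# Assumed toy linear program (polyhedral data, so there is no duality gap):
#   primal: minimize <c, x>  subject to  A x >= b, x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 4.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# dual: maximize <b, y> subject to A^T y <= c, y >= 0 (linprog minimizes, so negate)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # both 7.0: p = d, no duality gap
print(dual.x)                  # [1. 1.]: the dual solution, a Lagrange multiplier for the primal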

Example 1.4.2 (Finite Duality Gap) Consider

v(y) = inf {|x2 − 1| : √(x1² + x2²) − x1 ≤ y}.

We can easily calculate

v(y) = 0 if y > 0, 1 if y = 0, +∞ if y < 0,

and v ∗∗ (0) = 0, i.e. there is a finite duality gap v(0) − v ∗∗ (0) = 1.


In this example neither the primal nor the dual problem has a Lagrange multiplier
yet both have solutions. Hence, even in two dimensions, existence of a Lagrange
multiplier is only a sufficient condition for the dual to attain a solution and is far
from necessary.
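The gap can also be seen numerically. The brute-force Python sketch below approximates v(y) on an assumed finite grid and shows the drop from 1 at y = 0 to 0 for y > 0.

import numpy as np

# Brute-force evaluation of v(y) = inf{ |x2 - 1| : sqrt(x1^2 + x2^2) - x1 <= y }
# over an assumed finite grid of (x1, x2).
x1 = np.linspace(0.0, 200.0, 2001)
x2 = np.linspace(-2.0, 2.0, 801)
X1, X2 = np.meshgrid(x1, x2)
G = np.hypot(X1, X2) - X1          # constraint function, always >= 0

def v(y):
    feasible = G <= y
    return np.min(np.abs(X2[feasible] - 1.0))

print(v(0.0), v(0.01), v(0.5))     # approximately 1.0, 0.0, 0.0: the jump at y = 0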