



Foundations and Trends® in Communications and Information Theory, 3:6 (2006)

Majorization and Matrix-Monotone Functions in Wireless Communications

Eduard Jorswieck and Holger Boche

Majorization and Matrix-Monotone Functions in Wireless Communications reviews the basic definitions of Majorization Theory and Matrix-Monotone Functions, describing their concepts clearly with many illustrative examples. In addition to this tutorial, new results are presented with respect to Schur-convex functions and regarding the properties of matrix-monotone functions.

The approach taken by the authors provides a valuable overview of the basic techniques for readers who are new to the subject. They then proceed to show in separate chapters the cutting-edge applications of the two basic theories in wireless communications.

Majorization and Matrix-Monotone Functions in Wireless Communications is an invaluable resource for students, researchers and practitioners involved in the state-of-the-art design of wireless communication systems.

This book is originally published as Foundations and Trends® in Communications and Information Theory, Volume 3, Issue 6 (2006), ISSN: 1567-2190.
Majorization and Matrix-Monotone Functions in Wireless Communications

Eduard Jorswieck
Department of Electrical Engineering
Royal Institute of Technology
11400 Stockholm, Sweden
eduard.jorswieck@ee.kth.se

Holger Boche
Fraunhofer Institute for Telecommunications
Heinrich-Hertz-Institut
Einsteinufer 37
10587 Berlin, Germany
holger.boche@hhi.fhg.de

Boston – Delft
Foundations and Trends® in Communications and Information Theory

Published, sold and distributed by:


now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com

Outside North America:


now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is E. Jorswieck and H. Boche, Majorization and Matrix-Monotone Functions in Wireless Communications, Foundations and Trends® in Communications and Information Theory, vol. 3, no. 6, pp. 553–701, 2006.

ISBN: 978-1-60198-040-3
© 2007 E. Jorswieck and H. Boche

All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying, recording
or otherwise, without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen-
ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for
internal or personal use, or the internal or personal use of specific clients, is granted by
now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The
‘services’ for users can be found on the internet at: www.copyright.com
For those organizations that have been granted a photocopy license, a separate system
of payment has been arranged. Authorization does not extend to other kinds of copy-
ing, such as that for general distribution, for advertising or promotional purposes, for
creating new collective works, or for resale. In the rest of the world: Permission to pho-
tocopy must be obtained from the copyright owner. Please apply to now Publishers Inc.,
PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; www.nowpublishers.com;
sales@nowpublishers.com
now Publishers Inc. has an exclusive license to publish this material worldwide. Permission
to use this content must be obtained from the copyright license holder. Please apply to now
Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
sales@nowpublishers.com
Foundations and Trends® in Communications and Information Theory
Volume 3, Issue 6, 2006
Editorial Board

Editor-in-Chief:
Sergio Verdú
Department of Electrical Engineering
Princeton University
Princeton, New Jersey 08544
USA
verdu@princeton.edu

Editors

Venkat Anantharam (UC Berkeley)
Ezio Biglieri (U. Torino)
Giuseppe Caire (U. Southern California)
Roger Cheng (U. Hong Kong)
K.C. Chen (Taipei)
Daniel Costello (U. Notre Dame)
Thomas Cover (Stanford)
Anthony Ephremides (U. Maryland)
Andrea Goldsmith (Stanford)
Dave Forney (MIT)
Georgios Giannakis (U. Minnesota)
Joachim Hagenauer (TU Munich)
Te Sun Han (Tokyo)
Babak Hassibi (Caltech)
Michael Honig (Northwestern)
Johannes Huber (Erlangen)
Hideki Imai (Tokyo)
Rodney Kennedy (Canberra)
Sanjeev Kulkarni (Princeton)
Amos Lapidoth (ETH Zurich)
Bob McEliece (Caltech)
Neri Merhav (Technion)
David Neuhoff (U. Michigan)
Alon Orlitsky (UC San Diego)
Vincent Poor (Princeton)
Kannan Ramchandran (UC Berkeley)
Bixio Rimoldi (EPFL)
Shlomo Shamai (Technion)
Amin Shokrollahi (EPFL)
Gadiel Seroussi (MSRI)
Wojciech Szpankowski (Purdue)
Vahid Tarokh (Harvard)
David Tse (UC Berkeley)
Ruediger Urbanke (EPFL)
Steve Wicker (Cornell)
Raymond Yeung (Hong Kong)
Bin Yu (UC Berkeley)
Editorial Scope

Foundations and Trends® in Communications and Information Theory will publish survey and tutorial articles in the following topics:

• Coded modulation
• Coding theory and practice
• Communication complexity
• Communication system design
• Cryptology and data security
• Data compression
• Data networks
• Demodulation and Equalization
• Denoising
• Detection and estimation
• Information theory and statistics
• Information theory and computer science
• Joint source/channel coding
• Modulation and signal design
• Multiuser detection
• Multiuser information theory
• Optical communication channels
• Pattern recognition and learning
• Quantization
• Quantum information processing
• Rate-distortion theory
• Shannon theory
• Signal processing for communications
• Source coding
• Storage and recording codes
• Speech and Image Compression
• Wireless Communications

Information for Librarians


Foundations and Trends® in Communications and Information Theory, 2006, Volume 3, 6 issues. ISSN paper version 1567-2190. ISSN online version 1567-2328. Also available as a combined paper and online subscription.
Foundations and Trends® in Communications and Information Theory
Vol. 3, No. 6 (2006) 553–701

© 2007 E. Jorswieck and H. Boche
DOI: 10.1561/0100000026

Majorization and Matrix-Monotone Functions


in Wireless Communications

Eduard Jorswieck¹ and Holger Boche²,³,⁴

¹ Royal Institute of Technology, Department of Electrical Engineering, Signal Processing, 11400 Stockholm, Sweden, eduard.jorswieck@ee.kth.se
² Technical University of Berlin, Department of Electrical Engineering, Heinrich-Hertz Chair for Mobile Communications, HFT-6 Einsteinufer 25, 10587 Berlin, Germany
³ Fraunhofer German-Sino Lab for Mobile Communications MCI, Einsteinufer 37, 10587 Berlin, Germany
⁴ Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany, holger.boche@hhi.fhg.de

Abstract
This short tutorial presents two mathematical techniques, namely Majorization Theory and Matrix-Monotone Functions. It reviews their basic definitions and describes their concepts clearly with many illustrative examples. In addition to this tutorial, new results are presented with respect to Schur-convex functions and regarding the properties of matrix-monotone functions.
The techniques are applied to solve communication and information theoretic problems in wireless communications. The impact of spatial correlation in multiple antenna systems is characterized for many important performance measures, e.g., average mutual information, outage probability, error performance, minimum Eb/N0 and wideband slope, zero-outage capacity, and capacity region. The impact of user distribution in cellular systems is characterized for different scenarios including perfectly informed transmitters and receivers, regarding, e.g., the average sum rate, the outage sum rate, and maximum throughput.
Finally, a unified framework for the performance analysis of multiple
antenna systems is developed based on matrix-monotone functions. The
optimization of transmit strategies for multiple antennas is carried out
by optimization of matrix-monotone functions. The results within this
framework resemble and complement the various results on optimal
transmit strategies in single-user and multiple-user multiple-antenna
systems.
Contents

1 Introduction 1
1.1 Majorization Theory 1
1.2 Matrix-Monotone Functions 3
1.3 Classification and Organization 4
1.4 Notation 7

2 Majorization Theory 9
2.1 Definition and Examples 10
2.2 Basic Results 18
2.3 Majorization and Optimization 30

3 Matrix-Monotone Functions 33

3.1 Definition and Examples 33


3.2 Basic Characterizations 43
3.3 Matrix Norms 47
3.4 Further Properties 52

4 Application of Majorization in Wireless


Communications 59
4.1 Spatial Correlation in Multiple Antenna Systems 59
4.2 User Distribution in Cellular Communication Systems 86

5 Application of Matrix-Monotone Functions
in Wireless Communications 103
5.1 Generalized Multiple Antenna Performance Measures 103
5.2 Optimization of Matrix-Monotone Functions 114

6 Appendix 133
6.1 Linear Algebra 133
6.2 Convex Optimization 134

7 Acknowledgments 141

References 143
1
Introduction

This short tutorial presents two mathematical techniques, namely Majorization Theory and Matrix-Monotone Functions, which are applied to solve communication and information theoretic problems in wireless communications.

1.1 Majorization Theory


Inequalities have always been a major mathematical research area, beginning with Gauß, Cauchy, and others. Pure and applied mathematical analysis needs inequalities, e.g., absolute inequalities, triangle inequalities, integral or differential inequalities, and so on. The building blocks of Majorization are contained in the book [48]. The complete theory including many applications is presented in [92]. The theory addresses the question of how to order vectors with nonnegative real components and characterizes the order-preserving functions, i.e., functions f which satisfy f(x) ≥ f(y) whenever x ⪰ y. The characterization of this class of functions is important in order to exploit this monotonicity.
In the wireless communication context, those functions arise naturally in resource allocation for multiple user systems or multiple antenna systems, e.g., sum rate of the multiple access channel (MAC)
with K users and channels α1, . . . , αK as a function of the power allocation p1, . . . , pK, with inverse noise power ρ:

    C(p) = log(1 + ρ Σ_{k=1}^K pk αk).

Assume that the sum power is constrained to K, i.e., Σ_{k=1}^K pk = K. Order the components α1 ≥ α2 ≥ · · · ≥ αK ≥ 0 and p1 ≥ p2 ≥ · · · ≥ pK ≥ 0. The function C turns out to be Schur-convex with respect to p, i.e., monotonically increasing with respect to the Majorization order: if p ⪰ q then C(p) ≥ C(q). Therefore, the maximum value is attained for the power allocation vector that concentrates all power in a single element, i.e., C([K, 0, . . . , 0]) ≥ C(p) ≥ C(1).
This monotonicity is illustrated for K = 2 with power allocation p = [2 − p, p] in Figure 1.1. This result implies that TDMA is optimal, because the complete transmit power is optimally allocated to one user [80].
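This behavior is easy to reproduce numerically. The following Python sketch (assuming NumPy is available) uses the values α = (2, 1) and ρ = 1 of Figure 1.1; the intermediate allocation is an arbitrary choice for illustration, not taken from the text:

```python
import numpy as np

# Illustrative parameters: K = 2 users, channels alpha = (2, 1), rho = 1.
alpha = np.array([2.0, 1.0])
rho = 1.0

def sum_rate(p):
    """Sum rate C(p) = log(1 + rho * sum_k p_k * alpha_k) of the MAC."""
    return np.log(1.0 + rho * np.dot(p, alpha))

# Schur-convexity in p: concentrating the total power K on the strongest
# user maximizes C, while the uniform allocation 1 = (1, ..., 1) minimizes it.
K = len(alpha)
p_tdma = np.array([float(K), 0.0])   # all power to the strongest user
p_middle = np.array([1.5, 0.5])      # arbitrary allocation in between
p_uniform = np.ones(K)               # equal power split

assert sum_rate(p_tdma) >= sum_rate(p_middle) >= sum_rate(p_uniform)
```

Here p_middle is majorized by p_tdma and majorizes p_uniform, so the asserted chain is exactly the Schur-convexity statement C([K, 0, . . . , 0]) ≥ C(p) ≥ C(1).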

Fig. 1.1 Sum rate of MAC with channels α1 = 2, α2 = 1 as a function of the power allocation
p = [2 − p, p].

Most of the basic definitions and basic properties can be found in the
text books [8, 48, 50, 51, 92]. Majorization theory is a valuable tool and
it is successfully applied in many research areas, e.g., in optimization
[39, 168], signal processing and mobile communications [59, 105], and
quantum information theory [101].

1.2 Matrix-Monotone Functions


More than 70 years have passed since Löwner [88] proposed the notion
of matrix-monotone functions. A real, continuous function f : I → R
defined on a nontrivial interval I is said to be matrix monotone of
order n if

X ≥ Y ⇒ f (X) ≥ f (Y )

for any pair of self-adjoint n × n matrices X and Y with eigenvalues


in I. Löwner characterized the set of matrix-monotone functions of
order n in terms of the positivity of certain determinants (the so-called
Löwner determinants and the related Pick determinants), and proved
that a function is matrix monotone if and only if it allows an analytic
continuation to a Pick function; that is, an analytic function defined
in the complex upper half-plane, with nonnegative imaginary part.
A function is called matrix monotone if it is matrix monotone for all
orders n.
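The classical example f(t) = √t is matrix monotone, while f(t) = t² is not; a small NumPy sketch (random matrices with a fixed seed, chosen only for illustration and not taken from the text) makes the definition concrete:

```python
import numpy as np

def psd_sqrt(A):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n)); Y = B @ B.T        # PSD matrix Y
C = rng.standard_normal((n, n)); X = Y + C @ C.T    # X >= Y by construction

# f(t) = sqrt(t) is matrix monotone: X >= Y implies sqrt(X) >= sqrt(Y),
# i.e., the difference of the square roots is again PSD.
min_eig = np.linalg.eigvalsh(psd_sqrt(X) - psd_sqrt(Y)).min()
assert min_eig >= -1e-8

# f(t) = t^2 is NOT matrix monotone: a classical 2x2 counterexample.
X2 = np.array([[2.0, 1.0], [1.0, 1.0]])
Y2 = np.array([[1.0, 1.0], [1.0, 1.0]])
assert np.linalg.eigvalsh(X2 - Y2).min() >= 0            # X2 >= Y2 ...
assert np.linalg.eigvalsh(X2 @ X2 - Y2 @ Y2).min() < 0   # ... but X2^2 is not >= Y2^2
```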
A representation theorem was proven for the class of matrix-monotone functions [34, 83, 88, 156]. Every matrix-monotone function f can be expressed as

    f(t) = a + bt + ∫_0^∞ (st/(s + t)) dµ(s)        (1.1)

with a positive measure µ on [0, ∞) and real constants a and b ≥ 0.


Representatives of the class of matrix-monotone functions arise nat-
urally in the context of multiple antenna systems in the single- as well
as in the multiuser context. The two most important examples are the
mutual information and the minimum mean square error (MMSE) in
multiple-input multiple-output (MIMO) systems. Consider the mutual
information¹ for the vector model y = Hx + n between x and y for


independently complex zero-mean Gaussian distributed x and n with
covariances Q and I:

    f(HQH^H) = I(x; y) = log det(I + HQH^H).

The mutual information, written as the function f(HQH^H) = tr log(I + HQH^H), can be represented by the matrix-monotone function f(t) = log(1 + t), which has the integral representation

    f(t) = ∫_1^∞ (t/(s + t)) (1/s) ds.
Hence, all results that hold for matrix-monotone functions also hold for the mutual information and (as we will show later) for the MMSE. This approach allows us to unify many recent results and makes it possible to extract the main principles and concepts.
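Both claims about f(t) = log(1 + t) can be spot-checked numerically; the following Python/NumPy sketch is only a sanity check (grid sizes and the random channel are arbitrary choices). The integral ∫_1^∞ t/((s + t) s) ds is mapped onto [0, 1] via the substitution s = 1/u before applying a trapezoidal rule:

```python
import numpy as np

def f_integral(t, m=200001):
    """Evaluate the representation int_1^inf t/((s + t) s) ds of log(1 + t).
    Substituting s = 1/u turns it into the finite integral int_0^1 t/(1 + t*u) du."""
    u = np.linspace(0.0, 1.0, m)
    v = t / (1.0 + t * u)
    du = u[1] - u[0]
    return du * (v.sum() - 0.5 * (v[0] + v[-1]))  # trapezoidal rule

for t in (0.5, 1.0, 3.0, 10.0):
    assert abs(f_integral(t) - np.log1p(t)) < 1e-6

# log det(I + H Q H^H) = tr log(I + H Q H^H): both equal the sum of the
# logarithms of the eigenvalues of I + H Q H^H.
rng = np.random.default_rng(1)
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q = np.eye(3)  # white input covariance, chosen only for illustration
A = np.eye(3) + H @ Q @ H.conj().T
sign, logdet = np.linalg.slogdet(A)
trlog = np.sum(np.log(np.linalg.eigvalsh(A)))
assert sign.real > 0 and abs(logdet - trlog) < 1e-9
```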
Finally, matrix-monotone functions are applied in many other areas,
e.g., in optimization [25] and signal processing for communications [71].

1.3 Classification and Organization


1.3.1 Classification and Differences to Related Literature
The well-established book [92] contains more results on Majorization than this short tutorial. The main difference is that this tutorial focuses on a subset of topics from [92], especially results regarding averages and distributions of weighted random variables, as well as averages of trace functions. These topics are treated in more detail, new results are added (from subsection 2.2.3 until subsection 2.2.7), and the connection to the application in communication theory is always kept in mind. Furthermore, the first two tutorial chapters are rigorous in the sense that they contain all necessary definitions and results, but additionally contain many remarks and examples which help the reader to understand the concepts.
There exist approaches in the literature that propose a unified
framework for analysis and optimization of MIMO systems. First, the

1 Without any operational meaning for the moment.



PhD thesis [104] provides a framework for optimization of linear MIMO


systems also by using Majorization theory. The tutorial [107] extends
these results to nonlinear decision feedback MIMO systems. Interest-
ingly, the application of Majorization in the other tutorial [107] is not
for analysis of impact of fading parameters on system performance
but for the optimization of single-user transmit strategies under vari-
ous objective criteria. Another difference to the tutorial [107] is that
the article at hand offers two own full chapters with a tutorial of the
mathematical techniques used. Therefore, both tutorial complement
one another well.
Another related tutorial is [122] which studies the active field of
interference function calculus. An interesting overview presentation
is given in the plenary lecture at the workshop on signal processing
advances in wireless communications in June 2007 [12].
Furthermore, a unified analytical description of MIMO systems was
studied in the PhD thesis [79]. The main focus in [79] is to derive a
framework for analytically computing closed-form expressions of MIMO
transceiver performances which are then used for optimization. Finally,
the connection between the capacity and mean-square-error (MSE)
from an estimation and information theoretic point of view was ana-
lyzed in the PhD thesis [42]. The thesis contains one part that clearly
shows the connection between the capacity and MMSE for various chan-
nel models, e.g., discrete, continuous, scalar, and vector channels and
different input signals. In subsection 5.1.2 three different relationships
between the mutual information and the MMSE are described.

1.3.2 Organization
The first two chapters present the definitions, properties, and many
examples to explain the foundations and concepts of the two techniques.
The three main topics discussed are

(a) the partial order on vectors and matrices,


(b) the characterization of order preserving functions,
(c) the optimization of Schur-convex and matrix-monotone
functions.

The main goal of these two chapters is to make the reader familiar with the basic concepts and to enable him or her to apply these techniques to problems in his or her respective research area. The various examples illustrate the theoretical concepts and reconnect to practical problem statements. In “Majorization Theory,” we present novel
results with respect to Schur-convexity and Schur-concavity for the
most general classes of functions and constraints. Later in “Application
of Majorization in Wireless Communications,” these functions obtain
their operational meaning in the context of communication theory.
In “Matrix-Monotone Functions,” we present novel results in terms
of bounds for matrix-monotone functions, optimization of matrix-
monotone functions, and discuss the connection to matrix norms as
well as to connections and means.
In “Application of Majorization in Wireless Communications” and
“Application of Matrix-Monotone Functions in Wireless Communica-
tions,” we apply the learned techniques to concrete problem statements
from wireless communications. The four main application areas are

(a) spatial correlation in multiple antenna systems,


(b) user distributions in cellular systems,
(c) development of a unified performance measure,
(d) optimization of MIMO system performance.

The main goal of these two chapters is to show under which condi-
tions and assumptions both techniques can be used. Furthermore, it is
shown how to interpret the results carefully to gain engineering insights
into the design of wireless communication systems. In “Application of
Majorization in Wireless Communications,” a measure for spatial corre-
lation in multiple antenna communications is developed. This measure
is exploited for various performance measures and in many scenarios
to analyze the impact of spatial correlation. A measure for the user
distribution in cellular systems is developed and the sum performance
of up- and downlink communication as a function of the user distri-
bution is analyzed. In “Application of Matrix-Monotone Functions in
Wireless Communications,” we develop a generalized performance mea-
sure which unifies mutual information and MMSE criteria. Finally, the

results from “Matrix-Monotone Functions” are used to optimize the


single-user and multi-user MIMO system performance.
The appendix contains two sections with basic results from Linear
Algebra and Convex Optimization. These results are used extensively
throughout the book.

1.4 Notation
Vectors are denoted by boldface small letters a, b, and matrices by
boldface capital letters A, B. AT , AH , and A−1 are the transpose, the
conjugate transpose, and the inverse matrix operation, respectively.
The identity matrix is I, and 1 is the vector of all ones. ◦ is the Schur product and ⊗ is the Kronecker product. diag(X) is the vector containing the diagonal entries of X. Diag(x) is a diagonal matrix with the entries of the vector x on its diagonal. Diag(A, B) is a block-diagonal matrix with matrices A and B on the diagonal. A^{1/2} is the square root matrix of A, and [A]_{j,k} denotes the entry in the jth row and the kth column of A.
The set of real numbers is denoted by R and the set of complex
numbers by C. The set of positive integers is N+ . Denote the set of all
n × n positive semi-definite matrices by Hn . The multivariate complex
Gaussian distribution with mean m and covariance matrix Q is denoted
by CN(m, Q). The expectation is denoted by E. The partial order for vectors x ⪰ y says that x majorizes y; equivalently, x ⪯ y means that x is majorized by y. For matrices, the order A ≥ B means that A − B is positive semi-definite. The strict versions of these orders for vectors and matrices are denoted by ≻, ≺, >, and <. [a]₊ denotes the maximum of a and 0.
2
Majorization Theory

A total order is a binary relation on a set X . It is antisymmetric, tran-


sitive, and total. For example, the set of real numbers R can be totally
ordered by the order relation less than < and greater than >. Consider
the set of all vectors with two non-negative real components which sum
up to one, i.e., {x ∈ R²₊ : x1 + x2 = 1}. Since all vectors can be parameterized by (t, 1 − t)^T for 0 ≤ t ≤ 1, the corresponding order reduces to
the one dimensional case. Let x and y be two dimensional non-negative
real vectors. Without loss of generality, order the components of the
vector in decreasing order and compare the first components x1 and y1. If x1 ≥ y1 then the vector x is greater than or equal to y, e.g., (0.8, 0.2) ⪰ (0.6, 0.4). However, this approach does not extend to the case with three or more components.
This chapter studies a certain partial order on vectors with more
than two components. We will assume that the vector has non-negative
real components. The partial order “Majorization” will describe when
the components of a vector x are “less spread out” or “more nearly
equal” than the components of another vector y. Further on, functions


that are monotone with respect to this order are called “Schur-convex”
(monotonic increasing) or “Schur-concave” (monotonic decreasing)
functions. Standard results as well as novel results regarding Schur-
convex functions are reviewed, presented, and discussed. In order to
keep the representation simple and increase readability, many exam-
ples illustrate the definitions and results.

2.1 Definition and Examples


2.1.1 Majorization for Vectors

Definition 2.1 (Majorization for vectors). For two vectors x, y ∈


Rn with descending ordered components x1 ≥ x2 ≥ · · · ≥ xn ≥ 0 and y1 ≥ y2 ≥ · · · ≥ yn ≥ 0, one says that the vector x majorizes the vector y, and writes x ⪰ y, if

    Σ_{k=1}^m xk ≥ Σ_{k=1}^m yk for m = 1, . . . , n − 1, and Σ_{k=1}^n xk = Σ_{k=1}^n yk.

Example 2.1. Assume the situation in Figure 2.1. We have two different vectors. In scenarios A and B the largest two components are equal (λ1^A = λ1^B and λ2^A = λ2^B). The smallest three components in scenario B are equal (λ3^B = λ4^B = λ5^B), but in scenario A the smallest three components are unequal (λ3^A > λ4^A > λ5^A). In addition to this, the sum of all components in scenarios A and B is equal. Applying the order

Fig. 2.1 Example vectors: λ^A ⪰ λ^B.



which is introduced in Definition 2.1, the vector in scenario A majorizes the vector in scenario B (λ^A ⪰ λ^B).

Remark 2.1. Note that sometimes the definition of majorization is


the other way round. For example, in [50, p. 192], a vector x is said
to majorize vector y if the sum of the smallest m components of x is
greater than or equal to the sum of the smallest m components of y
for all 1 ≤ m ≤ n. This can lead to contradictory statements.
Note that the order of the components is not important to the definition, i.e., the vectors (0.8, 0.2) and (0.2, 0.8) are equal with respect to the partial order in Definition 2.1. The elements are always assumed to be ordered in decreasing order.
The strict version of Definition 2.1 is denoted by x ≻ y and means that the sum of the largest m components of the vector x is greater than the sum of the largest m components of vector y for all 1 ≤ m < n, and Σ_{k=1}^n xk = Σ_{k=1}^n yk. For further definitions the less strict notions ⪰ and ⪯ will be used.

Example 2.2. The following vectors can be compared using


Majorization:

    (1/n, 1/n, . . . , 1/n) ≺ (1/(n−1), 1/(n−1), . . . , 1/(n−1), 0) ≺ · · · ≺ (1/2, 1/2, 0, . . . , 0) ≺ (1, 0, . . . , 0).
2 2

Example 2.3. Since Majorization provides only a partial order, there


exist vectors (with at least three components) that cannot be compared, e.g., (0.6, 0.25, 0.15) and (0.55, 0.4, 0.05): neither vector majorizes the other.
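Definition 2.1 translates directly into a small predicate. The following Python/NumPy sketch (names and tolerance are my choices, not from the text) checks the chain of Example 2.2 for n = 4 and the incomparable pair of Example 2.3:

```python
import numpy as np

def majorizes(x, y, tol=1e-12):
    """True iff x majorizes y per Definition 2.1:
    equal total sums and dominance of all top-m partial sums."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    y = np.sort(np.asarray(y, dtype=float))[::-1]
    if x.shape != y.shape or abs(x.sum() - y.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))

# The chain from Example 2.2, instantiated for n = 4:
assert majorizes([1, 0, 0, 0], [0.5, 0.5, 0, 0])
assert majorizes([0.5, 0.5, 0, 0], [1/3, 1/3, 1/3, 0])
assert majorizes([1/3, 1/3, 1/3, 0], [0.25, 0.25, 0.25, 0.25])

# The vectors of Example 2.3 are incomparable: neither majorizes the other.
a, b = [0.6, 0.25, 0.15], [0.55, 0.4, 0.05]
assert not majorizes(a, b) and not majorizes(b, a)
```

The incomparability is visible in the partial sums: 0.6 ≥ 0.55 for m = 1, but 0.85 < 0.95 for m = 2.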

Closely related to Majorization are doubly stochastic matrices.

Definition 2.2 (Doubly stochastic matrix). An n × n matrix P


is doubly stochastic if Pij ≥ 0 for 1 ≤ i, j ≤ n and

    Σ_{i=1}^n Pij = 1 for 1 ≤ j ≤ n,  Σ_{j=1}^n Pij = 1 for 1 ≤ i ≤ n.

Related to the two dimensional example from the introduction to this section, the 2 × 2 matrix with rows (t, 1 − t) and (1 − t, t) is doubly stochastic for all 0 ≤ t ≤ 1. In fact, the set of all 2 × 2 doubly stochastic matrices can be parameterized by this matrix. The properties of doubly stochastic matrices are described in the following theorems.

Theorem 2.1 (Birkhoff 1946). The permutation matrices constitute


the extreme points of the set of doubly stochastic matrices. Moreover,
the set of doubly stochastic matrices is the convex hull of the permu-
tation matrices.

Theorem 2.2 (Representation of doubly stochastic matrices).


Every n × n doubly stochastic matrix can be represented by a convex
combination of at most n2 − 2n + 2 permutation matrices. The number
n2 − 2n + 2 cannot be replaced by a smaller number.

The next theorem connects the partial order majorization and doubly
stochastic matrices.

Theorem 2.3 (Majorization and doubly stochastic matrices).


A necessary and sufficient condition for x ⪰ y is that there exists a doubly stochastic matrix P such that y = P x.

In order to illustrate this relationship and the partial order itself, we provide two examples.

Example 2.4. Consider the example from the beginning, i.e., x = (0.8, 0.2) ⪰ (0.6, 0.4) = y. The corresponding doubly stochastic matrix is given by

    (0.6, 0.4)^T = [[2/3, 1/3], [1/3, 2/3]] (0.8, 0.2)^T.
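Theorem 2.3 and Example 2.4 can be verified directly; a minimal Python/NumPy sketch:

```python
import numpy as np

# Example 2.4: x = (0.8, 0.2) majorizes y = (0.6, 0.4); Theorem 2.3 then
# guarantees a doubly stochastic P with y = P x.
x = np.array([0.8, 0.2])
y = np.array([0.6, 0.4])
P = np.array([[2/3, 1/3],
              [1/3, 2/3]])

# P is doubly stochastic: nonnegative entries, unit row and column sums.
assert np.all(P >= 0)
assert np.allclose(P.sum(axis=0), 1) and np.allclose(P.sum(axis=1), 1)
assert np.allclose(P @ x, y)

# Birkhoff (Theorem 2.1): P is a convex combination of permutation matrices,
# here (2/3) * I + (1/3) * swap.
I = np.eye(2)
S = np.array([[0.0, 1.0], [1.0, 0.0]])
assert np.allclose(P, (2/3) * I + (1/3) * S)
```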

The following concept is used to compare vectors with different ℓ1 norm.

Definition 2.3 (Weak majorization). For x, y ∈ Rn with ordered components x1 ≥ x2 ≥ · · · ≥ xn and y1 ≥ y2 ≥ · · · ≥ yn, y weakly submajorizes x, written y ⪰_w x, if

    Σ_{k=1}^m xk ≤ Σ_{k=1}^m yk for all m = 1, . . . , n.

For x, y ∈ Rn with ordered components x1 ≥ x2 ≥ · · · ≥ xn and y1 ≥ y2 ≥ · · · ≥ yn, y weakly supermajorizes x, written y ⪰^w x, if

    Σ_{k=1}^m x_{n−k+1} ≤ Σ_{k=1}^m y_{n−k+1} for all m = 1, . . . , n.

The connection to doubly stochastic matrices is provided after the next


definition.

Definition 2.4 (Substochastic and superstochastic matrix).


A nonnegative matrix P = [p_ij] for which there exists a doubly stochastic matrix D = [d_ij] with p_ij ≤ d_ij for all i, j is called a substochastic matrix. Similarly, a nonnegative matrix P = [p_ij] for which there exists a doubly stochastic matrix D = [d_ij] with p_ij ≥ d_ij for all i, j is called a superstochastic matrix.

Remark 2.2. Note that by theorems from von Neumann the existence of such a doubly stochastic matrix is assured [92, Thm. 2.C.1]. The necessary and sufficient conditions for weak (sub- and super-) majorization are

    y ⪰_w x if and only if x = P1 y,
    y ⪰^w x if and only if x = P2 y,

with substochastic matrix P1 and superstochastic matrix P2.

Another type of majorization is defined next.

Definition 2.5 (Log-majorization). Assume x = [x1, . . . , xn] and y = [y1, . . . , yn] with ordered components xk ≥ 0 and yk ≥ 0. If

    Π_{k=1}^m xk ≤ Π_{k=1}^m yk for m = 1, . . . , n − 1, and Π_{k=1}^n xk = Π_{k=1}^n yk,

then x is log-majorized by y, written y ⪰_log x.

Log-majorization is stronger than majorization. This is described in the next theorem [92, 5.A.2.b].

Theorem 2.4 (Theorem 2.7 in [166]). Let the components of x, y ∈ Rn₊ be nonnegative. Then

    x ⪯_log y implies x ⪯_w y.
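Theorem 2.4 can be spot-checked numerically. In the Python/NumPy sketch below the vectors are hypothetical, chosen so that the total products agree while the total sums do not, so weak (but not full) majorization is exactly what the theorem delivers:

```python
import numpy as np

def weakly_submajorizes(y, x, tol=1e-12):
    """y weakly submajorizes x: every top-m partial sum of y dominates that of x."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumsum(ys) >= np.cumsum(xs) - tol))

def log_majorizes(y, x, tol=1e-12):
    """y log-majorizes x: top-m partial products of y dominate, totals equal."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    px, py = np.cumprod(xs), np.cumprod(ys)
    return bool(np.all(py[:-1] >= px[:-1] - tol) and abs(py[-1] - px[-1]) <= tol)

# Hypothetical vectors with equal products (both 2) but different sums.
y = np.array([4.0, 1.0, 0.5])
x = np.array([2.0, 1.0, 1.0])
assert log_majorizes(y, x)
assert weakly_submajorizes(y, x)  # Theorem 2.4: x <=_log y implies x <=_w y
```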

2.1.2 Schur-convexity and Schur-concavity


The next definition describes a function Φ which is applied to the vectors x and y with x ⪰ y. The function is “order-preserving” with respect to the partial order of Majorization if it is monotonic with respect to the partial order.

Definition 2.6 (Schur-convex/Schur-concave). A real-valued


function Φ defined on A ⊂ Rn is said to be Schur-convex on A if

    x ⪰ y on A ⇒ Φ(x) ≥ Φ(y).

Similarly, Φ is said to be Schur-concave on A if


    x ⪰ y on A ⇒ Φ(x) ≤ Φ(y).

Next, consider one example of a Schur-convex function.

Example 2.5. Suppose that x, y ∈ Rn₊ are positive real vectors and the function Φ is defined as the sum of the squared components of the vectors, i.e., Φ2(x) = Σ_{k=1}^n |xk|². Then, it is easy to show that the function Φ2 is Schur-convex on Rn₊, i.e., x ⪰ y ⇒ Φ2(x) ≥ Φ2(y).

In order to check whether a one dimensional real valued function


f : R+ → R is monotonic increasing or decreasing, one studies the first
derivative. The function is (strict) monotonic increasing if f 0 (x) ≥ (>)0
and (strict) monotonic decreasing if f 0 (x) ≤ (<)0 for all x ∈ R+ .
The following lemma (see [92, Thm. 3.A.4]) is sometimes called
Schur’s condition. It provides an approach for testing whether some
vector valued function is Schur-convex or not. The approach is similar
to the one dimensional case described above.

Lemma 2.5 (Schur’s condition). Let I ⊂ R be an open interval and


let f : I^n → R be continuously differentiable. Necessary and sufficient conditions for f to be Schur-convex on I^n are

    f is symmetric¹ on I^n        (2.1)

and

    (xi − xj) (∂f/∂xi − ∂f/∂xj) ≥ 0 for all 1 ≤ i, j ≤ n.        (2.2)

Remark 2.3. Since f (x) is symmetric, Schur’s condition can be


reduced as in [92, p. 57] to

    (x1 − x2) (∂f/∂x1 − ∂f/∂x2) ≥ 0.        (2.3)

1A function is called symmetric if the argument vector can be arbitrarily permuted without
changing the value of the function.

From Lemma 2.5 it follows that f(x) is a Schur-concave function on I^n if f(x) is symmetric and

    (x1 − x2) (∂f/∂x1 − ∂f/∂x2) ≤ 0.        (2.4)

Finally, if we assume the components of the vector to be ordered, the function is Schur-convex if ∂f/∂x1 − ∂f/∂x2 ≥ 0.
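Schur's condition lends itself to a quick numerical spot check. The following Python/NumPy sketch (test point and step size are arbitrary choices) approximates the partial derivatives by central differences and evaluates the sign of condition (2.2) for the Schur-convex Φ2 of Example 2.5 and for the symmetric concave function Σ √xk, which is Schur-concave:

```python
import numpy as np

def schur_condition(f, x, eps=1e-6):
    """Sign test of Schur's condition at a point x with x1 >= x2:
    returns (x1 - x2) * (df/dx1 - df/dx2), partials via central differences."""
    def partial(i):
        e = np.zeros_like(x)
        e[i] = eps
        return (f(x + e) - f(x - e)) / (2 * eps)
    return (x[0] - x[1]) * (partial(0) - partial(1))

phi2 = lambda x: np.sum(x**2)            # Example 2.5: Schur-convex
phi_sqrt = lambda x: np.sum(np.sqrt(x))  # symmetric and concave: Schur-concave

x = np.array([0.7, 0.2, 0.1])  # ordered test point, chosen arbitrarily
assert schur_condition(phi2, x) >= 0      # condition (2.2) holds: Schur-convex
assert schur_condition(phi_sqrt, x) <= 0  # reversed sign (2.4): Schur-concave
```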

Remark 2.4. The reduction of Schur's condition to only the largest and second largest components in (2.3) and (2.4) shows a helpful property. Since the function is assumed to be symmetric, it suffices to consider only two components and keep the other n − 2 components fixed. Often, this reduces the vector problem to a scalar problem by using the parameterization of the two components by a scalar t, as in the introduction to this chapter.

Remark 2.5. The definition of Schur-convexity and Schur-concavity can be extended if another function Ψ : R → R is applied to Φ(x). Assume that Φ is Schur-concave; if the function Ψ is monotonically increasing, then the composition Ψ(Φ(x)) is Schur-concave, too. If we take for example the function Ψ(n) = log(n) for n ∈ R₊ and the Schur-convex function Φ2 from the example above, we can state that the composition of the two functions Ψ(Φ2(x)) is Schur-convex on Rn₊. This result can be generalized for many possible compositions of monotonically increasing as well as decreasing functions, and Schur-convex as well as Schur-concave functions (see Chapter 3 in [92]).

Next, another important class of Schur-convex functions is illustrated.

Definition 2.7 (Symmetric gauge function). A function f : Rn → R is called a symmetric gauge function if

(1) f(u) > 0 for all u ≠ 0,
(2) f(γu) = |γ| f(u) for γ ∈ R,
(3) f(u + v) ≤ f(u) + f(v),
(4) f(u1, . . . , un) = f(±u_{i1}, . . . , ±u_{in}) for any permutation (i1, . . . , in).

Example 2.6. The following are examples of symmetric gauge
functions:

(a) f(x) = max_{1≤i≤n} |xi|,
(b) f(x) = ( Σ_{i=1}^n |xi|^p )^{1/p} for p ≥ 1.

Lemma 2.6 (Symmetric gauge functions are Schur-convex).


Since every symmetric gauge function is symmetric and convex, it is
also Schur-convex.

2.1.3 Majorization for Functions


Consider the transition from finite vectors to the infinite case. The
definition of majorization can be easily extended to the continuous
function case.

Definition 2.8 (Normalized function). A function S is said to be
normalized if it fulfills

    ∫₀¹ S(t) dt = 1.

Definition 2.9 (Majorization for functions). A nondecreasing
function S1 is defined to majorize a nondecreasing function S2 if both
are normalized, i.e.,

    ∫₀¹ S1(t) dt = ∫₀¹ S2(t) dt = 1,

and for every θ ∈ [0, 1]

    ∫_θ¹ S1(t) dt ≥ ∫_θ¹ S2(t) dt.

The definition of Schur-convexity and Schur-concavity can be extended


to functions by the following definition.

Definition 2.10 (Schur-concavity of functions). A real-valued


function H is said to be Schur-concave (resp. Schur-convex) if S1
majorizes S2 implies H(S1 (·)) ≤ H(S2 (·)).

Example 2.7. Similar to Example 2.5, consider the function H(S) =
∫₀¹ S(t)² dt. By replacing the integral by an infinite sum, one sees that
the function is Schur-convex.

In fact, the results derived in the next section for vectors of size
n could be carefully applied to these normalized functions as well.
This can be observed by replacing the sum by an integral in most
equations.

2.2 Basic Results


In the textbook [92] there are many results that show Schur-convexity
for certain classes of functions, their compositions, the application of
outer functions, and so on. In the following, the propositions that are
useful for later applications are stated and mostly proven in a con-
venient way using only the basic definitions and properties from the
last section.

Proposition 2.7 (3.C.1 in [92]). If I ⊆ R is an interval and g :
I → R is convex and twice differentiable, then

    φ(x) = Σ_{k=1}^n g(xk)

is Schur-convex on I^n. Consequently, x ≻ y on I^n implies φ(x) ≥ φ(y).

Proof. The first part of Schur's condition in (2.3) is verified since
φ is obviously symmetric. The difference between the derivatives with
respect to x1 and x2 is given by

    Δ = ∂φ(x)/∂x1 − ∂φ(x)/∂x2 = g′(x1) − g′(x2) ≥ 0,

because g″(x) ≥ 0 implies that g′(x1) ≥ g′(x2) for all x1 ≥ x2. There-
fore, Schur's condition in (2.3) holds.

Remark 2.6. Proposition 2.7 can also be stated with a concave function
g, resulting in Schur-concavity.

Proposition 2.8 (Theorem 1 in [93]). If φ is symmetric and con-
vex, then φ is Schur-convex. Consequently, x ≻ y implies φ(x) ≥ φ(y).

Proof. By symmetry the first part of Schur's condition in (2.3) is veri-
fied. Next, let us construct a vector y that is majorized by x. Consider
the doubly stochastic matrix

    A = [ α  ᾱ ]
        [ ᾱ  α ]

with ᾱ = 1 − α. Define y = Ax, so that x ≻ y with y1 = αx1 + ᾱx2
and y2 = ᾱx1 + αx2. Because φ is convex and symmetric,

    φ(y1, y2) = φ(αx1 + ᾱx2, ᾱx1 + αx2) = φ(α(x1, x2) + ᾱ(x2, x1))
              ≤ αφ(x1, x2) + ᾱφ(x2, x1) = φ(x1, x2).
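The averaging device of the proof can be replayed numerically: a doubly stochastic 2 × 2 matrix maps x to a majorized vector y, and a symmetric convex function cannot increase under this averaging. The matrix A, the vector x, and the sample function `phi` are illustrative choices.

```python
import numpy as np

alpha = 0.3
A = np.array([[alpha, 1 - alpha],
              [1 - alpha, alpha]])        # doubly stochastic matrix

x = np.array([5.0, 1.0])
y = A @ x                                 # y = Ax = (2.2, 3.8): x majorizes y

# A symmetric and convex function (hence Schur-convex by Proposition 2.8).
phi = lambda v: np.max(np.abs(v)) + np.sum(v ** 2)

phi_x, phi_y = phi(x), phi(y)             # expect phi_x >= phi_y
```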

Remark 2.7. Note that the reverse statement is not true. Consider the
following Schur-convex function which is not convex [166, Ex. II.3.15].
Let φ : [0, 1]² → R be the function

    φ(x1, x2) = log(1/x1 − 1) + log(1/x2 − 1).

Checking Schur's condition, it can be observed that φ is Schur-convex
on {x ∈ [0, 1]² : x1 + x2 ≤ 1}. However, the function log(1/t − 1) is
convex on (0, 1/2] but not on [1/2, 1).

2.2.1 Inequalities in Matrix Theory


There are also many applications of Majorization to matrix theory (see
Chapter 9 in [92] and [166]).

Proposition 2.9 (Schur inequality). Let H be an n × n Hermitian
matrix. The vector of diagonal entries of H is majorized by the vector
of eigenvalues of H, i.e.,

    λ(H) ≻ [Hii]_{i=1}^n.

Proof. By the eigenvalue decomposition, we have H = UΛU^H. The
diagonal elements H11, . . . , Hnn of H are

    Hii = Σ_j uij u*ij λj = Σ_j pij λj,

where pij = uij u*ij = |uij|². Since U is unitary, the matrix P with
entries Pij = pij is doubly stochastic, and we have

    [H11, . . . , Hnn] = [λ1, . . . , λn] P = λP,

hence λ ≻ [H11, . . . , Hnn].
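Schur's inequality is easy to verify on a random Hermitian matrix: the partial sums of the ordered eigenvalues dominate those of the ordered diagonal entries, with equality for the full sum (the trace). The dimension and seed below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (X + X.conj().T) / 2                       # Hermitian matrix

diag = np.sort(np.real(np.diag(H)))[::-1]      # ordered diagonal entries
eigs = np.sort(np.linalg.eigvalsh(H))[::-1]    # ordered eigenvalues

# Majorization: partial sums of eigenvalues dominate, full sums agree.
partial_ok = all(eigs[:k].sum() >= diag[:k].sum() - 1e-10 for k in range(1, 5))
trace_ok = abs(eigs.sum() - diag.sum()) < 1e-10
```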

Proposition 2.10 (Hadamard inequality). If H is an n × n posi-
tive semidefinite matrix such that H11 ≥ · · · ≥ Hnn, then

    Π_{k=l}^n Hkk ≥ Π_{k=l}^n λk(H)                             (2.5)

for all l = 1, . . . , n with ordered eigenvalues λ1(H) ≥ · · · ≥ λn(H).
Equality holds if H is diagonal.

Proof. Use Proposition 2.7 and Remark 2.6 with the concave function
g(x) = log x, together with λ ≻ [H11, . . . , Hnn] from Proposition 2.9.
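The inequality (2.5) can be checked numerically, including the classical l = 1 case det(H) ≤ Π_k Hkk. The random positive definite instance below is an illustrative choice; a permutation similarity is used only to sort the diagonal, which leaves the eigenvalues unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
H = B @ B.T + 0.1 * np.eye(4)                  # positive definite

perm = np.argsort(np.diag(H))[::-1]            # reorder so H11 >= ... >= Hnn
H = H[np.ix_(perm, perm)]                      # permutation similarity

d = np.diag(H)
lam = np.sort(np.linalg.eigvalsh(H))[::-1]     # ordered eigenvalues

hadamard_ok = all(np.prod(d[l:]) >= np.prod(lam[l:]) - 1e-9 for l in range(4))
det_ok = np.prod(d) >= np.linalg.det(H) - 1e-9  # classical Hadamard case
```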

The eigenvalues of the sum of Hermitian matrices are characterized


in many different ways [82]. In the following a result is stated without

proof that uses Majorization theory to give bounds on the spectrum


of the sum of Hermitian matrices. The proofs can be found in the text
book [92] and [8, Thm. III.2.1].

Theorem 2.11 (Weyl's inequalities, 9.G.1 in [92]). If G and
H are n × n positive semi-definite matrices with ordered eigenvalues
λ1 ≥ · · · ≥ λn ≥ 0, then

    [λ1(G) + λn(H), . . . , λn(G) + λ1(H)]
        ≺ [λ1(G + H), . . . , λn(G + H)]
        ≺ [λ1(G) + λ1(H), . . . , λn(G) + λn(H)].

Corollary 2.1 (Weyl’s first inequality). If G and H are n × n


positive semi-definite matrices with ordered eigenvalues λ1 ≥ · · · ≥
λn ≥ 0, then λ1 (G + H) ≤ λ1 (G) + λ1 (H).
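Weyl's first inequality states that the largest eigenvalue is subadditive on positive semi-definite matrices; the sketch below checks this on a random instance (dimensions and seed are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
G = A @ A.T                                    # positive semidefinite
H = B @ B.T

# np.linalg.eigvalsh returns eigenvalues in ascending order.
lmax = lambda M: np.linalg.eigvalsh(M)[-1]

subadditive = lmax(G + H) <= lmax(G) + lmax(H) + 1e-10
```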

Most proofs of Weyl’s inequalities are based on the following theorem.

Theorem 2.12 (20.A.2 in [92]). If H is an n × n positive semi-
definite matrix with ordered eigenvalues λ1 ≥ · · · ≥ λn ≥ 0, then

    max_U tr(U H U^H) = Σ_{l=1}^k λl(H)   and
    min_U tr(U H U^H) = Σ_{l=1}^k λ_{n−l+1}(H),

where k = 1, . . . , n and the extrema are over k × n complex matrices U
satisfying U U^H = I_k.

The multiplicative version of the above result characterizes the spec-
trum of the product of Hermitian matrices.

Theorem 2.13 (H.1.a in [92]). If G and H are full rank n × n
positive semi-definite matrices, then

    [log λ1(GH), . . . , log λn(GH)]
        ≺ [log λ1(G) + log λ1(H), . . . , log λn(G) + log λn(H)].   (2.6)

Remark 2.8. The statement in (2.6) leads by Theorem 2.4 directly to

    [λ1(GH), . . . , λn(GH)] ≺_w [λ1(G)λ1(H), . . . , λn(G)λn(H)].

2.2.2 Stochastic Majorization

Proposition 2.14 (Jensen's inequality). Let X be a random vari-
able taking values in the open convex set X with finite expectation
E[X]. If φ : X → R is convex, then

    E[φ(X)] ≥ φ(E[X])                                           (2.7)

with equality if and only if φ is affine on the convex hull of the support
of X. Conversely, if (2.7) holds for all random variables X taking values
in X such that the expectation exists, then φ is convex.

A proof of this proposition can be found, e.g., in [92, 16.C.1].

Theorem 2.15 (Theorem in [93]). If X1, . . . , Xn are exchange-
able random variables, a ≻ b, and φ is a continuous convex function
that is symmetric in its n arguments, then

    E[φ(a1 X1, . . . , an Xn)] ≥ E[φ(b1 X1, . . . , bn Xn)].

If φ is strictly convex, equality occurs only when a = b, possibly after
reordering components, or when each Xi is zero with probability one.

Proof. Define the vector a = [a1, . . . , an]. The function

    ψ(a) = E[φ(a1 X1, . . . , an Xn)]

is convex because the function φ is assumed to be convex. To show
that ψ is symmetric, one uses the exchangeability of X1, . . . , Xn and
the symmetry of the function φ.

Next, a number of novel Schur-convexity and Schur-concavity


results are listed for certain classes of functions. In later sections, these
functions will obtain an operational meaning in multiantenna and mul-
tiuser communications theory.

2.2.3 Average Scalar Function of Sum of Weighted
      Random Variables

Assume that w1, . . . , wn are independent and identically distributed
(iid) random variables according to some probability density function
(pdf). A special case is s1, . . . , sn being iid standard exponentially
distributed, i.e., p(s1) = exp(−s1). Furthermore, the vector µ has
non-negative entries that are ordered in non-increasing order
µ1 ≥ µ2 ≥ · · · ≥ µn ≥ 0.

Lemma 2.16 (Average of function of weighted sum). Suppose
the function f : R+ → R+ is concave. Then

    G(µ) = E_{w1,...,wn}[ f( Σ_{k=1}^n µk wk ) ]                (2.8)

is Schur-concave. Assume f is convex. Then the function G in (2.8) is
Schur-convex.

Proof. Check the conditions in Theorem 2.15. Since w1, . . . , wn are
exchangeable random variables, the function is symmetric with respect
to µ. Furthermore, the function f(Σ_{k=1}^n µk wk) is jointly concave
(convex) in µ1, . . . , µn and hence the Schur-concavity (Schur-convexity)
follows.
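A Monte Carlo sketch of Lemma 2.16 with exponential weights: for a concave f, the average G(µ) should not increase when µ becomes more spread out (the spread vector below majorizes the balanced one). The sample size, the seed, and the choice f(x) = log(1 + x) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.exponential(size=(200_000, 3))         # iid exp(1) weights, shared samples

def G(mu, f=np.log1p):
    # Empirical estimate of E f(sum_k mu_k w_k) as in (2.8).
    return f(w @ np.asarray(mu)).mean()

mu_balanced = [1.0, 1.0, 1.0]                  # majorized by every mu with sum 3
mu_spread = [2.0, 0.8, 0.2]                    # more unequal, same total

g_bal, g_spr = G(mu_balanced), G(mu_spread)    # concave f: expect g_bal >= g_spr
```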

2.2.4 Average Scalar Function of Maximum of Weighted
      Random Variables

Assume that w1, . . . , wn are iid random variables according to a
Gamma distribution with scale parameter λ = 1 and shape k, which
corresponds to the diversity order, i.e.,

    p(w) = w^{k−1} exp(−w) / Γ(k).

Furthermore, the vector µ has non-negative entries that are ordered in
non-increasing order µ1 ≥ µ2 ≥ · · · ≥ µn ≥ 0. Denote the cumulative
distribution function (cdf) of the maximum by P(x), i.e.,

    P(x) = Π_{k=1}^n γ(n, x/µk)/Γ(n)
         = Π_{k=1}^n ( 1 − exp(−x/µk) Σ_{l=0}^{n−1} (x/µk)^l / l! ),

with the incomplete Gamma function γ(n, x) = ∫₀^x t^{n−1} exp(−t) dt.

Theorem 2.17 (Average of maximum of weighted random
variables). Assume that the function f satisfies the following proper-
ties for constants ε1 and ε2:

(1) lim_{x→0} f(x)(P(x) + ε1) = c1 < ∞,
(2) lim_{x→∞} f(x)(P(x) + ε1) = c2 < ∞, and
(3) ∫₀^∞ f′(x)(P(x) + ε2) dx = c3 < ∞.

If, in addition, f is monotonically increasing and concave, then

    G(µ) = E_{w1,...,wn}[ f( max_{1≤k≤n} µk wk ) ]              (2.9)

is Schur-convex. If f has the above properties and is monotonically
decreasing and convex, then G in (2.9) is Schur-concave.

Proof. Since the random variables w1, . . . , wn are exchangeable, the
function is symmetric with respect to µ. The expectation in (2.9) can
be written as

    ∫₀^∞ f(x) P′(x) dx
        = [ f(x)(P(x) + ε1) ]₀^∞ − ∫₀^∞ f′(x)(P(x) + ε2) dx.

The constants ε1 and ε2 arise because d(P(x) + ε)/dx = P′(x). With
assumptions (1) and (2) the first term on the RHS exists. The second
term exists by the third assumption. Note that c1 and c2 are indepen-
dent of µ. Define

    f̄(µ) = − ∫₀^∞ f′(x) ( Π_{k=1}^n γ(n, x/µk)/Γ(n) + ε ) dx.

Next check Schur's condition directly, i.e.,

    Δ = ∂f̄(µ)/∂µ2 − ∂f̄(µ)/∂µ1
      = ∫₀^∞ f′(x) Π_{k=3}^n [ γ(n, x/µk)/Γ(n) ]
          · [ (x/µ2²)(x/µ2)^{n−1} exp(−x/µ2)/Γ(n) · γ(n, x/µ1)/Γ(n)
            − (x/µ1²)(x/µ1)^{n−1} exp(−x/µ1)/Γ(n) · γ(n, x/µ2)/Γ(n) ] dx.

Denote the bracketed difference by δ(x, µ1, µ2). For all x ≥ 0 and
µ1 > µ2, the function δ(x, µ1, µ2) is positive for x ≤ x∗ and negative
for x ≥ x∗. First, consider f to be monotonically increasing and con-
cave. Then

    ∫₀^∞ f′(x) δ(x, µ1, µ2) dx ≤ f′(x∗) ∫₀^∞ δ(x, µ1, µ2) dx.   (2.10)

Finally, it holds that ∫₀^∞ δ(x, µ1, µ2) dx ≤ 0 for all µ1, µ2. Therefore,
the Schur-convexity follows.
    If f is monotonically decreasing and convex, then

    ∫₀^∞ f′(x) δ(x, µ2, µ1) dx ≥ f′(x∗) ∫₀^∞ δ(x, µ2, µ1) dx ≥ 0  (2.11)

and the Schur-concavity follows in this case.



Example 2.8. We provide two examples to show the application of
Theorem 2.17. First, f(x) = log(1 + x). Choose ε1 = ε2 = −1 to fulfill
the conditions. The first condition follows directly:

    lim_{x→0} log(1 + x)(P(x) − 1) = 0.

The second condition is checked by noting that the incomplete Gamma
function can be rewritten as

    γ(n, x) = (n − 1)! [ 1 − exp(−x) Σ_{k=0}^{n−1} x^k / k! ],

i.e., P(x) approaches one at least as fast as 1 − exp(−x), and hence

    lim_{x→∞} log(1 + x)(P(x) − 1) = lim_{x→∞} log(1 + x) exp(−x) = 0.

The next example chooses f(x) = 1/(1 + x). Again, let ε1 = ε2 = −1 to
get lim_{x→0} (P(x) − 1)/(1 + x) = −1 and lim_{x→∞} (P(x) − 1)/(1 + x) = 0.
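The first case of Example 2.8 can be probed by simulation: with f(x) = log(1 + x), the average of the maximum should be Schur-convex in µ, so a more spread µ yields a larger value. Exponential weights (Gamma shape 1), the sample size, and the two test vectors are illustrative choices here.

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.exponential(size=(200_000, 3))         # iid exp(1) weights

def G(mu):
    # Empirical estimate of E log(1 + max_k mu_k w_k) as in (2.9).
    return np.log1p((w * np.asarray(mu)).max(axis=1)).mean()

g_balanced = G([1.0, 1.0, 1.0])
g_spread = G([2.0, 0.8, 0.2])                  # majorizes the balanced vector

# Schur-convexity of G suggests g_spread >= g_balanced.
```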

2.2.5 Average of Scalar Function of Maximized
      Weighted Sum

Assume that w1, . . . , wn are iid random variables according to a Gamma
distribution with parameters λ = 1 and k > 0. Define a vector p =
[p1, . . . , pn] with non-negative entries. Furthermore, the vector µ has
non-negative entries that are ordered in non-increasing order
µ1 ≥ µ2 ≥ · · · ≥ µn ≥ 0.

Theorem 2.18 (Average of maximized weighted sum). Con-
sider the function f(x) = log(1 + x). Then

    H(µ) = max_{p: pk ≥ 0, |p| ≤ P} E[ f( Σ_{k=1}^n pk µk wk ) ]

is Schur-convex with respect to µ.

The proof can be found in [68]. Interestingly, this result implies that
the optimal p∗ ≻ µ. Note that this result cannot be generalized to an
arbitrary concave function f.
2.2 Basic Results 27

2.2.6 Probability of Scalar Functions of Sums of Weighted
      Random Variables

Assume again that w1, . . . , wn are iid random variables according to
a Gamma distribution with scale parameter λ = 1 and shape k, i.e.,
p(w) = w^{k−1} exp(−w)/Γ(k). For k = 1, the special case is that
s1, . . . , sn are iid standard exponentially distributed, i.e., p(s1) =
exp(−s1). Furthermore, the vector µ has non-negative entries that
are ordered in non-increasing order µ1 ≥ µ2 ≥ · · · ≥ µn ≥ 0. In con-
trast to the clear behavior of the average in Lemma 2.16, the proba-
bility of the function of the weighted sum of random variables
w1, . . . , wn shows varying behavior.

Theorem 2.19 (Probability of weighted sum). Suppose that the
inverse function f⁻¹ of f exists and f is non-negative. If f⁻¹(x) ≥ 2
and µ(1) ≻ µ(2), then

    Pr[ f( Σ_{k=1}^n µ(1)k wk ) ≤ x ] ≤ Pr[ f( Σ_{k=1}^n µ(2)k wk ) ≤ x ].

If f⁻¹(x) ≤ 1 and µ(1) ≻ µ(2), then

    Pr[ f( Σ_{k=1}^n µ(1)k wk ) ≤ x ] ≥ Pr[ f( Σ_{k=1}^n µ(2)k wk ) ≤ x ].

The result says that Pr[f(Σ_{k=1}^n µk wk) ≤ x] is Schur-concave for
x ≥ f(2) and Schur-convex for x ≤ f(1).

The proof can be found in [73]. The special case for random Gaussian
variables was solved in [133]. It relies heavily on the properties of the
distribution of the sum of weighted random variables, especially on the
unimodality of the corresponding pdf. With respect to majorization
there is no clear behavior in the interval f (1) < x < f (2). However, the
minimum of the probability can be characterized in a closed form [73,
Thm. 3].

2.2.7 Average Trace of Function of Sum of Weighted Dyads

Consider the random vectors h1, . . . , hn of size m with iid zero-mean
complex Gaussian distributed entries with variance one. Furthermore,
the vector µ has non-negative entries that are ordered in non-
increasing order µ1 ≥ µ2 ≥ · · · ≥ µn ≥ 0.

Theorem 2.20 (Average trace of weighted dyadic sum).
Assume that φ is a matrix-convex function.² Then, the function

    Φ(µ) = E_{h1,...,hn}[ tr φ( Σ_{k=1}^n µk hk hk^H ) ]        (2.12)

is Schur-convex with respect to µ. If φ is matrix-concave, the function
Φ(µ) is Schur-concave.

Proof. For verifying Schur's condition, the first derivatives of (2.12)
with respect to µ1 and µ2 are important. These partial derivatives are
given by

    ∂Φ(µ)/∂µ1 = E[ tr( h1 h1^H · φ^[1]( Σ_{k=1}^n µk hk hk^H ) ) ]   (2.13)
    ∂Φ(µ)/∂µ2 = E[ tr( h2 h2^H · φ^[1]( Σ_{k=1}^n µk hk hk^H ) ) ].  (2.14)

The two largest entries µ1 and µ2 are parameterized by

    µ1 = µ + t  and  µ2 = µ − t.

Then, the difference between the first derivatives in (2.13) and (2.14)
is a function of t and is given by

    Γ(t) = E[ tr( Δ · φ^[1](R + tΔ) ) ]                          (2.15)

with

    Wk = hk hk^H,  R = µ(W1 + W2) + Σ_{k=3}^n µk Wk,  and  Δ = W1 − W2.

Next, the matrix-monotone function φ(A) can be written as³

    φ(A) = ∫₀^∞ sA [sI + A]⁻¹ dµ(s)

² This class of functions is defined in Subsection 3.1.2.
³ See the representation theorem in Subsection 3.2.1.

with probability distribution dµ(s). The “first derivative” as defined in


Section 3.2.2 is then given as
Z ∞
φ[1] (A) = s2 [sI + A]−2 dµ(s). (2.16)
0
The result from (2.16) is set into Γ(t) in (2.15) and integration and
summation is exchanged to obtain
Z ∞  
Γ(t) = s2 E tr ∆ · [R(s) + t∆]−2 dµ(s) (2.17)
0
with R(s) = sI + µ (W 1 + W 2 ) + nk=3
P T
µk W k . Finally, we study the
trace expression in (2.17) as a function with respect to t. To this end,
we define for all s ≥ 0
 
M (t) = tr ∆ · [R(s) + t∆]−2 . (2.18)

Note, that Γ(0) = 0. The first derivative of the function M (t) with
respect to t is smaller than or equal to zero for all s ≥ 0 and all W 1 ,
W 2 . Therefore, the integral in (2.17) is smaller than or equal to zero,
because the outer integral is over a pdf, which is positive for all s by
definition and has only positive steps. With T = [R(s) + t∆]−1 it holds
∂M (t)
= − tr (∆T T ∆T ) − tr (∆T ∆T T )
∂t  

= −2 tr T ∆T
| {z∆} T .
 (2.19)
Q

Note, that the matrix T is positive definite. Finally, the matrix Q can
be written as
Q = W 1T W 1 − W 2T W 1 − W 1T W 2 + W 2T W 2
h ih i
= W 1 T 1/2 − W 2 T 1/2 T 1/2 W 1 − T 1/2 W 2
| {z }
C
H
= CC ≥ 0. (2.20)
Inequality (2.20) shows that the matrix Q is positive definite and
therefore the first derivative of M (t) with respect to t in (2.19) is
smaller than or equal to zero. Therefore, the function Φ(t) in (2.17) is
smaller than or equal to zero and this verifies Schur’s condition and
completes the proof.
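The key monotonicity step of the proof, that M(t) = tr(Δ[R(s) + tΔ]⁻²) is nonincreasing in t, can be verified numerically for a random instance. The dimension, the values of s and µ, and the grid of t values below are illustrative choices kept small enough that R(s) + tΔ stays positive definite.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 4
h1 = rng.standard_normal(m) + 1j * rng.standard_normal(m)
h2 = rng.standard_normal(m) + 1j * rng.standard_normal(m)
W1 = np.outer(h1, h1.conj())                   # rank-one dyads
W2 = np.outer(h2, h2.conj())
Delta = W1 - W2
R = 1.0 * np.eye(m) + 0.7 * (W1 + W2)          # R(s) with s = 1, mu = 0.7

def M(t):
    # M(t) = tr(Delta [R + t Delta]^{-2}), cf. (2.18); R + t*Delta stays
    # positive definite for t < 0.7 since mu - t > 0 then.
    T = np.linalg.inv(R + t * Delta)
    return np.real(np.trace(Delta @ T @ T))

ts = np.linspace(0.0, 0.6, 25)
vals = np.array([M(t) for t in ts])
nonincreasing = bool(np.all(np.diff(vals) <= 1e-9))
```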

Theorem 2.20 is a generalization of Theorem 2.15 to the matrix case.

2.3 Majorization and Optimization


It was mentioned in the introduction of this chapter that Majorization
theory is applied successfully to optimization. In this section, results
regarding the optimization of certain Schur-convex and Schur-concave
functions are listed and proven.
    The simplest case is a programming problem in which a Schur-
concave function is to be maximized and the constraint set fits the
ℓ1-norm constraint, i.e., the optimization vector x is restricted by
||x||_{ℓ1} = X.

Theorem 2.21. Consider the Schur-concave function f : R^n_+ → R+
and the following programming problem:

    max_x f(x)  s.t.  x ≥ 0, |x| = X.                           (2.21)

Then the global maximum is achieved by x̲ = (X/n)·1. In contrast,
the corresponding minimization problem

    min_x f(x)  s.t.  x ≥ 0, |x| = X                            (2.22)

has the global minimum x̄ = [X, 0, 0, . . . , 0].

Proof. The proof follows directly from the fact that

    x̲ = (X/n)·1 ≺ x ≺ [X, 0, 0, . . . , 0] = x̄

for every feasible x, as well as from the Schur-concavity of the objective
function, i.e., f(x̲) ≥ f(x) ≥ f(x̄).


Example 2.9. Let f(x) = Σ_{k=1}^n √xk and X = 1. From Proposi-
tion 2.7 and Remark 2.6 it follows that f is Schur-concave. Then from
Theorem 2.21 it follows that

    f(x̄) = 1 ≤ f(x) ≤ √n = f(x̲).

Example 2.10. Order the components of x in decreasing order, i.e.,
x1 ≥ x2 ≥ · · · ≥ xn ≥ 0. Another example is f(x) = x1 = max_{1≤k≤n} xk.
This function is obviously Schur-convex; the maximum is achieved by x̄
and the minimum is achieved by x̲.

Next, the question arises what happens if the equality constraint in the
optimization problems (2.21) and (2.22) is relaxed, i.e., if the constraint
set is, e.g., |x| ≤ X. The corresponding programming problem can be
stated as

    max_{0≤ξ≤X} ( max_{x≥0, |x|=ξ} f(x) ).                      (2.23)

For the optimum of (2.23), it follows that x∗ = (ξ∗/n)·1 if f is Schur-
concave, where ·∗ denotes the optimum.


Example 2.11. f(x) = Σ_{k=1}^n √xk is Schur-concave and monotoni-
cally increasing in the ℓ1 norm of x. Therefore, the solution to (2.23)
is still x∗ = (X/n)·1.


Example 2.12. Consider f(x) = |x|(2 − |x|) Σ_{k=1}^n √xk. This func-
tion is also Schur-concave but not monotonically increasing in the ℓ1
norm of x. Set the optimum x(ξ)∗ = (ξ/n)·1 into the function to obtain
the scalar function f(ξ) = √n ξ^{3/2}(2 − ξ). The global maximum is
attained by ξ∗ = 6/5, i.e., x∗ = (6/(5n))·1.
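The scalar reduction in Example 2.12 can be checked on a grid. The code below assumes the reconstructed objective f(ξ) = √n ξ^{3/2}(2 − ξ) used above; the dimension n and the grid resolution are illustrative.

```python
import numpy as np

n = 4                                          # illustrative dimension
xi = np.linspace(1e-6, 2.0, 2_000_001)         # grid over the budget 0 < xi <= 2
f = np.sqrt(n) * xi ** 1.5 * (2.0 - xi)        # reduced scalar objective

xi_star = xi[np.argmax(f)]                     # global maximizer, expect ~ 6/5
```

Setting the derivative (3/2)ξ^{1/2}(2 − ξ) − ξ^{3/2} to zero gives ξ∗ = 6/5 analytically, matching the grid search.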

The generalization of this approach is proven in [168, Thm. 2.4].

Theorem 2.22. Let f : Rn+ → R+ be a Schur-convex function and have


only one critical point a. If f (a) is a local minimum, then it must be
the global minimum.
3 Matrix-Monotone Functions

The last chapter discussed a certain partial order for vectors —


Majorization — and characterized the order preserving functions —
Schur-convex and Schur-concave functions. This chapter will propose
a certain order on matrices, namely the Löwner order. The order pre-
serving functions are called matrix-monotone functions and their rep-
resentations and properties will be discussed.

3.1 Definition and Examples


To define monotonicity for matrix-valued functions of matrices, a par-
tial order on the set of all n × n positive semidefinite matrices is
required. Define the set of all positive semidefinite matrices of size
n × n as Hn . One standard ordering which is called the Löwner order-
ing is described by
A ≤ B means B − A is positive semidefinite
A < B means B − A is positive definite.
In order to transfer the notion of monotonicity and convexity to
matrices and matrix-valued functions, the following definitions are
helpful.


Let the function φ be an arbitrary matrix-valued function. The func-


tion φ maps from the set of positive semidefinite matrices to the set of
positive semidefinite matrices. The function φ is defined in the following
[8, 51].

Definition 3.1 (Matrix function). Let the eigenvalue decomposi-
tion of A be given as A = UΛU^H. Then φ(A) means

    φ(A) = U φ(Λ) U^H  with  φ(Λ) = diag(φ(λ1), . . . , φ(λn)).

That means, the function φ affects only the eigenvalues of the matrix
A and keeps the eigenvectors unaltered.

Remark 3.1. There are many possible definitions of how a function


affects a matrix. The Definition 3.1 is motivated by the Spectral The-
orem (Proposition 6.2 in Section 6.1). It follows that the matrix func-
tion φ can be regarded as scalar function f : R+ → R+ . The interested
reader is referred to Chapter 6 of [51].
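Definition 3.1 translates directly into code: apply the scalar function to the eigenvalues and keep the eigenvectors. The helper name `matrix_function` and the sanity check with φ(t) = t² are illustrative choices.

```python
import numpy as np

def matrix_function(A, phi):
    # Definition 3.1: A = U diag(lam) U^H, phi acts on the eigenvalues only.
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(phi(lam)) @ U.conj().T

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))
A = B @ B.T + np.eye(3)                        # positive definite test matrix

# Sanity check: phi(t) = t^2 must reproduce the ordinary matrix square.
err = np.linalg.norm(matrix_function(A, lambda t: t ** 2) - A @ A)
```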

3.1.1 Matrix Monotonicity


The concept of matrix-monotonicity was first studied by [88].

Definition 3.2 (Matrix-increasing). Let A ⊂ Hn . A function φ :


A → Hn is matrix-increasing of order n on A if

A≤B implies φ(A) ≤ φ(B) (3.1)

for A, B ∈ A. The function is strictly matrix-increasing of order n on


A if

A<B implies φ(A) < φ(B)

for A, B ∈ A.

Remark 3.2. If a function is matrix-increasing for all orders n ≥ 1


it is just called matrix-increasing or sometimes also matrix-monotone.

Recently, it was shown that the class of matrix-monotone functions of


order n + 1 is a proper subset of the class of matrix-monotone functions
of order n [45].

Let us study some examples and counterexamples of matrix-
monotonicity.

Lemma 3.1 (16.E.3.b in [92]). On the set of positive definite matri-
ces, the function φ(A) = A⁻¹ is strictly decreasing.

Proof. Let Q1 > Q2 > 0 and define

    g(t) = (tQ1 + (1 − t)Q2)⁻¹ = Q(t)⁻¹.

It suffices to prove that g is strictly decreasing in 0 ≤ t ≤ 1. Note that
Q(t)Q(t)⁻¹ = I. As a result,

    (∂Q(t)/∂t) Q(t)⁻¹ + Q(t) (∂Q(t)⁻¹/∂t) = 0.

Hence,

    ∂Q(t)⁻¹/∂t = −Q(t)⁻¹ (∂Q(t)/∂t) Q(t)⁻¹
               = −Q(t)⁻¹ (Q1 − Q2) Q(t)⁻¹ < 0.

Example 3.1. On the set of positive definite matrices, the function
φ(A) = A² is not strictly increasing. Consider the following counter-
example:

    A = [ 40  8 ],   B = [ 32  0 ],   A − B = [ 8  8 ],
        [  8 20 ]        [  0 11 ]            [ 8  9 ]

    A² − B² = [ 640 480 ]
              [ 480 343 ].

The eigenvalues of the matrix A² − B² are −10.94626578 and
993.9462658. The example is a special case of [92, 16.E.4] and it shows
that conclusions from the scalar case to the matrix case have to be
drawn very carefully.
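The counterexample uses only the matrices given above, so it is straightforward to verify: A − B is positive definite, yet A² − B² is indefinite.

```python
import numpy as np

A = np.array([[40.0, 8.0], [8.0, 20.0]])
B = np.array([[32.0, 0.0], [0.0, 11.0]])

amb_min = np.linalg.eigvalsh(A - B)[0]         # smallest eigenvalue of A - B
sq_min = np.linalg.eigvalsh(A @ A - B @ B)[0]  # smallest eigenvalue of A^2 - B^2

# amb_min > 0 shows A > B, while sq_min < 0 shows A^2 - B^2 is indefinite,
# so squaring does not preserve the Loewner order.
```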

3.1.2 Matrix Convexity


The concept of matrix-convexity was first studied by [83].

Definition 3.3 (Matrix-convex). Let A ⊂ Hn. A function φ : A →
Hn is matrix-convex of order n on A if

    φ(αA + ᾱB) ≤ αφ(A) + ᾱφ(B)  for all α ∈ [0, 1] and A, B ∈ A.  (3.2)

φ is strictly matrix-convex of order n on A if

    φ(αA + ᾱB) < αφ(A) + ᾱφ(B)  for all α ∈ (0, 1) and A ≠ B ∈ A.

A function is called just matrix-convex if it is matrix-convex for all
orders n ≥ 1.

Theorem 3.2 (Theorem 2.5 in [46]). A nonnegative continuous


function on [0, ∞) is operator monotone if and only if it is operator
concave.

Remark 3.3. However, not every matrix-convex function is necessar-


ily matrix-monotone. The function φ(A) = A2 is matrix-convex but
not matrix-monotone as shown in Example 3.1.

Proposition 3.3 (Proposition 16.E.6 in [92]). Let φ be a function
defined on a convex set A of m × k matrices, taking values in Hn for
some n. If A is open and φ is twice differentiable, then for all A, B ∈ A
the following are equivalent:

(1) φ is matrix-convex on A.
(2) For all fixed A and B in A, the function g(α) = φ(αA +
    ᾱB) is convex in α ∈ [0, 1] in the sense that ηg(α) + η̄g(β) −
    g(ηα + η̄β) is positive semidefinite for all α, β, η ∈ [0, 1].
(3) For all fixed A, B ∈ A, d²g(α)/dα² is positive semidefinite
    for 0 < α < 1.

Proof. Matrix convexity on A is fulfilled if and only if for all vectors
x, the function ψ(A) = x^H φ(A) x is convex. ψ(A) is convex on A if
and only if x^H (d²g(α)/dα²) x ≥ 0 for all A, B ∈ A and 0 < α < 1.

The next example illustrates the notion of matrix-concavity.

Example 3.2. Consider f(X) = log(I + X). In order to show that f
is matrix-concave, we need the following identity:

    d² log(I + αX)/dα² = −X [I + αX]⁻² X ≤ 0.

As a result, for g(α) = log(I + αX + ᾱY) it holds that

    d²g(α)/dα² ≤ 0

and matrix-concavity follows by Proposition 3.3. Matrix-monotonicity
follows from Theorem 3.2.
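Matrix-concavity of log(I + X) can be spot-checked numerically: the concavity gap f(αX + ᾱY) − [αf(X) + ᾱf(Y)] should be positive semidefinite. The matrix logarithm below is computed via the eigendecomposition (valid since I + X has eigenvalues ≥ 1 for positive semidefinite X); the test matrices are illustrative.

```python
import numpy as np

def logm_spd(A):
    # Matrix logarithm of a symmetric positive definite matrix (Definition 3.1).
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.log(lam)) @ U.conj().T

f = lambda X: logm_spd(np.eye(len(X)) + X)

rng = np.random.default_rng(7)
X = rng.standard_normal((3, 3))
X = X @ X.T                                    # positive semidefinite
Y = rng.standard_normal((3, 3))
Y = Y @ Y.T

alpha = 0.4
gap = f(alpha * X + (1 - alpha) * Y) - (alpha * f(X) + (1 - alpha) * f(Y))
gap_min = np.linalg.eigvalsh(gap)[0]           # expect >= 0 up to round-off
```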

The properties of the matrix function φ can be transferred to the
associated scalar function φ, but not vice versa.

Corollary 3.1. Every matrix-monotone function is monotonic (increas-
ing or decreasing), whereas not every monotonic function is matrix-
monotone. Every matrix-convex function is convex, whereas not every
convex function is matrix-convex.

Example 3.3. The function φ(x) = x2 is increasing and convex but


not matrix monotone. The function φ(x) = exp(x) is convex but not
matrix convex.

The connection between the scalar properties of monotonicity and con-


cavity and the matrix-monotonicity is illustrated in the Venn-diagram
in Figure 3.1.

Fig. 3.1 Venn-diagram: Matrix-monotone functions are matrix-concave, concave, and mono-
tone.

3.1.3 Fréchet Derivative


Corresponding to the first and second derivatives of scalar functions,
there exists a derivative of the matrix-valued function φ. We follow
closely the derivation in [8, Sec. V.3 and Sec. X.4].

Definition 3.4 (Fréchet differentiable). The map φ is called


(Fréchet) differentiable at A if there exists a linear transforma-
tion Dφ(A) on the space of positive semidefinite matrices such that for
all H

||φ(A + H) − φ(A) − Dφ(A)(H)|| = o(||H||) (3.3)

holds. The linear operator Dφ(A)(H) is then called the derivative of


φ at A in direction H.

The difference from the scalar case is that a direction H is needed to


define the derivative.

Lemma 3.4 (First derivative in a direction). The first derivative
of φ(A) = A^p in direction B is given by

    Dφ(A)(B) = Σ_{k=1}^p A^{k−1} B A^{p−k}.

Proof. Compute the first derivative of φ(ε) = (A + εB)^p with respect
to ε at the point ε = 0. Using the product rule, it holds that

    dφ(ε)/dε = [d(A + εB)/dε] (A + εB)^{p−1}
             + (A + εB) [d(A + εB)/dε] (A + εB)^{p−2}
             + · · · + (A + εB)^{p−1} [d(A + εB)/dε]
             = Σ_{k=1}^p (A + εB)^{k−1} [d(A + εB)/dε] (A + εB)^{p−k}
             = Σ_{k=1}^p (A + εB)^{k−1} B (A + εB)^{p−k}.

At the point ε = 0, we have

    dφ(ε)/dε |_{ε=0} = Σ_{k=1}^p A^{k−1} B A^{p−k}.

This completes the proof.
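Lemma 3.4 can be verified against a central finite difference of (A + εB)^p. The matrix sizes, the power p, and the step ε below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
p, eps = 3, 1e-5

mp = np.linalg.matrix_power
# Closed form from Lemma 3.4: sum_{k=1}^p A^{k-1} B A^{p-k}.
exact = sum(mp(A, k - 1) @ B @ mp(A, p - k) for k in range(1, p + 1))
# Central finite difference of phi(eps) = (A + eps*B)^p at eps = 0.
numeric = (mp(A + eps * B, p) - mp(A - eps * B, p)) / (2 * eps)

err = np.linalg.norm(exact - numeric)
```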

Example 3.4. Next, three examples are provided to illustrate the


notion of the derivative of a matrix function.

(1) Let φ(A) = A2 . Then

Dφ(A)(B) = AB + BA.

(2) Let φ(A) = A−1 for invertible A. Then

Dφ(A)(B) = −A−1 BA−1 .

(3) Let φ(A) = AH A. Then

Dφ(A)(B) = AH B + B H A.

The derivative is linear, i.e., D(φ1 + φ2)(A) = Dφ1(A) + Dφ2(A).
The composite of two differentiable maps φ and γ is differentiable.
The chain rule is D(γ ∘ φ)(A) = Dγ(φ(A)) · Dφ(A). Interestingly, if
the trace operator is applied to the function φ, the following result can
be proven. This is a special case of the results in [47].

Lemma 3.5 (Derivative of trace of matrix function). Let φ be a
continuous analytic function mapping positive semidefinite matrices to
positive semidefinite matrices. Then, the first derivative of the function
tr[φ(C + εD)] with respect to ε at the point ε = 0 is given by

    (∂/∂ε) tr[φ(C + εD)] |_{ε=0} = tr( φ′(C) · D )              (3.4)

with the simple derivative φ′ of the (scalar) function φ.

Proof. Since the function is continuous and analytic, it can be approx-
imated arbitrarily well by a polynomial. Both sides of (3.4) in
Lemma 3.5 are linear in φ. Therefore, it suffices to prove Equation (3.4)
for the powers φ(t) = t^p with p ∈ N+. It holds that

    (∂/∂ε) tr[φ(C + εD)] |_{ε=0} = tr( Σ_{k=1}^p C^{k−1} D C^{p−k} )
        = tr( DC^{p−1} + CDC^{p−2} + · · · )
        = p · tr( DC^{p−1} )
        = tr( φ′(C) · D ).

Example 3.5. Consider the function φ(A) = A^k. The first derivative
of tr φ at the point A in direction D is given by

    (∂/∂ε) tr[φ(A + εD)] |_{ε=0} = k tr( A^{k−1} D ).
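Example 3.5 is easy to confirm numerically: a finite difference of tr[(A + εD)^k] at ε = 0 should match k·tr(A^{k−1}D). The dimensions, the power k, and the step ε are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))
k, eps = 4, 1e-5

mp = np.linalg.matrix_power
# Central finite difference of tr[(A + eps*D)^k] at eps = 0.
numeric = (np.trace(mp(A + eps * D, k)) - np.trace(mp(A - eps * D, k))) / (2 * eps)
# Closed form from Example 3.5 (Lemma 3.5 with phi(t) = t^k).
exact = k * np.trace(mp(A, k - 1) @ D)

err = abs(numeric - exact)
```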

Definition 3.5 (Twice Fréchet differentiable). Let φ be a differ-
entiable map from X to Y. At each point u, the derivative Dφ(u) is an
element of the Banach space L(X, Y). This is a map Dφ from X into
L(X, Y), defined as Dφ : u → Dφ(u). If this map is twice differentiable
at u, the derivative of the map Dφ at the point u is called the second
derivative of φ at u. It is denoted as D²φ(u).

Remark 3.4. The second derivative of a function φ at point X has


two parameters, i.e.,

D2 φ(X)(A)(B) = D2 φ(X)(B)(A)

and it is symmetric in these two parameters.

Example 3.6. Two examples of second derivatives are provided.

(1) Let φ(A) = A³. The first derivative is

    Dφ(A)(B) = A²B + ABA + BA².

The second derivative is

    D²φ(A)(B1, B2) = AB1B2 + B1AB2 + B1B2A + AB2B1
                   + B2AB1 + B2B1A.

(2) Let φ(A) = A⁻¹. We have seen that Dφ(A)(B) = −A⁻¹BA⁻¹.
The second derivative is

    D²φ(A)(B1, B2) = A⁻¹B1A⁻¹B2A⁻¹ + A⁻¹B2A⁻¹B1A⁻¹.

3.1.4 First Divided Difference

In the following, we define the first divided difference of φ, which is
closely related to the derivative of φ that was discussed above. Further-
more, the matrix constructed from these first divided differences is used
to characterize the class of matrix-monotone functions [8, Sec. V.3].

Definition 3.6 (First divided difference). Let I be an open inter-
val and let φ be a continuously differentiable function on I. Then we
denote by φ^[1] the function on I × I defined as

    φ^[1](λ1, λ2) = (φ(λ1) − φ(λ2)) / (λ1 − λ2),  if λ1 ≠ λ2,
    φ^[1](λ1, λ1) = φ′(λ1).

The expression φ^[1](λ1, λ2) is called the first divided difference of φ at
(λ1, λ2).

Definition 3.7 (Matrix of first divided differences). The matrix
of first divided differences φ^[1](A) for positive semidefinite A = UΛU^H
is defined as

    φ^[1](A) = U φ^[1](Λ) U^H.

Applied to the diagonal matrix Λ with entries λ1, . . . , λn, the function
φ^[1] is defined as an n × n matrix with

    [φ^[1](Λ)]_{j,k} = φ^[1](λj, λk).

The connection between the matrix of first divided differences and
the Fréchet derivative is described in the next lemma.

Lemma 3.6 (Lemma V.3.1 in [8]). If φ is a polynomial function
and A is positive semidefinite, then

    Dφ(A)(H) = φ^[1](A) ◦ H.
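For a diagonal A the lemma can be checked entrywise: with φ(t) = t³, the Fréchet derivative A²H + AHA + HA² from Lemma 3.4 must agree with the Schur (Hadamard) product of the divided-difference matrix and H, since (λi³ − λj³)/(λi − λj) = λi² + λiλj + λj². The eigenvalues and H below are illustrative.

```python
import numpy as np

lam = np.array([0.5, 1.0, 2.5])                # distinct eigenvalues
A = np.diag(lam)
rng = np.random.default_rng(10)
H = rng.standard_normal((3, 3))
H = (H + H.T) / 2                              # Hermitian direction

phi = lambda t: t ** 3
dd = np.empty((3, 3))                          # first divided differences
for i in range(3):
    for j in range(3):
        if i == j:
            dd[i, j] = 3 * lam[i] ** 2         # phi'(lam_i) on the diagonal
        else:
            dd[i, j] = (phi(lam[i]) - phi(lam[j])) / (lam[i] - lam[j])

frechet = A @ A @ H + A @ H @ A + H @ A @ A    # D phi(A)(H) for phi(t) = t^3
err = np.linalg.norm(frechet - dd * H)         # dd * H is the Schur product
```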

The next corollary shows directly the relationship between the first
divided difference matrix φ^[1] and the simple derivative φ′ of the scalar
function φ(t).

Corollary 3.2 (First divided difference and first derivative).
For any matrix-monotone function φ and Hermitian matrices A and
D, the following identity holds:

    tr( φ^[1](A) ◦ D ) = tr( φ′(A) · D ).                       (3.5)

Proof. Let the eigenvalue decomposition of A be given by A = UΛU^H and let Z = U^H DU. It holds that

    tr(φ^{[1]}(A) ∘ D) = tr(Dφ(A)(D)) = (d/dε) tr φ(A + εD) |_{ε=0}
        = (d/dε) tr φ(UΛU^H + εD) |_{ε=0}
        = tr( U [ (d/dε) φ(Λ + εU^H DU) |_{ε=0} ] U^H )
        = tr( U Dφ(Λ)(U^H DU) U^H )
        = tr( φ^{[1]}(Λ) ∘ Z )
        = Σ_{k=1}^n [φ^{[1]}(Λ)]_{k,k} Z_{k,k}
        = Σ_{k=1}^n [φ'(Λ)]_{k,k} Z_{k,k}
        = tr( φ'(Λ) U^H DU )
        = tr( φ'(A) · D ).    (3.6)

In (3.6), we used the fact that the diagonals of φ^{[1]}(Λ) and φ'(Λ) are identical and that the trace of AB equals the trace of BA.
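The identities above lend themselves to a quick numerical check. The following sketch (not from the text; it uses NumPy, a random positive definite A, and the concrete matrix-monotone function φ(t) = log(1 + t)) verifies the divided-difference form of the Fréchet derivative in the eigenbasis of A and the trace identity (3.5):

```python
import numpy as np

# Numeric sketch: divided-difference form of the Frechet derivative and the
# trace identity (3.5), for phi(t) = log(1 + t).
rng = np.random.default_rng(0)

def divided_differences(lam, f, fprime):
    # Matrix of first divided differences f[1](lam_j, lam_k) (Definition 3.6).
    n = len(lam)
    M = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            if abs(lam[j] - lam[k]) > 1e-12:
                M[j, k] = (f(lam[j]) - f(lam[k])) / (lam[j] - lam[k])
            else:
                M[j, k] = fprime(lam[j])
    return M

f = np.log1p
fp = lambda t: 1.0 / (1.0 + t)

n = 4
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)                                # positive definite
D = rng.standard_normal((n, n)); D = (D + D.T) / 2     # Hermitian direction

lam, U = np.linalg.eigh(A)
phi1 = divided_differences(lam, f, fp)

# D phi(A)(D) = U (phi[1](Lambda) o (U^H D U)) U^H in the eigenbasis of A.
deriv = U @ (phi1 * (U.T @ D @ U)) @ U.T

def phi_mat(M):
    w, V = np.linalg.eigh(M)
    return V @ np.diag(f(w)) @ V.T

eps = 1e-6
fd = (phi_mat(A + eps * D) - phi_mat(A - eps * D)) / (2 * eps)
assert np.allclose(deriv, fd, atol=1e-5)               # matches finite difference

phi_prime_A = U @ np.diag(fp(lam)) @ U.T
assert abs(np.trace(deriv) - np.trace(phi_prime_A @ D)) < 1e-8   # identity (3.5)
```

The Hadamard product with the divided-difference matrix is applied in the eigenbasis of A, exactly as in the proof above.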

3.2 Basic Characterizations

3.2.1 Representation Theorem

The following integral representations for operator-monotone and operator-convex functions are part of Löwner's deep theory [166].

Theorem 3.7 (Representation for Löwner's theory). Every matrix-monotone function φ can be expressed as

    φ(t) = a + bt + ∫_0^∞ st/(s + t) dμ(s)    (3.7)

with a positive measure μ on [0, ∞) and real constants a, b ≥ 0. Every matrix-convex function ψ can be represented as

    ψ(t) = a + bt + ct² + ∫_0^∞ st²/(s + t) dμ(s)    (3.8)

with a positive measure μ on [0, ∞) and real numbers a, b, c ≥ 0.

Remark 3.5. There are different approaches to and proofs of this fundamental theorem in the literature [34, 83, 88, 156].

Furthermore, the notation φ is used to denote the matrix function as well as the associated scalar function. Applied to matrices, the matrix-valued function can be represented as

    φ(A) = aI + bA + ∫_0^∞ sA(sI + A)^{-1} dμ(s).    (3.9)

The representation in (3.9) works according to Definition 3.1. Interestingly, the matrix φ̃(A) with entries

    [φ̃(A)]_{kl} = aδ_{kl} + bA_{kl} + ∫_0^∞ s[A(sI + A)^{-1}]_{kl} dμ(s)    (3.10)

is identical to φ(A) in (3.9). In conclusion, this means that the matrix-monotone function φ can be represented by three components, i.e., two scalars and one measure: MM = (a, b, μ).

Example 3.7. The following list provides only a small number of representatives. However, these functions will be of particular importance in the applications later.

(1) MM = (0, 0, s^{-2}u(s − 1)) with the step function u(s) leads to

    φ(t) = ∫_1^∞ st/(s + t) · s^{-2} ds = log(1 + t).

(2) MM = (−1, 0, s^{-3}u(s − 1)) leads to

    φ(t) = −1 + ∫_1^∞ st/(s + t) · s^{-3} ds = −log(1 + t)/t.

(3) MM = (0, 0, δ(s − 1)) with the delta function δ(s) leads to

    φ(t) = ∫_0^∞ st/(s + t) δ(s − 1) ds = t/(1 + t).

(4) MM = (0, 0, (sin(rπ)/π) s^{r−2}) with 0 < r ≤ 1 yields

    φ(t) = (sin(rπ)/π) ∫_0^∞ st/(s + t) s^{r−2} ds = t^r.
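The matrix version (3.9) of representation (1) can be checked numerically. As a sketch (not from the text; the substitution s = 1/u is my own step, which turns ∫_1^∞ sA(sI + A)^{-1} s^{-2} ds into ∫_0^1 A(I + uA)^{-1} du), we evaluate the integral with Gauss–Legendre quadrature and compare it to log(I + A) computed on the eigenvalues as in Definition 3.1:

```python
import numpy as np

# Numeric sketch: with the measure from item (1), the representation (3.9)
# gives log(I + A) = int_1^inf s A (sI + A)^{-1} s^{-2} ds
#                  = int_0^1 A (I + uA)^{-1} du   (substituting s = 1/u).
rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
A = X @ X.T + 0.5 * np.eye(n)                 # positive definite test matrix

nodes, weights = np.polynomial.legendre.leggauss(40)
u = 0.5 * (nodes + 1.0)                       # map [-1, 1] -> [0, 1]
w = 0.5 * weights

quad = sum(wi * A @ np.linalg.inv(np.eye(n) + ui * A) for ui, wi in zip(u, w))

lam, U = np.linalg.eigh(A)
log_spec = U @ np.diag(np.log1p(lam)) @ U.T   # log(I + A) via the eigenvalues

assert np.allclose(quad, log_spec, atol=1e-8)
```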
π 0 s + t

3.2.2 Derivative of Matrix-Monotone Function

Theorem 3.8 (Derivative of matrix-monotone function). The first derivative of an arbitrary matrix-monotone function at A in direction B is given by

    Dφ(A)(B) = bB + ∫_0^∞ s² [sI + A]^{-1} B [sI + A]^{-1} dμ(s).    (3.11)

Proof. The directional derivative is defined as Dφ(A)(B) = ∂φ(A + εB)/∂ε |_{ε=0}. From the representation (3.9),

    φ(A + εB) = aI + b(A + εB) + ∫_0^∞ s(A + εB)[sI + A + εB]^{-1} dμ(s).

The first derivative of φ(A + εB) with respect to ε is given by

    ∂φ(A + εB)/∂ε = bB + ∫_0^∞ sB[sI + A + εB]^{-1} dμ(s)
        − ∫_0^∞ s[A + εB][sI + A + εB]^{-1} B [sI + A + εB]^{-1} dμ(s).

At the point ε = 0 we obtain

    ∂φ(A + εB)/∂ε |_{ε=0} = bB + ∫_0^∞ [I − A(sI + A)^{-1}] sB [sI + A]^{-1} dμ(s)
        = bB + ∫_0^∞ [sI + A − A][sI + A]^{-1} sB [sI + A]^{-1} dμ(s),

which is equal to (3.11).
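Formula (3.11) can again be tested numerically. A sketch (not from the text; it assumes φ(t) = log(1 + t), for which b = 0 and dμ(s) = s^{-2}u(s − 1)ds so that (3.11) reads ∫_1^∞ (sI + A)^{-1}B(sI + A)^{-1} ds, transformed via s = 1/u to ∫_0^1 (I + uA)^{-1}B(I + uA)^{-1} du):

```python
import numpy as np

# Numeric sketch of (3.11) for phi(t) = log(1 + t): quadrature of the integral
# representation of the derivative versus a central finite difference.
rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n))
A = X @ X.T + 0.5 * np.eye(n)                         # positive definite
B = rng.standard_normal((n, n)); B = (B + B.T) / 2    # Hermitian direction

nodes, weights = np.polynomial.legendre.leggauss(60)
u = 0.5 * (nodes + 1.0); w = 0.5 * weights
deriv = sum(
    wi * np.linalg.inv(np.eye(n) + ui * A) @ B @ np.linalg.inv(np.eye(n) + ui * A)
    for ui, wi in zip(u, w)
)

def log1p_mat(M):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.log1p(lam)) @ U.T

eps = 1e-6
fd = (log1p_mat(A + eps * B) - log1p_mat(A - eps * B)) / (2 * eps)
assert np.allclose(deriv, fd, atol=1e-5)
```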

Corollary 3.3 (Derivative of trace of matrix-monotone function). The first derivative of the trace of an arbitrary matrix-monotone function is given by

    φ'(A) = ∂ tr φ(A)/∂A = bI + ∫_0^∞ s² [sI + A]^{-2} dμ(s).    (3.12)

Remark 3.6. The function φ' in (3.12) corresponds to the function φ'(A) defined by

    φ'(A) = U diag[φ'(λ_1), . . . , φ'(λ_n)] U^H

because of (3.10).

The matrix function φ' defined in (3.12) will be used when describing further properties of matrix-monotone functions. First derivatives will also become important when studying optimality conditions for certain programming problems. We provide one example of the function φ'.

Example 3.8. Consider the function tr φ(A) = tr log(I + A) and assume that one does not know the derivative with respect to A. By (3.12), with the measure from Example 3.7(1), the derivative is given by

    φ'(A) = ∂ tr log(I + A)/∂A = ∫_1^∞ [sI + A]^{-2} ds = [I + A]^{-1}.    (3.13)
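The gradient in (3.13) can be verified entrywise. A sketch (not from the text; it uses the identity tr log(I + A) = log det(I + A) and NumPy's `slogdet` to keep the perturbed matrix evaluation well defined for non-symmetric perturbations):

```python
import numpy as np

# Numeric sketch of (3.13): the gradient of tr log(I + A) = log det(I + A)
# with respect to A is (I + A)^{-1}, checked by central finite differences.
rng = np.random.default_rng(3)
n = 3
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)

def tr_log(M):
    # tr log(I + M) = log det(I + M)
    return np.linalg.slogdet(np.eye(n) + M)[1]

eps = 1e-6
grad = np.empty((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = 1.0
        grad[i, j] = (tr_log(A + eps * E) - tr_log(A - eps * E)) / (2 * eps)

assert np.allclose(grad, np.linalg.inv(np.eye(n) + A), atol=1e-5)
```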

Definition 3.8 (Completely monotone function). A function g is called completely monotone if (−1)^k g^{(k)} ≥ 0 for all k ≥ 0, i.e., g^{(k)} ≥ 0 for even k and g^{(k)} ≤ 0 for odd k.

Remark 3.7. Note that the derivative of a matrix-monotone function is always completely monotone and has a unique representation [155, p. 161]

    g(t) = ∫_0^∞ exp(−ts) dμ(s)

for some positive measure μ.

Example 3.9. Therefore, the function φ'(A) in (3.13) is completely monotone and can be represented by

    g(t) = 1/(1 + t) = ∫_0^∞ exp(−ts) exp(−s) ds = ∫_0^∞ exp(−s(1 + t)) ds.
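This Bernstein-type representation can be verified with elementary quadrature. A sketch (not from the text; the truncation of the s-range at 50 and the plain trapezoidal rule are my own choices):

```python
import numpy as np

# Numeric sketch: 1/(1 + t) = int_0^inf e^{-ts} e^{-s} ds, checked with a
# trapezoidal rule on a truncated grid (the tail beyond s = 50 is negligible).
s = np.linspace(0.0, 50.0, 200001)
h = s[1] - s[0]
for t in (0.0, 0.5, 2.0):
    f = np.exp(-s * (1.0 + t))
    integral = float(np.sum(0.5 * (f[1:] + f[:-1]) * h))
    assert abs(integral - 1.0 / (1.0 + t)) < 1e-5
```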

3.3 Matrix Norms

One main question regarding the matrix functions characterized in the last section is how to map them to real numbers in order to measure, e.g., the performance of a communication system. Often, the simplest solution is to take the trace of the matrix function. It will be shown in later chapters that most common performance measures are the trace of a certain matrix-monotone performance function. However, an arbitrary norm also maps the output of the matrix function to the positive real line. Therefore, matrix norms and their properties are studied in this subsection. Since we will consider only positive definite matrices, the definitions are given for positive definite matrices and not for the general case of Hermitian matrices. Most of the definitions can be found in [50, 51].

Definition 3.9 (Matrix norm). For all positive semi-definite matrices A, B, a norm || · || satisfies

(1) ||A|| ≥ 0 (nonnegative)
(2) ||A|| = 0 if and only if A = 0 (positive)
(3) ||cA|| = |c| ||A|| for all complex scalars c (homogeneous)
(4) ||A + B|| ≤ ||A|| + ||B|| (triangle inequality)
(5) ||AB|| ≤ ||A|| · ||B|| (submultiplicative)

A norm that fulfills only properties (1)–(4) is called a vector-norm on matrices.

Let us study three examples of matrix norms.

Example 3.10. The ℓ_1 norm for positive semi-definite A is defined by

    ||A||_1 = Σ_{i=1}^n Σ_{j=1}^n |a_{ij}|.

The ℓ_2 norm or Euclidean norm for positive semi-definite A is defined by

    ||A||_2 = ( Σ_{i=1}^n Σ_{j=1}^n |a_{ij}|² )^{1/2}.

The ℓ_∞ norm is defined by

    ||A||_∞ = max_{1≤i,j≤n} |a_{ij}|.
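A small sketch (not from the text) computes the three entrywise norms for a concrete 2 × 2 positive semi-definite matrix and illustrates why the entrywise ℓ_∞ norm is only a vector-norm on matrices, failing property (5):

```python
import numpy as np

# Entrywise norms from Example 3.10 for a concrete PSD matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

l1 = np.sum(np.abs(A))                  # ||A||_1
l2 = np.sqrt(np.sum(np.abs(A) ** 2))    # ||A||_2 (Frobenius)
linf = np.max(np.abs(A))                # ||A||_inf

assert l1 == 6.0
assert abs(l2 - np.sqrt(10.0)) < 1e-12
assert linf == 2.0

# The entrywise l_inf norm violates submultiplicativity (property (5)):
# ||A A||_inf = 5 > 4 = ||A||_inf * ||A||_inf.
assert np.max(np.abs(A @ A)) == 5.0
```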

3.3.1 Unitary Invariant Norms and Matrix-Monotone Functions

Since the class of matrix norms is very large and only certain cases are needed for the further analysis, the following subclass of vector-norms on matrices is defined.

Definition 3.10 (Unitary invariant norm). A vector-norm || · || on positive semi-definite matrices is said to be unitary invariant if ||UAV|| = ||A|| for all positive semi-definite A and for all unitary matrices U, V. The unitary invariant norm is denoted by |||A|||.

Remark 3.8. Furthermore, the unitary invariant norms are related to symmetric gauge functions Φ on R^n by [8, Thm. IV.2.1]

    |||H|||_Φ = Φ(λ(H)).    (3.14)

The vector λ(H) denotes the vector of singular values of H. For each unitary invariant norm there is a corresponding symmetric gauge function and vice versa (see [51, Thm. 3.5.18]). For the definition and discussion of symmetric gauge functions see Definition 2.7 and the result in Lemma 2.6.

Let us give some examples of unitary invariant matrix norms.

Example 3.11. Interestingly, the only unitary invariant vector norm is the ℓ_2 norm. Examples of unitary invariant matrix norms, for positive semi-definite A with ordered eigenvalues α_1 ≥ α_2 ≥ · · · ≥ α_n ≥ 0, are:

(1) the Schatten p norm |||A||| = ( Σ_{k=1}^n α_k^p )^{1/p}, with the special cases p = 2 (Frobenius norm), p → ∞ (spectral norm), and p = 1 (trace norm). The corresponding symmetric gauge function is Φ(x) = ( Σ_{k=1}^n x_k^p )^{1/p};
(2) the Ky Fan k norm with generating symmetric gauge function

    Φ_k(x) = max_{1≤i_1<i_2<···<i_k≤n} Σ_{l=1}^k x_{i_l}

for all 1 ≤ k ≤ n.

The special role of the Ky Fan k norms is outlined in the following theorem.

Theorem 3.9 (7.4.45 in [50]). Let x and y be vectors in R_+^n. Then Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ on R_+^n if and only if Φ_k(x) ≤ Φ_k(y) for k = 1, 2, . . . , n, where Φ_k are the Ky Fan k norms defined above.

Remark 3.9. The theorem says that in order to have Φ(x) ≤ Φ(y) for all symmetric gauge functions, it is necessary and sufficient to check the Ky Fan norms 1 ≤ k ≤ n. Note that this condition corresponds exactly to the statement that x is weakly majorized by y, i.e., x ≺_w y. In other words, the symmetric gauge function is Schur-convex with respect to x, as stated in Lemma 2.6. Note that the Löwner order and the majorization of the eigenvalue vector are two possible partial orders of positive semidefinite matrices. If a matrix is greater than or equal to another matrix with respect to the Löwner order, i.e., A ≥ B, then the eigenvalues of matrix A also weakly majorize the eigenvalues of matrix B, i.e., λ(A) ≻_w λ(B). In particular, if A ≥ B, then |||A||| ≥ |||B|||.
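The implication "Löwner order ⟹ Ky Fan dominance" in the remark is easy to observe numerically. A sketch (not from the text; random PSD instances constructed so that A − B ≥ 0 by design):

```python
import numpy as np

# Sketch of Remark 3.9: if A >= B in the Loewner order, the Ky Fan k norms
# (partial sums of the ordered eigenvalues) of A dominate those of B,
# i.e. lambda(B) is weakly majorized by lambda(A).
rng = np.random.default_rng(4)
n = 4
X = rng.standard_normal((n, n)); B = X @ X.T
Y = rng.standard_normal((n, n)); A = B + Y @ Y.T   # A - B >= 0 by construction

a = np.sort(np.linalg.eigvalsh(A))[::-1]
b = np.sort(np.linalg.eigvalsh(B))[::-1]

ky_fan_A = np.cumsum(a)   # Phi_k applied to the eigenvalue vector of A
ky_fan_B = np.cumsum(b)
assert np.all(ky_fan_A >= ky_fan_B - 1e-10)
```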

Next, we combine the class of matrix-monotone functions with an outer unitary invariant norm; the following results characterize how the properties of matrix-monotonicity are mapped by the norm operation. One simple unitary invariant norm is the trace norm.

Lemma 3.10. Consider a matrix-concave function φ : H^n → H^n. The scalar function tr φ : H^n → R_+ is concave and monotone.

Proof. Both properties follow easily from the matrix-concavity in (3.2) and matrix-monotonicity in (3.1) by applying the linear trace operator to both sides of the inequalities.

The more general result is presented next. It corresponds to the theorem in [5] and [166, Thm. 4.4].

Theorem 3.11. Consider a nonnegative matrix-concave function φ. It holds that

    |||φ(A) + φ(B)||| ≥ |||φ(A + B)|||.

For every nonnegative function g with g(0) = 0 and g(∞) = ∞ whose inverse function is operator monotone,

    |||g(A + B)||| ≥ |||g(A) + g(B)|||.

Remark 3.10. This inequality can be complemented with results for


a concave φ.

An application of the above result of further relevance is described in the following corollary [166, Cor. 4.8].

Corollary 3.4. For all A, B ≥ 0 and every ||| · |||,

    ||| log(I + A + B)||| ≤ ||| log(I + A) + log(I + B)|||

and

    ||| exp(A) + exp(B)||| ≤ ||| exp(A + B) + I|||.

Another lower bound on matrix-monotone functions is provided in the next result [166, Thm. 4.10].

Theorem 3.12. Let φ be a nonnegative matrix-monotone function and ||| · ||| a unitary invariant norm with normalization |||diag[1, 0, . . . , 0]||| = 1. Then for every matrix A ≥ 0,

    φ(|||A|||) ≤ |||φ(A)|||.

The corresponding upper bound is provided in the next result [166, Thm. 4.12] using a different normalization of the norm.

Theorem 3.13. Let φ be a nonnegative matrix-monotone function and ||| · ||| a unitary invariant norm with normalization |||I||| = 1. Then for every matrix A ≥ 0,

    φ(|||A|||) ≥ |||φ(A)|||.

From the last two results, the following combination follows with E = diag[1, 0, . . . , 0]:

    |||E||| φ( |||A||| / |||E||| ) ≤ |||φ(A)||| ≤ |||I||| φ( |||A||| / |||I||| ).

A map Φ that maps the set of positive semidefinite matrices to itself is called a permutation operator if for all A the entries of Φ(A) are one fixed rearrangement of those of A.

Theorem 3.14 (Theorem 4.38 in [166]). For every permutation operator Φ and all ||| · ||| on the set of positive semidefinite n × n matrices,

    (1/√n) |||A||| ≤ |||Φ(A)||| ≤ √n |||A|||    (3.15)

for positive semidefinite A, and the constants √n and 1/√n are best possible.

The special role of the ℓ_2 norm is underlined in the following result [166, Thm. 4.40].

Theorem 3.15. If |||Φ(A)||| = |||A||| holds for all permutation operators Φ and all positive semidefinite A, then ||| · ||| is a constant multiple of || · ||_2.

Definition 3.11 (Dual norm). Let || · || be a given norm on the set of positive semi-definite matrices. The dual norm of || · || with respect to the Frobenius inner product is defined by

    ||A||^D = max{ tr(AB^H) : ||B|| = 1, B ≥ 0 }.

By the duality theorem [51, Thm. 5.5.14], it follows that (|| · ||^D)^D = || · ||.

3.4 Further Properties

The next theorem is a structural result for traces of matrix-monotone functions. It derives an upper and a lower bound. Furthermore, the matrices that achieve these bounds are characterized. Theorem 3.16 is a generalization of [36].

Theorem 3.16. For positive semidefinite matrices A and B with eigenvalues α_1 ≥ α_2 ≥ · · · ≥ α_n and β_1 ≥ β_2 ≥ · · · ≥ β_n, it holds that

    min_π tr φ(diag(α_1, . . . , α_n) diag(β_{π_1}, . . . , β_{π_n})) ≤ tr φ(B^{1/2}AB^{1/2})
        ≤ max_π tr φ(diag(α_1, . . . , α_n) diag(β_{π_1}, . . . , β_{π_n}))    (3.16)

with permutation π.

This result is stated in [17] without a proof.

Proof. The function φ(B^{1/2}AB^{1/2}) can be rewritten using the eigenvalue decompositions A = U_A Λ_A U_A^H, B = U_B Λ_B U_B^H, and U = U_B^H U_A. Without loss of generality, we assume that both A and B have full rank. Inside the trace, the function becomes

    tr φ(B^{1/2}AB^{1/2}) = tr φ(Λ_B^{1/2} U Λ_A U^H Λ_B^{1/2}).

Let B_0 = Λ_B and A_1 = U Λ_A U^H. Note that B_0 and A_1 do not commute in general, i.e., B_0 A_1 ≠ A_1 B_0. Next, we parameterize a unitary matrix e^{εS} with a specific choice of S satisfying S = −S^H. We show that the derivative of the parameterized function

    φ(ε) = φ(B_0^{1/2} e^{εS} A_1 e^{−εS} B_0^{1/2})    (3.17)

at the point ε = 0 is either strictly positive for one specific choice of S or strictly negative for another choice of S. Therefore, a non-commuting pair can achieve neither the global maximum nor the global minimum of the objective function. This is a contradiction, and it follows that the maximum and minimum are attained for commuting matrices A and B. For the maximum, we choose

    S = B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} A_1 − A_1 B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} ≠ 0.

Note that S^H = −S. Furthermore, S is zero only if A_1 and B_0 commute or if one of them is the zero matrix.

The Taylor series expansion of φ(ε) is given by

    φ(ε) = φ( B_0^{1/2} A_1 B_0^{1/2} + ε (B_0^{1/2} S A_1 B_0^{1/2} − B_0^{1/2} A_1 S B_0^{1/2}) + ε²(. . .) + · · · ).

The first derivative of tr φ(ε) with respect to ε at the point ε = 0 is given by

    (∂/∂ε) tr φ(ε) |_{ε=0}
        = tr( φ'(B_0^{1/2} A_1 B_0^{1/2}) [B_0^{1/2} S A_1 B_0^{1/2} − B_0^{1/2} A_1 S B_0^{1/2}] )
        = tr( A_1 B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} S − B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} A_1 S )
        = tr( [A_1 B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} − B_0^{1/2} φ'(B_0^{1/2} A_1 B_0^{1/2}) B_0^{1/2} A_1] S )
        = tr(SS^H) > 0.    (3.18)

The same approach can be used to show that for another choice of S the derivative of φ(ε) at the point ε = 0 is strictly negative.
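Theorem 3.16 can be illustrated by brute force for small dimensions. A sketch (not from the text; it picks φ(t) = log(1 + t), random 3 × 3 PSD matrices, and enumerates all 3! pairings of the ordered eigenvalues):

```python
import itertools
import numpy as np

# Sketch of Theorem 3.16 with phi(t) = log(1 + t): tr phi(B^{1/2} A B^{1/2})
# lies between the min and max of tr phi over all pairings (permutations pi)
# of the ordered eigenvalues of A and B.
rng = np.random.default_rng(5)
n = 3
X = rng.standard_normal((n, n)); A = X @ X.T
Y = rng.standard_normal((n, n)); Bm = Y @ Y.T

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]
beta = np.sort(np.linalg.eigvalsh(Bm))[::-1]

def tr_phi_psd(M):
    return float(np.sum(np.log1p(np.linalg.eigvalsh(M))))

lam, U = np.linalg.eigh(Bm)
Bhalf = U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.T
middle = tr_phi_psd(Bhalf @ A @ Bhalf)

# tr phi(diag(alpha) diag(beta_pi)) = sum_k log(1 + alpha_k * beta_{pi_k})
values = [float(np.sum(np.log1p(alpha * beta[list(p)])))
          for p in itertools.permutations(range(n))]
assert min(values) - 1e-9 <= middle <= max(values) + 1e-9
```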

Lemma 3.17. The values of the following two optimization problems are equal:

    max_{tr(Q)≤P} |||φ(HQH^H)||| = max_{tr(S)≤P} |||φ(H^H SH)|||.    (3.19)

In addition, with the singular value decomposition (SVD) H = U_H Λ_H V_H^H, it holds that

    |||φ(HQH^H)||| = |||φ(H^H SH)|||  if  Q = V_H U_H^H S U_H V_H^H.    (3.20)

The dimensions of the three matrices in the SVD are given by: U_H is m × µ, Λ_H is µ × µ, and V_H^H is µ × n with µ = min(m, n).¹

Proof. The values of the two optimization problems in (3.19) do not depend on the left or right eigenvectors of H, because |||φ(UAU^H)||| = |||φ(A)||| for unitary U and because the constraint is invariant under rotations, tr(UQU^H) = tr(Q). Denote the rank of the n × m matrix H by ν. Furthermore, the value of the matrix-monotone function φ at the point zero is equal to zero, i.e., φ(0) = 0. Write the unitary invariant norm via its symmetric gauge function, i.e., Φ(λ(A)) = |||A|||. As a result, the LHS of (3.19) is

    max_{tr(Q)≤P} |||φ(HQH^H)|||
        = max_{Σ_{k=1}^m p_k ≤ P} Φ( φ(λ_1(H^H H)p_1), . . . , φ(λ_m(H^H H)p_m) )
        = max_{Σ_{k=1}^ν p_k ≤ P} Φ( φ(λ_1(H^H H)p_1), . . . , φ(λ_ν(H^H H)p_ν) ).    (3.21)

The RHS of (3.19) is

    max_{tr(S)≤P} |||φ(H^H SH)|||
        = max_{Σ_{k=1}^n s_k ≤ P} Φ( φ(λ_1(HH^H)s_1), . . . , φ(λ_n(HH^H)s_n) )
        = max_{Σ_{k=1}^ν s_k ≤ P} Φ( φ(λ_1(HH^H)s_1), . . . , φ(λ_ν(HH^H)s_ν) )    (3.22)

with eigenvalues s_1, . . . , s_n of S. Since the non-zero eigenvalues of H^H H and HH^H coincide, (3.21) is equal to (3.22). The second part of the lemma follows easily from

    φ(HQH^H) = φ(U_H Λ_H V_H^H Q V_H Λ_H U_H^H)
        = φ(Λ_H V_H^H Q V_H Λ_H)
        = φ(Λ_H V_H^H V_H U_H^H S U_H V_H^H V_H Λ_H)
        = φ(Λ_H U_H^H S U_H Λ_H)
        = φ(V_H Λ_H U_H^H S U_H Λ_H V_H^H)
        = φ(H^H SH),    (3.23)

where the second and fifth equalities hold inside any unitary invariant norm, since the arguments are unitarily similar.

¹ This type of SVD is sometimes called "economy size decomposition."
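Identity (3.20) is simple to check for a concrete norm. A sketch (not from the text; it uses the Frobenius norm as the unitary invariant norm, φ(t) = log(1 + t), and a random real 4 × 3 channel matrix):

```python
import numpy as np

# Sketch of (3.20): with Q = V_H U_H^H S U_H V_H^H, the matrices
# phi(H Q H^H) and phi(H^H S H) have the same unitarily invariant norm
# (Frobenius used here; phi(0) = 0 absorbs the extra zero eigenvalues).
rng = np.random.default_rng(6)
m, n = 4, 3                                    # H is m x n, as in Lemma 3.17
H = rng.standard_normal((m, n))
Uh, sv, Vht = np.linalg.svd(H, full_matrices=False)   # "economy size" SVD
Vh = Vht.T

Z = rng.standard_normal((m, m)); S = Z @ Z.T   # PSD S on the m-dim side
Q = Vh @ Uh.T @ S @ Uh @ Vh.T                  # the coupling from (3.20)

def phi_psd(M):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.log1p(np.clip(lam, 0.0, None))) @ U.T

lhs = np.linalg.norm(phi_psd(H @ Q @ H.T))     # |||phi(H Q H^H)|||_F
rhs = np.linalg.norm(phi_psd(H.T @ S @ H))     # |||phi(H^H S H)|||_F
assert abs(lhs - rhs) < 1e-8
```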



Definition 3.12 (Contraction). A matrix C is a contraction if C^H C ≤ I, or equivalently, ||C||_∞ ≤ 1.

Theorem 3.18 (Theorem 1.15 in [166]). Let φ be a matrix-monotone function on [0, ∞) and ψ a matrix-convex function on [0, ∞) with ψ(0) ≤ 0. Then for every contraction C and every A ≥ 0,

    φ(C^H AC) ≥ C^H φ(A)C  and  ψ(C^H AC) ≤ C^H ψ(A)C.
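The first inequality of Theorem 3.18 can be observed numerically. A sketch (not from the text; φ(t) = log(1 + t) is chosen as a concrete matrix-monotone function with φ(0) = 0, and a random matrix is rescaled into a contraction):

```python
import numpy as np

# Sketch of Theorem 3.18 with phi(t) = log(1 + t): for a contraction C and
# A >= 0, phi(C^H A C) >= C^H phi(A) C in the Loewner order.
rng = np.random.default_rng(10)
n = 3
X = rng.standard_normal((n, n)); A = X @ X.T
C = rng.standard_normal((n, n))
C /= (np.linalg.norm(C, 2) + 1e-12)      # spectral norm <= 1, so C^H C <= I

def phi_psd(M):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.log1p(np.clip(lam, 0.0, None))) @ U.T

diff = phi_psd(C.T @ A @ C) - C.T @ phi_psd(A) @ C
assert np.min(np.linalg.eigvalsh(diff)) > -1e-9   # difference is PSD
```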

3.4.1 Connections to Connections

The material in this section serves as a brief outlook on a certain class of binary operations that is closely related to matrix-monotone functions.

Definition 3.13 (Connection). A binary operation σ on the class of positive definite matrices, (A, B) → AσB, is a connection if the following requirements are fulfilled [84]:

(1) A ≤ C and B ≤ D imply AσB ≤ CσD.
(2) C(AσB)C ≤ (CAC)σ(CBC).
(3) If a sequence A_n converges to A and a sequence B_n converges to B, respectively, then the sequence (A_n σB_n) converges to AσB.

If 1σ1 = 1, then the connection is a mean.

Interestingly, the inequality in (2) leads directly to the following result if the inverse of C exists:

    C(AσB)C − (CAC)σ(CBC) = C[ AσB − C^{-1}((CAC)σ(CBC))C^{-1} ]C
        ≥ C[ AσB − AσB ]C = 0.    (3.24)

The inequality in (3.24) follows from (2) applied with C^{-1} to the matrices CAC and CBC. Therefore, it holds that

    C(AσB)C = (CAC)σ(CBC).    (3.25)



Example 3.12. For positive invertible matrices A and B and for 0 ≤ α ≤ 1, the arithmetic mean, the geometric mean, and the harmonic mean are defined as follows:

    A ∇_α B = (1 − α)A + αB,
    A ♯_α B = A^{1/2} (A^{−1/2} B A^{−1/2})^α A^{1/2},
    A !_α B = ((1 − α)A^{−1} + αB^{−1})^{−1}.

Interestingly, the inequalities between the harmonic, geometric, and arithmetic means also hold in the matrix case [4, 116]:

    A !_α B ≤ A ♯_α B ≤ A ∇_α B.
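The three means and the matrix AM–GM–HM chain can be computed directly. A sketch (not from the text; random positive definite instances, with the Löwner comparisons tested through the eigenvalues of the difference matrices):

```python
import numpy as np

# Sketch of Example 3.12: weighted harmonic, geometric, and arithmetic matrix
# means, with a numeric check of A !_a B <= A #_a B <= A nabla_a B.
rng = np.random.default_rng(7)
n = 3
X = rng.standard_normal((n, n)); A = X @ X.T + np.eye(n)
Y = rng.standard_normal((n, n)); B = Y @ Y.T + np.eye(n)
a = 0.3

def mat_pow(M, p):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(lam ** p) @ U.T

arith = (1 - a) * A + a * B
harm = np.linalg.inv((1 - a) * np.linalg.inv(A) + a * np.linalg.inv(B))
Ah = mat_pow(A, 0.5); Aih = mat_pow(A, -0.5)
geom = Ah @ mat_pow(Aih @ B @ Aih, a) @ Ah

# Loewner comparisons: the difference matrices must be positive semidefinite.
assert np.min(np.linalg.eigvalsh(geom - harm)) > -1e-9
assert np.min(np.linalg.eigvalsh(arith - geom)) > -1e-9
```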

Remark 3.11. There exists a one-to-one correspondence between a connection σ and a matrix-monotone function φ ≥ 0 on [0, ∞). The operator connection σ can be defined via the corresponding function φ, which is called the representing function of σ, by

    AσB = A^{1/2} φ(A^{−1/2} B A^{−1/2}) A^{1/2}

if A is invertible, and σ is a matrix mean if and only if φ(1) = 1. Regarding the three means from the last example, the corresponding matrix-monotone functions are (1 − α) + αt for ∇_α, t^α for ♯_α, and [(1 − α) + αt^{−1}]^{−1} for !_α. From this and the representation in (3.9), the following relation also follows:

    1σA = φ(A) = aI + bA + ∫_0^∞ sA[sI + A]^{−1} dμ(s)
        = aI + bA + ∫_0^∞ (1/2)(sI !_{1/2} A) dμ(s),

since sA(sI + A)^{−1} = (A^{−1} + s^{−1}I)^{−1} = (1/2)(sI !_{1/2} A).

 
Example 3.13. Consider the function tr φ(Z^{−1/2} H Q H^H Z^{−1/2}) and let σ be the connection with representing function φ. It can be represented by

    tr φ(Z^{−1/2} H Q H^H Z^{−1/2}) = tr( 1σ(Z^{−1/2} H Q H^H Z^{−1/2}) )
        = tr( Z^{−1}( Z σ H Q H^H ) ),

where the last step uses (3.25) with C = Z^{−1/2}.

4
Application of Majorization in Wireless Communications

4.1 Spatial Correlation in Multiple Antenna Systems

Recently, there has been a transition in communication theory in how fading variations are judged. The time variation and spectral variation of the propagation channel are nowadays welcome. In fact, they are exploited to increase the reliability and spectral efficiency of mobile communication systems. It is well known that fading variations in time, space, and frequency increase the diversity of the system (e.g., [159]).
With the introduction of MIMO systems arises the question of how to model, analyze, and exploit the spatial correlation that is observed at the transmit antenna array and the receive antenna array (see, e.g., [165, ch. 2] for outdoor scenarios and [102, 160] for MIMO channels). If the antenna geometry is simple, e.g., a uniform linear array (ULA), the antenna correlation matrix has a Toeplitz structure. Since multiple antennas are on both sides of the link, there may or may not be correlation between transmit and receive antenna pairs. In the Kronecker model, the correlation is modeled locally at the transmit and receive side, and rich multipath scattering is assumed in between.


4.1.1 The Kronecker Model

Consider the quasi-static block flat-fading MIMO channel H. Correlation of the channel matrices arises in the common downlink transmission scenario in which the base station is unobstructed [131]. We follow the model in [38], where the subspaces and directions of the paths between the transmit antennas and the receive cluster change more slowly than the actual attenuation of each path.
The most general form of the correlation model consists of a very large correlation matrix of size (n_T · n_R × n_T · n_R) which incorporates the transmit and receive correlation, i.e., it is the expectation of the outer product of the vectorized channel matrix:

    κ = E[ vec(H) · vec(H)^H ].    (4.1)

The correlation matrix κ in (4.1) expresses the correlation of each transmit or receive element to every other transmit or receive element. Often, the transmit and the receive antenna arrays are spatially separated. Then, the following simplification is possible (see, e.g., [30]):

Definition 4.1 (Kronecker correlation model). If the transmit correlation is independent of the receive antenna and the receive correlation does not depend on the transmit antenna, the correlation matrix in (4.1) is a block matrix given by

    κ = R_R ⊗ R_T    (4.2)

and the corresponding correlation model is called the Kronecker correlation model.

The channel matrix H under the Kronecker assumption, with correlated transmit and correlated receive antennas, is modeled as

    H = R_R^{1/2} · W · R_T^{1/2}    (4.3)

with transmit correlation matrix R_T = U_T D_T U_T^H and receive correlation matrix R_R = U_R D_R U_R^H. U_T and U_R are the matrices with the eigenvectors of R_T and R_R, respectively, and D_T, D_R are diagonal matrices with the eigenvalues of R_T and R_R, respectively. The random matrix W has zero-mean independent identically distributed complex Gaussian entries, i.e., W ∼ CN(0, I). The matrix W models the rich multipath environment between the transmit and receive antenna arrays.
The constructed matrix H in (4.3) satisfies (4.2) because vec(AXC) = (C^T ⊗ A) vec(X) and

    E[ vec(H) vec(H)^H ] = E[ (R_R^{1/2} ⊗ R_T^{1/2}) vec(W) vec(W)^H (R_R^{1/2} ⊗ R_T^{1/2}) ]
        = (R_R^{1/2} ⊗ R_T^{1/2})(R_R^{1/2} ⊗ R_T^{1/2}) = R_R ⊗ R_T.
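The vec/Kronecker algebra above can be checked deterministically. A sketch (not from the text; note, as an assumption of this sketch, that the ordering of the Kronecker factors depends on the vec convention: the column-stacking identity vec(AXC) = (C^T ⊗ A)vec(X) is verified first, and the row-stacking convention with real symmetric correlation matrices then yields κ = R_R ⊗ R_T as in (4.2)):

```python
import numpy as np

# Deterministic sketch of the vec / Kronecker algebra behind (4.2)-(4.3).
rng = np.random.default_rng(8)

# Column-stacking vec identity: vec(A X C) = (C^T kron A) vec(X).
Am, Xm, Cm = (rng.standard_normal((3, 3)) for _ in range(3))
lhs = (Am @ Xm @ Cm).flatten(order="F")
rhs = np.kron(Cm.T, Am) @ Xm.flatten(order="F")
assert np.allclose(lhs, rhs)

# With row-stacking vec, vec_r(RR^{1/2} W RT^{1/2}) = (RR^{1/2} kron RT^{1/2}) vec_r(W)
# for real symmetric RR, RT, hence E[vec_r(H) vec_r(H)^H] = RR kron RT.
RR = np.array([[1.0, 0.4], [0.4, 1.0]])
RT = np.array([[1.0, 0.7], [0.7, 1.0]])

def sqrtm_psd(M):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(lam)) @ U.T

K = np.kron(sqrtm_psd(RR), sqrtm_psd(RT))
kappa = K @ K.T           # covariance of the whitened-noise model
assert np.allclose(kappa, np.kron(RR, RT))
```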
In Figure 4.1, some of the basic assumptions in MIMO channels are illustrated. Often, it is assumed that the base station antennas are mounted on the rooftops of high buildings or on towers. Therefore, fewer local scatterers surround the base station antenna array, and increased spatial correlation can be observed. In contrast, the mobile moves around surrounded by buildings, cars, trees, and pedestrians. Therefore, it is often assumed that the mobile antennas are spatially uncorrelated. Note that polarization diversity provides an additional degree of freedom [96]. The analysis in [28, 29] is adapted to several special practical scenarios in which so-called keyholes occur. For example, in transmission scenarios with long corridors (see Figure 4.1), the channel can be singular. This is not because of correlation at the transmitter or the receiver but because of a keyhole in between.

Fig. 4.1 Propagation models: Correlated transmit antennas at base station, uncorrelated mobile with rich scattering, and key-hole channel.

In the case in which each receive antenna observes the same correlation between the transmit antennas, i.e., the transmit correlation is independent of the receive antenna and, vice versa, the receive correlation is independent of the transmit antenna, the correlation model in (4.1) simplifies to the model in (4.3). Note that the Kronecker model arises not only in MIMO communications but also in the modeling of electroencephalography (EEG) data. Methods to estimate the correlation matrices under the Kronecker assumption are described in [154].
Note that the Kronecker model is a limited correlation model that can only be applied successfully under certain conditions on the local scattering at the transmitter and receiver [85, 103]. Therefore, a more general model is to allow a sum of Kronecker products [11], i.e.,

    κ = Σ_{k=1}^n R_k^R ⊗ R_k^T.    (4.4)

However, it turns out that even the model (4.4) cannot cover the complete set of positive semi-definite correlation matrices. One counterexample is given explicitly here for the case n_T = n_R = 2:¹

    κ = [ 3/4   0    0   3/8
           0   1/4  1/8   0
           0   1/8  1/8   0
          3/8   0    0   3/8 ].

4.1.2 A Measure of Spatial Correlation

In order to provide a measure of correlation, we take two arbitrarily chosen transmit correlation matrices R_T^1 and R_T^2 with the constraint that trace(R_T^1) = trace(R_T^2) = n_T, which is equivalent to

    Σ_{l=1}^{n_T} λ_l^{T,1} = Σ_{l=1}^{n_T} λ_l^{T,2},    (4.5)

where λ_l^{T,1}, 1 ≤ l ≤ n_T, and λ_l^{T,2}, 1 ≤ l ≤ n_T, are the eigenvalues of the covariance matrices R_T^1 and R_T^2, respectively.

¹ This example is taken from [52].



This constraint regarding the trace of the correlation matrix R_T is necessary because the comparison of two transmission scenarios is only fair if the average path loss is equal. Without receive correlation, the trace of the correlation matrix can be written as

    tr(R_T) = Σ_{i=1}^{n_T} E[ (H^H H)_{ii} ] = Σ_{i=1}^{n_T} E[ |h_i|² ].    (4.6)

The RHS of (4.6) is the sum of the average path losses from the transmit antennas i = 1, . . . , n_T. In order to study purely the impact of correlation on the achievable capacity, the average path loss is kept fixed by applying the trace constraint on the correlation matrices R_T^1 and R_T^2.
We will say that a correlation matrix R_T^1 is more correlated than R_T^2, with descending ordered eigenvalues λ_1^{T,1} ≥ λ_2^{T,1} ≥ · · · ≥ λ_{n_T}^{T,1} ≥ 0 and λ_1^{T,2} ≥ λ_2^{T,2} ≥ · · · ≥ λ_{n_T}^{T,2} ≥ 0, if

    Σ_{k=1}^m λ_k^{T,1} ≥ Σ_{k=1}^m λ_k^{T,2},  1 ≤ m ≤ n_T − 1.    (4.7)

The measure of correlation is defined in a natural way: the larger the first m eigenvalues of the correlation matrix are (under the trace constraint in (4.6)), the more correlated the MIMO channel is. As a result, the most uncorrelated MIMO channel has equal eigenvalues, whereas the most correlated MIMO channel has only one non-zero eigenvalue, which is given by λ_1 = n_T.
The following definition again provides the measure for comparison of two correlation matrices.

Definition 4.2 (Measure for spatial correlation). The transmit correlation matrix R_T^1 is more correlated than R_T^2 if and only if

    Σ_{l=1}^m λ_l^{T,1} ≥ Σ_{l=1}^m λ_l^{T,2}  for m = 1, . . . , n_T,  and  Σ_{l=1}^{n_T} λ_l^{T,1} = Σ_{l=1}^{n_T} λ_l^{T,2}.    (4.8)

One says that the vector consisting of the ordered eigenvalues λ^{T,1} majorizes λ^{T,2}, and this relationship can be written as λ^{T,1} ≻ λ^{T,2}, as in Definition 2.1.

Remark 4.1. Note that our definition of correlation in Definition 4.2 differs from the usual definition in statistics. In statistics, a diagonal covariance matrix indicates that the random variables are uncorrelated, independently of the auto-covariances on the diagonal. In our definition, we say that the antennas are uncorrelated if, in addition to statistical independence, the auto-covariances of all entries are equal. This difference from statistics occurs because the directions, i.e., the unitary matrices of the correlation, have no impact on our measure of correlation.

Imagine the scenario in which all transmit antennas are uncorrelated but have different average transmit powers because of their amplifiers. In a statistical sense, one would say the antennas are uncorrelated. Our measure of correlation says that the antennas are correlated, because they have different transmit powers. The measure of correlation in Definition 4.2 is more suitable for the analysis of the performance of multiple antenna systems, because different transmit powers at the antennas obviously have a strong impact on the performance. We will not consider such effects.

Example 4.1. As an example for the proposed measure of correlation, consider the following simple 2 × 2 correlation matrix R, i.e., for 0 ≤ ρ ≤ 1,

    R(ρ) = [ 1  ρ
             ρ  1 ].

The eigenvalues of the correlation matrix are given by 1 + ρ and 1 − ρ. Therefore, we have

    ρ_1 ≥ ρ_2  ⟹  λ(R(ρ_1)) ≻ λ(R(ρ_2))

and R(ρ_1) is more correlated than R(ρ_2). The extreme cases are ρ = 0, which leads to the completely uncorrelated R(0) = I, and ρ = 1, which leads to the completely correlated R(1).
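The majorization comparison of Definition 4.2 is straightforward to implement. A sketch (not from the text; the helper `majorizes` is a hypothetical name introduced here) applied to the 2 × 2 example:

```python
import numpy as np

# Sketch of Definition 4.2 / Example 4.1: test whether one eigenvalue vector
# majorizes another, applied to R(rho) with rho1 >= rho2.
def majorizes(x, y, tol=1e-12):
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    y = np.sort(np.asarray(y, dtype=float))[::-1]
    partial = np.all(np.cumsum(x) >= np.cumsum(y) - tol)   # partial sums (4.8)
    total = abs(np.sum(x) - np.sum(y)) < tol               # equal trace (4.5)
    return bool(partial and total)

def R(rho):
    return np.array([[1.0, rho], [rho, 1.0]])

lam1 = np.linalg.eigvalsh(R(0.8))   # eigenvalues 1 +/- 0.8
lam2 = np.linalg.eigvalsh(R(0.3))   # eigenvalues 1 +/- 0.3
assert majorizes(lam1, lam2)        # R(0.8) is more correlated than R(0.3)
assert not majorizes(lam2, lam1)
```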

Example 4.2. As another example of the measure of correlation, consider a Gaussian distributed random vector z of dimension n with z ∼ CN(0, R) and covariance matrix R. Denote the eigenvalues of the covariance matrix R as r = [r_1, . . . , r_n]. In the following, we show that the entropy h_r(z) is a Schur-concave function with respect to the correlation eigenvalues r. The entropy of z is given by [33, Thm. 9.6.5]

    h_r(z) = log[(2πe)^n det(R)] = log[ (2πe)^n Π_{i=1}^n r_i ].    (4.9)

It can be written alternatively as

    h_r(z) = log[ (2πe)^n Π_{i=1}^n r_i ] = log(2πe)^n + Σ_{i=1}^n log r_i.    (4.10)

According to Proposition 2.7 in Section 2.2,

    λ ≻ μ  ⟹  Σ_{i=1}^n log λ_i ≤ Σ_{i=1}^n log μ_i,

so h_r(z) is Schur-concave in r.
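The Schur-concavity of the Gaussian entropy is easy to see numerically. A sketch (not from the text; the two eigenvalue vectors have equal trace and one majorizes the other by construction):

```python
import numpy as np

# Sketch of Example 4.2: the entropy n*log(2*pi*e) + sum(log r_i) is
# Schur-concave in r: a more spread (more "correlated") eigenvalue vector
# gives lower entropy at equal trace.
def entropy(r):
    r = np.asarray(r, dtype=float)
    return len(r) * np.log(2 * np.pi * np.e) + float(np.sum(np.log(r)))

r_uncorr = [1.0, 1.0, 1.0]     # trace 3, flattest vector
r_corr = [1.8, 0.8, 0.4]       # trace 3, majorizes the flat vector
assert entropy(r_corr) < entropy(r_uncorr)
```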

Another measure of spatial correlation, for the low-power spectral efficiency of MIMO systems, is explained in [89].

Definition 4.3 (Dispersion of a random n × n matrix). The dispersion of a random n × n matrix A is defined as

    ζ(A) = n E_A[tr(A²)] / E_A²[tr A].

Applied to a deterministic n × n correlation matrix R, the dispersion reduces to the correlation number, i.e., ζ(R) = tr(R²)/n. Obviously, this function is Schur-convex with respect to the eigenvalues of the correlation matrix by Proposition 2.7. Therefore, the dispersion is a special case of the majorization-based measure of correlation.

Remark 4.2. In [53], another measure of spatial correlation is described. It is called the diversity measure and is given by

    Ψ(R) = ( tr R / ||R||_F )² = (tr R)² / tr(R²)

and is therefore closely related to the correlation number defined above:

    Ψ(R) = n / ζ(R).

Therefore, the diversity measure is also a special case of the majorization-based measure of correlation.
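Both measures are one-liners to compute. A sketch (not from the text; it assumes the correlation matrix is normalized so that tr R = n, as required for the reduction ζ(R) = tr(R²)/n):

```python
import numpy as np

# Sketch: correlation number zeta(R) = tr(R^2)/n (for tr R = n) and diversity
# measure Psi(R) = (tr R)^2 / tr(R^2), with the relation Psi(R) = n / zeta(R).
def zeta(R):
    return float(np.trace(R @ R)) / R.shape[0]

def Psi(R):
    return float(np.trace(R)) ** 2 / float(np.trace(R @ R))

def R(rho):
    return np.array([[1.0, rho], [rho, 1.0]])

n = 2
for rho in (0.0, 0.5, 0.9):
    assert abs(Psi(R(rho)) - n / zeta(R(rho))) < 1e-12

# zeta is Schur-convex in the eigenvalues: more correlation -> larger zeta.
assert zeta(R(0.9)) > zeta(R(0.5)) > zeta(R(0.0))
```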

In the next subsections, various examples are presented where the


Majorization based measure of correlation is applied in order to char-
acterize the impact of spatial correlation on the system performance.
For different performance function it will be shown that there are
either Schur-convex or Schur-concave with respect to the eigenval-
ues of the correlation matrices. The interpretation will always be as
follows:
Consider a performance function which measures the percentage
of successful transmission, e.g., the BER, the SER, the MSE, or the
outage probability, and so on. The lower the performance function is
the better the performance.

• If the performance function is Schur-convex with respect to


the correlation eigenvalues, that means the function increases
with higher correlation and performance decreases. This
leads to the statement: Spatial correlation decreases perfor-
mance.
• If the performance function is Schur-concave with respect
to the correlation eigenvalues, that means the function
decreases with higher correlation and performance increases.
This leads to the statement: Spatial correlation increases per-
formance.

For a performance function which measures the number of error free


transmitted bits, e.g., the transmission rate, the spectral efficiency, or
the goodput, and so on. The higher the performance function is the
better the performance.

• If the performance function is Schur-convex with respect to


the correlation eigenvalues, that means the function increases
with higher correlation and performance increases. This leads
to the statement: Spatial correlation increases performance.
4.1 Spatial Correlation in Multiple Antenna Systems 67

• If the performance function is Schur-concave with respect to the correlation eigenvalues, the function decreases with higher correlation and the performance decreases. This leads to the statement: spatial correlation decreases performance.

4.1.3 Error Performance of OSTBC


One concept for achieving a large portion of the capacity and performance gains in MIMO systems is space–time coding [41, 115, 135]. The
design and analysis of orthogonal space–time block codes (OSTBC)
was studied in [134, 136]. For the design of space–time codes it is usu-
ally assumed that the receiver has perfect channel state information
(CSI) while the transmitter has no CSI. However, the situation in which
space–time codes are combined with linear precoding is also studied,
e.g., in [57, 169, 170].
The average bit error performance of the Alamouti scheme is stud-
ied in [144], the average symbol error performance of OSTBC is ana-
lyzed in [130], and the performance of general space–time codes is
derived in [23]. The impact of correlation on the outage probability
in OSTBC MIMO systems is studied in [124].
We consider a single-user MIMO system with nT transmit and nR receive antennas. The receiver has perfect CSI while the transmitter has no CSI. The transmitter applies an OSTBC. The noise at the receiver is complex Gaussian iid distributed with covariance matrix σn² I_{nR}. The total transmit power is constrained to P. We define the SNR as ρ = P/σn². The channel is assumed to be a flat fading channel with matrix entries [h_{i,j}], i = 1, …, nT, j = 1, …, nR. The receiver applies a matched filter and we obtain at output k of the matched filter at the receiver the following:
r_k = √( Σ_{i=1}^{nT} Σ_{j=1}^{nR} |h_{i,j}|² ) · x_k + ñ_k,  k = 1, …, nT  (4.11)

where ñ_k is complex Gaussian iid with variance σn², because the normalized matched-filter matrix is unitary. The matched filter that leads to (4.11) is matched
to an effective channel that takes into account the space–time code, not
the actual physical channel.
68 Application of Majorization in Wireless Communications

We follow the approach in [110] in order to derive the BER of the OSTBC. Consider one output of the matched filter from (4.11); the instantaneous SNR per bit is then given by
γ = ( Σ_{i=1}^{nT} Σ_{j=1}^{nR} |h_{i,j}|² ) ρ = ( Σ_{i=1}^{nT} Σ_{j=1}^{nR} λ_i^T λ_j^R s_{i,j} ) ρ.  (4.12)
The decision x̂_k = sign(r_k) has bit error probability Q(√(2γ)). The Q-function is defined as Q(x) = (1/√(2π)) ∫_x^∞ exp(−t²/2) dt. Averaging the bit error probability over the pdf of the instantaneous SNR γ provides the BER [110]. The pdf of γ is a function of nT and nR and the correlation vectors μ and ν. Hence, the BER as a function of the SNR ρ, the number of transmit nT and receive nR antennas, and the correlation μ, ν can be written as

BER = E_γ[ Q(√(2γ)) ] = E_{s_{1,1},…,s_{nT,nR}}[ f( ρ Σ_{i=1}^{nT} Σ_{j=1}^{nR} λ_i^T λ_j^R s_{i,j} ) ].

If we collect all products of eigenvalues of the transmit and receive correlation matrices in one large vector η with η_1 = λ_1^T λ_1^R, η_2 = λ_1^T λ_2^R, …, η_{nT·nR} = λ_{nT}^T λ_{nR}^R and sort all components in non-increasing order η_1 ≥ η_2 ≥ ··· ≥ η_{nT·nR} ≥ 0, the average BER can be rewritten as

BER = E_{s_1,…,s_{nT·nR}}[ f( ρ Σ_{k=1}^{nT·nR} η_k s_k ) ].

We can readily apply Lemma 2.16 to obtain the following corollary.

Corollary 4.1 (Average BER of OSTBC is Schur-convex). The average BER of OSTBC MIMO systems in spatially correlated Rayleigh fading is Schur-convex with respect to the transmit and receive correlation, i.e., the more correlated the transmit or receive antennas are (according to Definition 4.2), the worse the performance.

The worst case performance is obtained for completely correlated transmit and receive antennas. The performance is then equal to the performance of a SISO system with nT = 1 and nR = 1 but with channel

gain nT·nR, i.e., BER_wc = E_{s_1}[f(nT nR s_1)]. The best case performance is achieved for completely uncorrelated transmit and receive antennas, i.e., BER_bc = E_{s_{1,1},…,s_{nT,nR}}[ f( Σ_{k=1}^{nT} Σ_{l=1}^{nR} s_{k,l} ) ] = E_v[f(v)] with v being χ²-distributed with 2·nT·nR degrees of freedom. This corresponds to the maximum diversity gain of the system. In [76], it is shown that the full diversity gain is achieved as long as the channel correlation matrices have full rank. If the correlation matrices have full rank, the spatial correlation only shifts the BER curves to the right but does not change the slope.
In Figure 4.2, the average BER for BPSK modulation and a 2 × 2
MIMO system applying an Alamouti STC is shown for three correlation
scenarios.
The result can be interpreted in the following way: Since no CSI
is available at the transmitter, the spatial dimension is best used if
all spatial diversity is exploited and an OSTBC is used to achieve full
diversity. Therefore, the reduction in diversity due to correlated trans-
mit or receive antennas leads to a performance degradation. This fact
is shown in the corollary above.
Closed form expressions for the BER as well as an illustration can
be found in [76]. Equal gain combining and selection combining are
studied in [75].
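Corollary 4.1 can also be checked numerically. The sketch below is our own Monte Carlo estimate (not the closed forms of [76]); it uses f(x) = Q(√(2x)) from the BER expression above and compares a fully correlated 2 × 2 channel, η = [4, 0, 0, 0], against the uncorrelated case, η = [1, 1, 1, 1]:

```python
import math
import random

def q_func(x):
    # Gaussian tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def avg_ber(eta, rho, n=200_000, seed=1):
    """Monte Carlo average of Q(sqrt(2*rho*sum_k eta_k s_k)) with iid
    standard exponential s_k (the averaged BER expression above)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        g = sum(e * rng.expovariate(1.0) for e in eta)
        acc += q_func(math.sqrt(2.0 * rho * g))
    return acc / n

rho = 1.0
ber_uc = avg_ber([1, 1, 1, 1], rho)  # uncorrelated 2x2: diversity order 4
ber_cc = avg_ber([4, 0, 0, 0], rho)  # fully correlated: diversity order 1
print(ber_uc < ber_cc)  # True: correlation increases the average BER
```

The gap is large because the correlated case loses all diversity, in line with the worst-case/best-case discussion above.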

Fig. 4.2 Average BER for Alamouti STC 2 × 2 and different spatial correlations.

4.1.4 Average Capacity of MISO Systems


Consider a single user system with multiple transmit antennas and
a single receive antenna. In [157], the potential of transmit diversity
systems was pointed out. The capacity of a MISO system with imper-
fect feedback was first analyzed in [99, 100, 149]. In [55, 66], the opti-
mum transmission strategy with covariance knowledge at the trans-
mit array with respect to the ergodic capacity was analyzed. It has
been shown that even partial CSI at the transmitter can increase the
mutual information of a MISO system. Recently, transmission schemes for optimizing mutual information in MISO mean-feedback and covariance-feedback systems were derived in [99, 149]. The capacity can be
achieved by Gaussian distributed transmit signals with a particular
covariance matrix.
Consider the standard MISO block-flat-fading channel model. The block-flat-fading channel model is given by y = x^H h + n with complex nT × 1 transmit vector x, channel vector h (nT × 1), and circularly symmetric complex Gaussian noise n with variance σn²/2 per dimension. The channel vector h is constant for a block of T symbols. Then the channel changes to a completely new uncorrelated channel realization. For convenience, we define the inverse noise variance as ρ = 1/σn². In the following, we assume that the receiver knows h perfectly. The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix R, i.e., h ∼ CN(0, R). Denote the ordered eigenvalues of R as μ_1 ≥ μ_2 ≥ ··· ≥ μ_{nT} ≥ 0. Denote by w_k = |h_k|² iid standard exponentially distributed random variables.
In [68], it is shown that the average mutual information with an uninformed transmitter is given by

C_opt^{noCSI}(μ) = E log( 1 + (ρ/nT) Σ_{l=1}^{nT} μ_l w_l ).  (4.13)

Furthermore, the ergodic capacity with perfect CSI at the transmitter is given by

C_opt^{pCSI}(μ) = E log( 1 + ρ Σ_{l=1}^{nT} μ_l w_l ).  (4.14)

Corollary 4.2. The average mutual information in (4.13) and the ergodic capacity in (4.14) are Schur-concave functions with respect to the vector of eigenvalues μ.

Proof. Since f (x) = log(1 + ax) for a > 0 is a concave function, this
result follows directly from Lemma 2.16.

The ergodic capacity with covariance feedback is given by [68]

C_opt^{cfCSI}(μ) = max_{p ≥ 0 : ||p||_1 ≤ 1} E log( 1 + ρ Σ_{l=1}^{nT} p_l μ_l w_l ).  (4.15)

Corollary 4.3. The ergodic capacity in (4.15) is Schur-convex with respect to μ.

Proof. This follows from Theorem 2.18.

These results lead to a complete characterization of the impact of correlation on the average mutual information of MISO systems. The inequality chain in the next corollary shows the relation between the different CSI schemes and different levels of correlation. Assume that the correlation vector μ2 majorizes μ1, i.e., μ1 ≺ μ2. We define the fully correlated vector ψ = [nT, 0, …, 0]^T and the completely uncorrelated vector as χ = [1, 1, …, 1]^T. Note that the vector ψ majorizes all other vectors and that the vector χ is majorized by all other vectors.

Corollary 4.4. For the ergodic capacities in MISO systems with different levels of correlation and different CSI at the transmitter, we have the following inequalities:

C_opt^{noCSI}(ψ) ≤ C_opt^{noCSI}(μ2) ≤ C_opt^{noCSI}(μ1) ≤ C_opt^{noCSI}(χ)
 = C_opt^{cfCSI}(χ) ≤ C_opt^{cfCSI}(μ1) ≤ C_opt^{cfCSI}(μ2) ≤ C_opt^{cfCSI}(ψ)
 = C_opt^{pCSI}(ψ) ≤ C_opt^{pCSI}(μ2) ≤ C_opt^{pCSI}(μ1) ≤ C_opt^{pCSI}(χ).  (4.16)

An illustration of this inequality chain can be found in [68].
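Parts of this ordering are easy to reproduce by Monte Carlo. The sketch below (our own helper, with natural logs replaced by log2 for readability; the cfCSI terms need a power optimization and are omitted) checks C_noCSI(ψ) ≤ C_noCSI(χ) ≤ C_pCSI(χ) for nT = 2:

```python
import math
import random

def ergodic_capacity(mu, rho, perfect_csi, n=200_000, seed=7):
    """E log2(1 + rho/nT * sum mu_l w_l) for the uninformed transmitter
    as in (4.13), or E log2(1 + rho * sum mu_l w_l) with perfect CSI
    as in (4.14); w_l iid standard exponential."""
    rng = random.Random(seed)
    scale = rho if perfect_csi else rho / len(mu)
    acc = 0.0
    for _ in range(n):
        g = sum(m * rng.expovariate(1.0) for m in mu)
        acc += math.log2(1.0 + scale * g)
    return acc / n

rho = 10.0
chi, psi = [1.0, 1.0], [2.0, 0.0]  # uncorrelated / fully correlated
c_no_psi = ergodic_capacity(psi, rho, False)
c_no_chi = ergodic_capacity(chi, rho, False)
c_p_chi = ergodic_capacity(chi, rho, True)
# Schur-concavity: correlation hurts the uninformed transmitter,
# and perfect CSI can only help.
print(c_no_psi <= c_no_chi <= c_p_chi)  # True
```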



4.1.5 Outage Probability of MISO System


In contrast to the ergodic capacity, which is a measure for the average amount of information transmitted error-free, the outage probability is a more subtle measure for the probability of successful transmission while the channel is in a certain channel state. Since the instantaneous capacity depends on the channel state, it is itself a random variable. Its first moment corresponds to the ergodic capacity. Its cumulative distribution function (cdf) is the outage probability. The outage probability gives the probability that a given transmission rate cannot be achieved in one fading block. Recently, the outage probability was studied for multiple antenna channels and space–time codes [97, 125]. The properties of the optimum transmission strategies change if we replace the ergodic capacity as objective function with the outage probability, e.g., for no CSI at the transmitter, the optimum transmission strategy is to use only a fraction of the available number of transmit antennas. Telatar has already conjectured this in [137]. In [64], a part of this conjecture is verified. In addition, in [64], a necessary and sufficient condition for the optimality of single-antenna processing was derived. The complete solution of Telatar's conjecture can be found in [73].
Consider again the standard MISO block-flat-fading channel model. The block-flat-fading channel model is given by y = x^H h + n with complex nT × 1 transmit vector x, channel vector h (nT × 1), and circularly symmetric complex Gaussian noise n with variance σn²/2 per dimension. The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix R, i.e., h ∼ CN(0, R). Denote the ordered eigenvalues of R as μ_1 ≥ μ_2 ≥ ··· ≥ μ_{nT} ≥ 0. Denote by w_k = |h_k|² iid standard exponentially distributed random variables.
In [73], different types of CSI are studied. Here, consider the case where the transmitter is uninformed and performs equal power allocation. However, the channel is assumed to be correlated. In this case the outage probability is given by

P_out(ρ, R, μ) = Pr[ log( 1 + ρ Σ_{k=1}^{nT} μ_k w_k ) ≤ R ].

Observe that the outage probability is symmetric but in general neither concave nor convex with respect to μ. Therefore, the stochastic majorization approach presented in Section 2.2.2 cannot be applied, and a new technique is developed. It turns out that the behavior of the outage probability is chameleonic compared to the clear, unique behavior of the average mutual information.

Theorem 4.1 (Schur-convexity of outage probability). For a MISO system which applies equal power allocation and for fixed transmission rate R, the outage probability as a function of the correlation properties of the transmit antennas is characterized by the following statements:

• for SNR ρ < (2^R − 1)/2, the outage probability is a Schur-concave function of the channel covariance matrix eigenvalues μ_1, …, μ_{nT}, i.e., correlation decreases the outage probability and

μ1 ≻ μ2 ⟹ P_out(ρ, R, μ1) ≤ P_out(ρ, R, μ2),

• for SNR ρ > 2^R − 1, the outage probability is a Schur-convex function of the channel covariance matrix eigenvalues μ_1, …, μ_{nT}, i.e., correlation increases the outage probability and

μ1 ≻ μ2 ⟹ P_out(ρ, R, μ1) ≥ P_out(ρ, R, μ2).

The proof follows directly from Theorem 2.19. For more discussion and
illustrations, the interested reader is referred to [73].
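For nT = 2 the two regimes can be seen from closed-form Gamma cdfs: with μ = [2, 0] the outage is Pr[w ≤ x] = 1 − e^{−x}, while with μ = [1, 1] the sum w_1 + w_2 is Gamma(2)-distributed with cdf 1 − e^{−x}(1 + x). A small self-contained check (our own sketch, using base-2 rate thresholds and SNR values on either side of the Theorem 4.1 thresholds):

```python
import math

def outage_corr(rho, rate):
    # mu = [2, 0]: log2(1 + 2*rho*w) <= R  <=>  w <= (2^R - 1)/(2*rho)
    x = (2.0 ** rate - 1.0) / (2.0 * rho)
    return 1.0 - math.exp(-x)

def outage_uncorr(rho, rate):
    # mu = [1, 1]: w1 + w2 ~ Gamma(2), threshold (2^R - 1)/rho
    x = (2.0 ** rate - 1.0) / rho
    return 1.0 - math.exp(-x) * (1.0 + x)

R = 2.0                  # so 2^R - 1 = 3
low, high = 0.5, 10.0    # one SNR below, one above the thresholds
print(outage_corr(low, R) < outage_uncorr(low, R))    # True: correlation helps
print(outage_corr(high, R) > outage_uncorr(high, R))  # True: correlation hurts
```

This reproduces the chameleonic behavior: the same correlation that lowers the outage probability at low SNR raises it at high SNR.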

4.1.6 Outage Probability of OSTBC–MISO Systems


There has been a considerable amount of work on a variety of new
codes and modulation signals, called space–time (ST) codes, in order to
approach the huge capacity of multiple antenna channels. One scheme
of particular interest is the Alamouti scheme [3] for two transmit anten-
nas. Later on, [134, 138] proposed more general schemes referred to as
74 Application of Majorization in Wireless Communications

OSTBC with the same properties as the Alamouti scheme like, e.g., a
remarkably simple maximum-likelihood decoding algorithm. The per-
formance of OSTBC with respect to mutual information was analyzed
(among others) for the uncorrelated Rayleigh fading case in [118, 6] and
for the more general case with different correlation scenarios and line
of sight (LOS) components in [98]. More information about ST codes
can be found in the books [86] and [109].
Consider again the standard MISO block-flat-fading channel model given by y = x^H h + n with complex nT × 1 transmit vector x, channel vector h (nT × 1), and circularly symmetric complex Gaussian noise n with variance σn²/2 per dimension. The inverse noise variance is denoted by ρ = 1/σn². The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix I, i.e., h ∼ CN(0, I). The transmitter has no CSI and applies an OSTBC. For data stream k the received signal after channel matched filtering is given by

y_k = ||h||² x_k + n_k.  (4.17)

For an OSTBC with nT transmit antennas it is shown in [87] that the maximum achievable rate is given by

r_c(nT) = ( ⌊(nT + 1)/2⌋ + 1 ) / ( 2 ⌊(nT + 1)/2⌋ ).  (4.18)

It is important to note that r_c(k + 1) = r_c(k + 2) for even k. Furthermore, it holds that lim_{nT→∞} r_c(nT) = 1/2.
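The maximal rate (4.18) and both properties just stated can be checked by a direct transcription:

```python
def ostbc_rate(n_t):
    """Maximal OSTBC rate r_c(n_T) from (4.18)."""
    m = (n_t + 1) // 2          # floor((n_T + 1)/2)
    return (m + 1) / (2 * m)

# maximal rates for n_T = 2, ..., 6: Alamouti has rate 1, then 3/4, 3/4, 2/3, 2/3
print([ostbc_rate(n) for n in range(2, 7)])
# r_c(k+1) == r_c(k+2) for even k:
print(all(ostbc_rate(k + 1) == ostbc_rate(k + 2) for k in (2, 4, 6, 8)))  # True
# r_c(n_T) approaches 1/2 from above as n_T grows:
print(abs(ostbc_rate(10_001) - 0.5) < 1e-3)  # True
```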
The outage probability for the model in (4.17) is a function of the number of transmit antennas nT, the rate R, and the SNR ρ, and it is given by [77]

P_out(ρ, nT, R) = 1 − Γ( nT, (2^{R/r_c(nT)} − 1) nT/ρ ) / Γ(nT)  (4.19)

with the incomplete Gamma function Γ(n, x) [1]. The proof of the next theorem can be found in [77].

Theorem 4.2. Fix R and ρ. The minimum of the outage probability of the OSTBC MISO system with nT antennas,

min_{1 ≤ l ≤ nT} P_out(R, l, ρ) = min_{1 ≤ l ≤ nT} [ 1 − Γ( l, (2^{R/r_c(l)} − 1) l/ρ ) / Γ(l) ],  (4.20)

is attained for an even l or for l = nT.

In Figure 4.3, the outage probability as a function of the SNR is shown for l = 2, 3, 4, 5, 6 active antennas and rate R = 1. Figure 4.3 also shows the switching SNR points from two to three antennas ρ23, from two to four ρ24, from four to five ρ45, and from four to six ρ46. In order to minimize the outage probability we always choose the lowest of the curves. That means that the higher the SNR, the more antennas are used. Up to 4.8 dB an orthogonal ST coded system with two antennas is optimal. Then a system with four antennas is optimal in the range from 4.8 dB up to 5.4 dB. And from 5.4 dB on, a system with six antennas has minimum outage probability. In Figure 4.3, it can be observed that, as indicated in the proof of Theorem 4.2, the switching points from an

Fig. 4.3 Outage probability of OSTBC with l = 2, 3, 4, 5, 6 active antennas and R = 1.



even to an odd number of antennas (ρ23 and ρ45) lie at a higher SNR than the switching points from an even to the next even number of antennas (ρ24 and ρ46), respectively.
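Because l is an integer, the ratio Γ(l, x)/Γ(l) in (4.19) reduces to the finite Poisson sum e^{−x} Σ_{k=0}^{l−1} x^k/k!, so the switching behavior can be reproduced without special-function libraries. The sketch below is our own check, assuming the threshold argument (2^{R/r_c(l)} − 1)·l/ρ:

```python
import math

def ostbc_rate(n_t):
    m = (n_t + 1) // 2
    return (m + 1) / (2 * m)

def outage(rho, l, rate):
    """P_out from (4.19): 1 - Gamma(l, x)/Gamma(l) with
    x = (2^(R/r_c(l)) - 1) * l / rho and integer l."""
    x = (2.0 ** (rate / ostbc_rate(l)) - 1.0) * l / rho
    upper = math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(l))
    return 1.0 - upper

rho_5db = 10.0 ** 0.5  # about 5 dB, just past the 4.8 dB switching point
pouts = {l: outage(rho_5db, l, 1.0) for l in range(1, 7)}
best = min(pouts, key=pouts.get)
print(best)            # 4: four antennas are optimal here (cf. Figure 4.3)
print(best % 2 == 0)   # True: the minimum is at an even l (Theorem 4.2)
```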

4.1.7 Average Capacity of MIMO System


Consider the standard MIMO block flat-fading channel model

y = Hx + n  (4.21)

with complex nT × 1 transmit vector x, channel matrix H with nR × nT entries, and circularly symmetric complex Gaussian noise n with covariance matrix (σN²/2) I per dimension. For convenience, we define the inverse noise variance as ρ = 1/σN². We assume that the receiver knows H perfectly.
The channel matrix H for the case in which we have correlated transmit and correlated receive antennas is modeled as in (4.2), i.e., H = R_R^{1/2} · W · R_T^{1/2} with transmit correlation matrix R_T = U_T D_T U_T^H and receive correlation matrix R_R = U_R D_R U_R^H. U_T and U_R are the matrices with the eigenvectors of R_T and R_R, respectively, and D_T, D_R are diagonal matrices with the eigenvalues of R_T and R_R, respectively, i.e., D_T = diag[λ_1^T, …, λ_{nT}^T] and D_R = diag[λ_1^R, …, λ_{nR}^R]. Without loss of generality, we assume that all eigenvalues are ordered in decreasing order, i.e., λ_1^T ≥ λ_2^T ≥ ··· ≥ λ_{nT}^T. The random matrix W has zero-mean independent identically distributed complex Gaussian entries, i.e., W ∼ CN(0, I).
The average performance measure will be defined in the next chap-
ter using matrix-monotone functions. Consider the following average
performance function and assume for the moment that φ is the mutual
information, i.e., φ(x) = log(1 + x). The characterization of a general
class of performance functions will be given in Section 5.1.4. Then the
average performance reads

Φ(λ^T, λ^R) = E tr φ( ρ Σ_{k=1}^{nT} λ_k^T w̃_k w̃_k^H )

with w̃_k the kth column of W̃ = R_R^{1/2} W.



Corollary 4.5. For a fixed receive correlation vector λ̄^R, a fixed vector λ^T_0, and an arbitrary vector λ^T_1 which majorizes λ^T_0, i.e., λ^T_1 ≻ λ^T_0, it follows that Φ(λ^T_0, λ̄^R) ≥ Φ(λ^T_1, λ̄^R).
For a fixed transmit correlation vector λ̄^T, a fixed vector λ^R_0, and an arbitrary vector λ^R_1 which majorizes λ^R_0, i.e., λ^R_1 ≻ λ^R_0, it follows that Φ(λ̄^T, λ^R_0) ≥ Φ(λ̄^T, λ^R_1).

This corollary follows from Theorem 2.20. The result is illustrated in Figure 4.4 for a 2 × 2 MIMO system. Further discussions, analysis, and illustrations can be found in [71].

Fig. 4.4 Average mutual information for 2 × 2 MIMO system as a function of transmit and receive correlation.

4.1.8 MIMO Spectral Efficiency in the Low Power Regime

In [143], the spectral efficiency in the wideband regime was studied using two novel performance metrics, namely the minimum Eb/N0 and the wideband slope S0. These quantities characterize the first and second order behavior of the capacity at low SNR values. For the
uncorrelated Rayleigh fading case, these performance metrics were derived in [143, Thm. 12] for the informed transmitter case, and in [143, Thm. 13] for the uninformed transmitter case with perfect CSI at the receiver only. In [89], the uninformed transmitter case with correlation in Rician fading MIMO channels with polarization diversity was studied. It turned out that transmit and receive correlation has no impact on the minimum Eb/N0 but on the wideband slope. This impact was quantified in [89] by the correlation number. In [141], the impact of antenna correlation on the capacity of multiantenna channels was analyzed by studying the minimum Eb/N0, the wideband slope S0, and the high SNR slope S∞ for certain classes of MIMO channels.
The transmitter has nT transmit antennas; the receiver applies nR receive antennas. The received signal vector y in the quasi-static block flat fading MIMO channel H is given by y = Hx + n with transmit signal x and additive white Gaussian noise (AWGN) vector n ∼ CN(0, σn² I). The channel matrix H for the case in which we have correlated transmit and correlated receive antennas is modeled as H = R_R^{1/2} · W · R_T^{1/2} with transmit correlation matrix R_T and receive correlation matrix R_R.
In [143], the low-SNR regime has been analyzed and two performance measures, namely the minimum Eb/N0 and the wideband slope S0, were introduced. The system parameters bandwidth B, transmission rate R, transmit power P, and spectral efficiency C(Eb/N0) satisfy the fundamental limit

R/B ≤ C(Eb/N0).  (4.22)
 
The function C(Eb/N0) is directly related to the common capacity expression C(SNR), i.e., C(Eb/N0) = C(SNR) for the SNR which solves

C(SNR) · Eb/N0 = SNR.

At low SNR, the function C(Eb/N0) can be expressed as [143]

C(Eb/N0) ≈ (S0 / 3 dB) · ( Eb/N0 |dB − Eb/N0min |dB )  (4.23)

with

Eb/N0min = (log_e 2) / Ċ(0)  and  S0 = 2 [Ċ(0)]² / (−C̈(0)).  (4.24)

The closer Eb/N0 gets to Eb/N0min, the better is the approximation in (4.23). Note that the first and second derivatives in (4.24) are taken of the common capacity function C(SNR).
In [89, 142], the two performance measures in (4.24) were computed for the MIMO channel without CSI at the transmitter and with perfect CSI at the receiver. The minimum Eb/N0 and the wideband slope are given by [143, Thm. 13]

Eb/N0min^{noCSI} = (log_e 2) / nR,  (4.25)

S0^{noCSI} = 2 nT² nR² / ( nT² Σ_{k=1}^{nR} (λ_k^R)² + nR² Σ_{k=1}^{nT} (λ_k^T)² ).  (4.26)

Note that we focus on the transmitted Eb/N0 as in [89].

Lemma 4.3. Fix the receive correlation λ^R. The wideband slope S0^{noCSI} as a function of the transmit correlation λ^T is Schur-concave, i.e.,

λ^T_1 ≻ λ^T_2 ⟹ S0^{noCSI}(λ^T_1) ≤ S0^{noCSI}(λ^T_2).

For fixed transmit correlation, the wideband slope S0^{noCSI} is Schur-concave with respect to the receive correlation, i.e.,

λ^R_1 ≻ λ^R_2 ⟹ S0^{noCSI}(λ^R_1) ≤ S0^{noCSI}(λ^R_2).

Lemma 4.3 is a direct consequence of Lemma 2.16.
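The closed form (4.26) makes Lemma 4.3 easy to check numerically (the helper name below is ours):

```python
def wideband_slope_nocsi(lam_t, lam_r):
    """Wideband slope S0 from (4.26) for the uninformed transmitter."""
    n_t, n_r = len(lam_t), len(lam_r)
    denom = (n_t ** 2 * sum(x * x for x in lam_r)
             + n_r ** 2 * sum(x * x for x in lam_t))
    return 2.0 * n_t ** 2 * n_r ** 2 / denom

uncorr = wideband_slope_nocsi([1.0, 1.0], [1.0, 1.0])
corr_t = wideband_slope_nocsi([1.8, 0.2], [1.0, 1.0])
full_cc = wideband_slope_nocsi([2.0, 0.0], [2.0, 0.0])
print(uncorr)   # 2.0 = 2*nT*nR/(nT + nR) for nT = nR = 2
print(uncorr > corr_t > full_cc)  # True: correlation reduces the slope
print(full_cc)  # 1.0, the completely correlated value
```

The three printed values also match the uncorrelated and completely correlated endpoints quoted at the end of this subsection.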



For perfect CSI at the transmitter and the receiver, the Eb/N0min and the wideband slope S0 are given by

Eb/N0min^{pCSI} = (log_e 2) / E[λmax(HH^H)],  (4.27)

S0^{pCSI} = 2 ( E[λmax(HH^H)] )² / E[ (λmax(HH^H))² ].  (4.28)

The impact of correlation on the performance metric in (4.27) is characterized in the following theorem.

Theorem 4.4. With perfect CSI at the transmitter and receiver, the minimum Eb/N0 is Schur-concave with respect to the transmit and receive correlation, i.e., for fixed receive correlation it holds that

λ^T_1 ≻ λ^T_2 ⟹ Eb/N0min^{pCSI}(λ^T_1) ≤ Eb/N0min^{pCSI}(λ^T_2).

Proof. The minimum Eb/N0min and the wideband slope S0 do not depend on the eigenvectors of the transmit and receive correlation matrices since the pdf of H is invariant against multiplication with unitary matrices from the left and from the right, i.e.,

E[λmax(HH^H)] = E[λmax(R_T W R_R W^H)] = E[λmax(D_T W D_R W^H)].  (4.29)

Fix the receive correlation and express the expectation in (4.29) as a function of the vector of eigenvalues in D_T:

f(λ^T) = E[λmax( diag(λ^T) W D_R W^H )].  (4.30)

We have the following two observations:

(1) f(λ^T) is symmetric with respect to λ^T since for all permutation matrices Π it holds that

f(Πλ^T) = E[λmax( diag(Πλ^T) W D_R W^H )]
 = E[λmax( Π diag(λ^T) Π^T W D_R W^H )]
 = E[λmax( diag(λ^T) (Π^T W) D_R (Π^T W)^H )]
 = E[λmax( diag(λ^T) W D_R W^H )].

The last equality follows from the fact that the pdfs of W and Π^T W coincide, because the permutation matrix Π is unitary.
(2) f(λ^T) is convex with respect to λ^T. This holds even for each realization W. Define Λ(t) = tΓ + (1 − t)Ψ and V = W D_R^{1/2}. It holds that

λmax( Λ(t) W D_R W^H ) = λmax( V^H [tΓ + (1 − t)Ψ] V )
 = λmax( t V^H Γ V + (1 − t) V^H Ψ V )
 ≤ t λmax( V^H Γ V ) + (1 − t) λmax( V^H Ψ V ).  (4.31)

The last inequality is proven in Theorem 2.11 in Subsection 2.2.1.

Using Theorem 2.15 in Subsection 2.2.2, we observe that the conditions for Schur-convexity, i.e., convexity and symmetry, are fulfilled, so E[λmax(HH^H)] is Schur-convex; by (4.27), the minimum Eb/N0 is then Schur-concave. This completes the proof.

For covariance knowledge at the transmitter, the minimum Eb/N0 and the wideband slope S0 are given by

Eb/N0min^{covCSI} = (log_e 2) / (nR λ_1^T),  (4.32)

S0^{covCSI} = 2 nR² / E[ ( Σ_{k=1}^{nR} λ_k^R w_k )² ].  (4.33)

This corresponds to the result in [143, Eq. (236)].

Theorem 4.5. With covariance knowledge at the transmitter and perfect CSI at the receiver, the minimum Eb/N0 is Schur-concave with respect to the transmit correlation and does not depend on the receive correlation. The wideband slope is Schur-concave with respect to the receive correlation and does not depend on the transmit correlation.

The proof follows from Lemma 2.16. Bounds on the achievable performance can be found in [70].
In Figure 4.5, the spectral efficiency over Eb/N0 is shown for different MIMO systems with uninformed transmitter and perfectly informed

Fig. 4.5 Spectral efficiency over Eb/N0 for different MIMO systems and transmitter and receiver correlation for uninformed transmitter. The solid lines are the Eb/N0 and wideband slope S0 approximations; the symbols are the simulated results.

receiver. The impact of the number of receive antennas on the minimum Eb/N0 can be observed. In addition, the wideband slope S0 decreases with increasing transmitter and receiver correlation. The correlation eigenvalues in the simulation in Figure 4.5 are [1.8, 0.2] for transmitter and receiver correlation. The minimum Eb/N0 values for the SISO, the two-by-two, and the three-by-three cases are −1.59, −4.602, and −6.3629 dB, respectively.
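These three dB values follow directly from (4.25), Eb/N0min = (log_e 2)/nR, converted to decibels (a one-line check with a helper name of our own):

```python
import math

def min_ebn0_db(n_r):
    """(Eb/N0)_min^noCSI = ln(2)/n_R from (4.25), expressed in dB."""
    return 10.0 * math.log10(math.log(2.0) / n_r)

for n_r in (1, 2, 3):
    print(round(min_ebn0_db(n_r), 2))  # -1.59, -4.6, -6.36 dB
```

The printed values reproduce the SISO, 2 × 2, and 3 × 3 numbers quoted above.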
The minimum Eb/N0 values of the three CSI scenarios (no CSI, perfect CSI, covariance knowledge) are connected by the following inequalities:

log_e 2 · ( Eb/N0min^{noCSI} )^{−1} = nR ≤ nR λ_1^T = log_e 2 · ( Eb/N0min^{covCSI} )^{−1}

with equality for completely uncorrelated transmit antennas, and

log_e 2 · ( Eb/N0min^{covCSI} )^{−1} = nR λ_1^T ≤ E[λmax(HH^H)] = log_e 2 · ( Eb/N0min^{pCSI} )^{−1}

with equality for completely correlated transmit antennas and uncorrelated receive antennas. The last inequality follows from the fact that E[λmax(HH^H)] is monotonically increasing with increasing transmit and receive correlation and from

E[λmax(HH^H)] = E[λmax(R_T W W^H)]
 ≥ λmax(R_T) E[λmax(W W^H)]
 ≥ λ_1^T nR.  (4.34)

Next, the inequalities for the wideband slope are presented. Denote by S0,cc the completely correlated scenario, i.e., λ_1^T = nT, λ_2^T = ··· = λ_{nT}^T = 0 and λ_1^R = nR, λ_2^R = ··· = λ_{nR}^R = 0, and by S0,uc the completely uncorrelated case, i.e., λ_1^T = ··· = λ_{nT}^T = λ_1^R = ··· = λ_{nR}^R = 1. Then the following inequalities hold:

S0,uc^{noCSI} = 2 nT nR / (nT + nR) ≥ S0^{noCSI} ≥ 1 = S0,cc^{noCSI},
S0,uc^{covCSI} = 2 nR / (nR + 1) ≥ S0^{covCSI} ≥ 1/2 = S0,cc^{covCSI},
S0,cc^{pCSI} = 1/2 ≥ S0^{pCSI}.

4.1.9 Delay Limited Capacity of Multiple Antenna Systems


Future wireless communication systems will support more and more delay-sensitive multimedia and entertainment streaming data that has to be successfully transmitted within a fixed time frame. This is sometimes called non-elastic traffic. Already, IP version 6 supports hard delay constraints. However, the lower layers cannot yet adequately handle those quality-of-service requirements. As a result, it is necessary to study hard delay constraints on the physical layer. In this subsection, we study the capacity of multiple antenna systems under hard delay constraints. Multiple antenna systems were extensively studied in terms of their performance and achievable rates [37, 137]. Recently, the impact of correlation on the average mutual information, on the outage probability, and on the minimum Eb/N0 and the wideband slope S0 was analyzed in [19, 68, 70, 141].

For the delay-constrained analysis, the ergodic capacity as well as the outage probability are not suitable. Neither approach guarantees the successful transmission of information within any finite number of blocks. Therefore, we restrict the delay to one fading block and fix the outage probability to some ε, i.e., Pr[C(α) < R] = ε. Then we solve this for R to obtain the so-called ε-capacity. In order to avoid outages altogether, we set ε = 0 and obtain the zero-outage capacity, or the delay limited capacity (DLC) C^d with Pr[C(α) < C^d] = 0. The DLC is defined as the transmission rate that can be reliably supported in each fading state H of the channel [44], i.e.,

Pr[ log det( I + ρ H Q(H) H^H ) < C^d(ρ) ] = 0  (4.35)

under a long-term power constraint on Q, i.e., E[tr Q] ≤ P. From an information theoretic point of view, the notion of DLC is somewhat problematic, since the code that achieves capacity requires a long block length, but a block fading channel model is assumed. However, following the arguments in [27], the outage probability predicts surprisingly well the error probability of actual codes for practical values of block length [81].
Consider again the quasi-static block-flat correlated Rayleigh fading MISO channel. The block-flat-fading channel model is given by y = x^H h + n with h ∼ CN(0, R). Denote the ordered eigenvalues of R as μ_1 ≥ μ_2 ≥ ··· ≥ μ_{nT} ≥ 0. Denote by w_k = |h_k|² iid standard exponentially distributed random variables.

Theorem 4.6 (Section III.B in [44]). The DLC of fading MISO channels is given by

C^d(ρ, P, μ) = log2( 1 + ρ P / E[1/||h||²] )  (4.36)

with ||h||² = Σ_{k=1}^{nT} μ_k w_k. The optimal transmit strategy consists of the encoder, power allocation, and beamforming. The optimal beamforming vector is given by v*(h) = h/||h|| and the optimal power

allocation by

p*(h) = P / ( E[1/||h||²] · ||h||² ).  (4.37)

Theorem 4.7. The DLC in (4.36) is Schur-concave with respect to μ, i.e.,

μ1 ≺ μ2 ⟹ C^d(ρ, P, μ1) ≥ C^d(ρ, P, μ2).

The theorem is a consequence of Lemma 2.16. More details and further illustrations can be found in [74].
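Theorem 4.7 can be illustrated by estimating E[1/||h||²] by Monte Carlo (a sketch with our own helper names; for μ = [1, 1] the expectation equals 1 exactly, and correlation can only increase it, shrinking the DLC):

```python
import math
import random

def dlc(mu, rho, power, n=200_000, seed=3):
    """Delay limited capacity (4.36): log2(1 + rho*P / E[1/||h||^2])
    with ||h||^2 = sum_k mu_k * w_k, w_k iid standard exponential."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        acc += 1.0 / sum(m * rng.expovariate(1.0) for m in mu)
    inv_mean = acc / n          # estimate of E[1/||h||^2]
    return math.log2(1.0 + rho * power / inv_mean)

rho, p = 10.0, 1.0
c_uncorr = dlc([1.0, 1.0], rho, p)
c_corr = dlc([1.8, 0.2], rho, p)
print(c_uncorr > c_corr)  # True: correlation reduces the DLC
```

In the extreme case μ = [2, 0] the expectation E[1/(2w)] diverges and the DLC drops to zero, which is why the sketch uses μ = [1.8, 0.2].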

4.1.10 Zero-Outage Capacity Region for SISO BC


Consider the downlink transmission of a cellular system. The base station has multiple antennas (nT); denote the channels to the users as h_1, …, h_K. The base station applies an OSTBC (e.g., [3]). The data streams d_1, …, d_K of dimension 1 × nT of the K users are weighted by a power allocation p_1, …, p_K and added before they enter the OSTBC as s_1, …, s_{nT}.
Each mobile first performs channel matched filtering according to the effective OSTBC channel. Afterwards the received signal at user k is given by

y_k = a_k Σ_{l=1}^{K} x_l + n_k  (4.38)

with fading coefficients α_k = a_k² = ||h_k||², transmit signal x_l intended for user l, and noise n_k. We assume that the fading processes of users k and l for k ≠ l are independently distributed. Let p_k be the power allocated to user k, i.e., p_k = E[|x_k|²]. Denote the long-term sum transmit power constraint at the base station as P, i.e.,

E_{a_1,…,a_K}[ Σ_{k=1}^{K} p_k(a_1, …, a_K) ] ≤ P.

The noise power at the receivers is σ_k² = 1/ρ. The ratio of transmit power to noise power is given by SNR = Pρ, which is called the transmit SNR. The channels are modeled by h_k = w_k R_k^{1/2} with correlation matrix R_k for user 1 ≤ k ≤ K. Denote the eigenvalues of R_k in decreasing order λ_1^k ≥ ··· ≥ λ_{nT}^k ≥ 0.
The following result is proven in [58] for the case where only linear precoding is allowed at the base station and the base station knows only the norms of the channel vectors of all users.

Theorem 4.8. The zero-outage capacity region consists of all rate tuples r_1, …, r_K for which

Σ_{k=1}^{K} E[1/α_k] (1 − 2^{−r_k}) / ( 1 − Σ_{k=1}^{K} (1 − 2^{−r_k}) ) ≤ SNR.  (4.39)

From Lemma 2.16 it follows that the zero-outage capacity region shrinks with increasing correlation.

Corollary 4.6. The zero-outage capacity region shrinks with increasing spatial correlation at the mobile terminals, i.e., from λ^k ≻ γ^k for 1 ≤ k ≤ K, it follows that the region for the correlations (λ^1, …, λ^K) is contained in the region for (γ^1, …, γ^K).

In Figure 4.6, the zero-outage capacity region for two users and two transmit antennas with symmetric correlation is shown for different scenarios. Note that completely correlated transmit antennas lead to a zero-outage capacity of zero. The uncorrelated scenario leads to E[1/α_1] = E[1/α_2] = 1, whereas correlation λ increases this value to

E[1/α_1] = E[1/α_2] = ( log(λ) − log(2 − λ) ) / (2λ − 2).
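The shrinking of the region can be made concrete with (4.39) and the closed form for E[1/α] just given (our own feasibility check; rates in bits per channel use):

```python
import math

def e_inv_alpha(lam):
    """E[1/alpha] for a two-antenna user with eigenvalues [lam, 2-lam];
    the uncorrelated limit lam -> 1 gives exactly 1."""
    if abs(lam - 1.0) < 1e-12:
        return 1.0
    return (math.log(lam) - math.log(2.0 - lam)) / (2.0 * lam - 2.0)

def feasible(rates, e_invs, snr):
    """Rate-tuple feasibility test from (4.39)."""
    loads = [1.0 - 2.0 ** (-r) for r in rates]
    slack = 1.0 - sum(loads)
    if slack <= 0.0:
        return False
    return sum(c * l for c, l in zip(e_invs, loads)) / slack <= snr

snr, r = 10.0, 0.85
uncorr = [e_inv_alpha(1.0)] * 2   # E[1/alpha] = 1 per user
corr = [e_inv_alpha(1.8)] * 2     # ~1.373: correlation inflates E[1/alpha]
print(feasible([r, r], uncorr, snr))  # True
print(feasible([r, r], corr, snr))    # False: this rate pair left the region
```

The same symmetric rate pair is achievable with zero outage for uncorrelated users but not for correlated ones, which is exactly the region shrinkage shown in Figure 4.6.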

4.2 User Distribution in Cellular Communication Systems


In this section, the single antenna multi-user scenario is studied in a
cellular environment. Cases of interest are the uplink and downlink
transmission corresponding to the multiple-access channel (MAC) and

Fig. 4.6 Zero-outage capacity region for MISO BC with two transmit antennas and two
users for different correlation scenarios λ = 1 and λ = 1.9.

broadcast channel (BC). The main focus of interest is on the impact


of deterministic path-loss variation with distance in conjunction with
random fast multipath fading. Random slow shadowing will not be
considered for clarity of exposition.
The distribution of users in a multiuser cellular communication sys-
tem has a great impact on the performance. In general, the interplay
between user distribution and performance depends on the performance
measure and the type of scheduling and transmit strategy. The trans-
mit strategy in turn depends on the available CSI. From a cross-layer
optimization point of view, Figure 4.7 illustrates some of the relation-
ships [62].
In [80], it was shown that the optimum strategy for maximizing the
sum capacity with perfect CSI of a cellular single-input single-output
(SISO) MAC is to allow only the best user to transmit at each time
slot. The result in [80] has induced the notion of multiuser diversity,
i.e., the achievable capacity of the system increases with the num-
ber of users while “riding on the peaks” [139]. In addition to this,

Fig. 4.7 Interplay between different terms in cross-layer optimization. The arrows corre-
spond to some different types of relationships and interactions, e.g., (1) The user distribu-
tion influences the fading statistics and thereby the CSI. (2) The user distribution influences
the scheduling strategy because cell-edge users are treated in a different way than close-
to-the-base users. (3) The network utility function directly determines the optimal schedul-
ing strategy. (4) The value of the network utility function depends on the availability of
CSI. Note that there are many more relationships between the four terms.

the result in [80] has led to the development of opportunistic downlink
scheduling algorithms [151] for the BC. In [15], the average sum
rate of the SISO MAC with successive interference cancellation (SIC)
under a sum transmit power constraint was studied for different types
of CSI. Recently, the downlink case was analyzed in [72]. It turned
out that the optimal scheduling depends strongly on the CSI at the
transmitter.
The average sum rate describes the long-term system throughput.
This performance measure can be used by the system operator to
optimize the overall throughput. The short-term system throughput
is measured by the outage sum rate and its corresponding outage prob-
ability [10]. It describes the probability that an outage occurs during
the next transmission block. The properties of the outage probability
with respect to the optimal transmit strategy and the channel statistics
(e.g., the user distribution) are different from those of the average sum rate [19].
There are two further performance measures, namely the delay limited
sum rate [44, 69] and the maximum throughput [2, 7], that describe
the guaranteed performance and the goodput of the system.

Recently, the scaling laws of wireless networks were analyzed
under simplified assumptions, e.g., the fading variances of the par-
ticipating users are equal (e.g., all users are located on a unit cir-
cle around the base), or for SNR approaching infinity. In [15, 72],
different user distributions are compared using Majorization theory
and their impact on the average sum rate was characterized. For per-
fect and long-term CSI, the sum rate was shown to be Schur-convex
with respect to the user distribution and for an uninformed base sta-
tion, the sum rate is Schur-concave. Also, the asymptotic sum rate
loss between the best-case and the worst-case user distribution was
derived.

4.2.1 A Measure for User Distributions


Consider the downlink transmission. In the signal model, there are K
mobile users who are going to receive data from one base station. The
single-antenna quasi-static block flat-fading channels h1 , ..., hK between
the mobiles and the base are modeled as constant for a block of coher-
ence length T and from block to block as zero-mean independent com-
plex Gaussian distributed (CN(0, ck )). The variance is ci = E (h∗i hi ) for
1 ≤ i ≤ K.
The analysis of scaling laws often assumes iid distributed channels
across the users. In this section, we study the effect of different fad-
ing variances ck of the users. In order to guarantee a fair comparison,
we constrain the sum variance to be equal to the number of users,
i.e., Σ_{k=1}^K c_k = K. In Figure 4.8, the implications of this constraint are
shown. Starting from the symmetric scenario c1 = c2 = · · · = cK = 1,
one mobile moves toward the base while another moves to the cell
edge. The other extreme scenario occurs when all but one user have
very small fading variances ck . Under the normalization above, this
leads to the variances c2 = c3 = · · · = cK = 0 and c1 = K.
Without loss of generality, we order the users in a decreasing way
according to their fading variances, i.e., c_1 ≥ c_2 ≥ ··· ≥ c_K. The
constraint on the sum of the fading variances ensures that we compare
scenarios in which the channels carry, on average, the same sum
power.

Fig. 4.8 Fair comparison of user distributions with K = 8 users. (a) Symmetric scenario
c1 = (1, 1, . . . , 1). (b) One mobile moves to the base and another to the cell edge. The sum
of their fading variances stays constant. c2 = (1 + α, 1, 1, . . . , 1 − α). (c) All but one mobile
at the cell edge c3 = (8, 0, . . . , 0).

Definition 4.4 (More spread out user distribution). A user distribution
c1 is called more spread out than a user distribution c2 if the
fading variance vector c1 is majorized by c2, i.e., c1 ≺ c2.

Remark 4.3. In Figure 4.8, the symmetric scenario is majorized by
all other scenarios and scenario three majorizes all other scenarios, i.e.,
c1 ≺ c2 ≺ c3.

Remark 4.4. Note that the measure of user distribution and the measure
of spatial correlation can be combined for, e.g., multiuser MIMO
systems. Under the Kronecker model assumption from Section 4.1.1,
the channel of user k can be modeled for all 1 ≤ k ≤ K as

H_k = c_k R_{R,k}^{1/2} W_k R_{T,k}^{1/2}

with normalized transmit and receive correlation matrices as well as a
normalized random matrix, i.e., for all 1 ≤ k ≤ K, tr R_{T,k} = n_T,
tr R_{R,k} = n_R, and E[|[W_k]_{i,j}|²] = 1. Then, the long-term fading is
captured by c_k, the spatial correlation by R_{T,k} and R_{R,k}, and the
rich multipath environment by W_k.

In the following subsections, some examples for the application of the
measure of user distribution are provided.

4.2.2 Average Sum Rate in MAC


There are K mobile users who are going to transmit data to a base
station. The flat-fading channels h_1, ..., h_K between the mobiles and the
base are zero-mean independent complex Gaussian distributed
(CN(0, c_k)). The additive white Gaussian noise n(t) at the base receiver
has variance σ_n². Furthermore, we assume that the sum transmit power
is constrained to be P_sum. The SNR is given by ρ = P_sum/σ_n². The
received signal at the base is given by

y(t) = Σ_{k=1}^K h_k x_k(t) + n(t).    (4.40)

The statistics of the fading channel coefficients are completely characterized
by their second moments, i.e., c_i = E(h_i* h_i) for 1 ≤ i ≤ K. The
transmit power directly corresponds to the variance of the transmit
signals, p_i = E(x_i* x_i) for 1 ≤ i ≤ K. It follows from the transmit
power constraint that the l1-norm of the power allocation vector
p = [p_1, ..., p_K] is constrained to be one, i.e., ||p||_1 = Σ_{k=1}^K p_k = P = 1.

The average sum rate is a function of the SNR ρ and the fading
variance distribution c_1, ..., c_K. The ergodic sum capacity of the SISO
MAC with perfect CSI at the base and the mobiles is given by

C_pCSI(ρ, c) = E( log[1 + ρ max(c_1 w_1, ..., c_K w_K)] ).    (4.41)

The optimum transmission strategy is to allocate power only to the
best user, i.e., the user with maximum c_i w_i.

Theorem 4.9. For perfect CSI at the mobiles, only the best user is
allowed to transmit at any one time. The average sum rate in (4.41) is
a Schur-convex function w.r.t. the fading variance vector c.

The proof follows from Theorem 2.17.


The ergodic sum capacity of the SISO MAC with perfect CSI at the
base and long-term CSI at the mobiles in terms of c_1, ..., c_K is given by

C_cfCSI(ρ, c) = max_{Σ_{i=1}^K p_i = 1, p_i ≥ 0 ∀ 1≤i≤K} E( log(1 + ρ Σ_{k=1}^K c_k p_k w_k) ).    (4.42)

The solution to the programming problem in (4.42) can be found in
[15]. Note that usually one major difference between the uplink and
the downlink concerns the power constraints.

Theorem 4.10. For mobiles which know the fading variances and per-
fect CSI at the base, the average sum rate in (4.42) increases with less
spread out fading variances c, i.e., the average sum rate in (4.42) is a
Schur-convex function w.r.t. the fading variance vector c.

The proof follows from Theorem 2.18.



Finally, the ergodic sum capacity of the SISO MAC with perfect
CSI at the base and no CSI at the mobiles is given by

C_noCSI(ρ, c) = E( log(1 + ρ Σ_{k=1}^K c_k w_k) ).    (4.43)

Theorem 4.11. For uninformed mobiles and perfect CSI at the base,
the average sum rate in (4.43) increases with more spread out fading
variances c, i.e., the average sum rate in (4.43) is a Schur-concave
function w.r.t. the fading variance vector c.

The proof follows from Lemma 2.16.


In the downlink, the base station has a sum power constraint on
all users' signals, whereas in the uplink individual power constraints are
applied. In order to take intercell interference into account, additional
sum power constraints occur in the uplink as well [61]. This provides the
motivation for the optimization problems in (4.41)–(4.43).

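The Schur-convexity and Schur-concavity statements of Theorems 4.9 and 4.11 can be illustrated by a quick Monte Carlo experiment. A minimal sketch (assumptions of this example: K = 2, ρ = 1, w_k iid standard exponential, common random numbers for both variance vectors; (1.5, 0.5) majorizes (1, 1)):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 1.0
w = rng.exponential(size=(500_000, 2))   # w_k iid standard exponential

def sum_rate_pcsi(c):
    """Estimate of (4.41): only the instantaneously best user transmits."""
    return float(np.mean(np.log1p(rho * np.max(np.asarray(c) * w, axis=1))))

def sum_rate_nocsi(c):
    """Estimate of (4.43): uninformed mobiles, equal power for all users."""
    return float(np.mean(np.log1p(rho * (np.asarray(c) * w).sum(axis=1))))

c_bal, c_unbal = (1.0, 1.0), (1.5, 0.5)   # same sum, c_unbal majorizes c_bal
print("pCSI :", sum_rate_pcsi(c_bal), sum_rate_pcsi(c_unbal))    # Schur-convex
print("noCSI:", sum_rate_nocsi(c_bal), sum_rate_nocsi(c_unbal))  # Schur-concave
```

The perfect-CSI sum rate is larger for the unbalanced (majorizing) variance vector, while the no-CSI sum rate is larger for the balanced one, in line with the two theorems.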
4.2.3 Average Sum Rate in BC


In the signal model, there are K mobile users who are going to
receive data from one base station. The single-antenna quasi-static
block flat-fading channels h1 , . . . , hK between the mobiles and the base
are modeled as constant for a block of coherence length T and from
block to block as zero-mean independent complex Gaussian distributed
(CN(0, c_k)). The variance is c_i = E(h_i* h_i) for 1 ≤ i ≤ K. The additive
zero-mean white Gaussian noise n_k(t) at each receiver is iid and has
variance σ_n². Furthermore, we assume that the sum transmit power is
constrained to be P. The SNR is given by ρ = P/σ_n². The received signal
at mobile k at time t is

y_k(t) = h_k Σ_{l=1}^K x_l(t) + n_k(t).

We omit the time index for convenience. The statistics of the fading
channel coefficients hi are completely characterized by ci . The trans-
mit power directly corresponds to the variance of the transmit signals

p_i = E(x_i* x_i) for 1 ≤ i ≤ K. The l1-norm of the power allocation vector
p = [p_1, ..., p_K] is constrained to be one, i.e., ||p||_1 = Σ_{k=1}^K p_k = P = 1.
For 1 ≤ k ≤ K, define w_k by ||h_k||² = c_k w_k, i.e., the w_k are iid standard
exponentially distributed random variables. We assume that the receivers
have perfect CSI. Furthermore, we collect the channel states in a vector
h = [h1 , . . . , hK ].
The next lemma yields the average sum capacity and average sum
rate expressions for the three CSI scenarios considered. The proofs can
be found in [72].

Lemma 4.12. The sum rate with perfect CSI at the base station is
achieved by TDMA. The optimal power allocation is to transmit into
the direction of the best user l with ||h_l||² > ||h_k||² for all 1 ≤ k ≤ K,
k ≠ l. The ergodic sum capacity is then given by

C_pCSI(ρ, c) = E( log(1 + ρ max(||h_1||², ..., ||h_K||²)) ).    (4.44)

The optimal transmit strategy to achieve the average sum capacity
with long-term CSI is TDMA. Only the user with the highest channel
variance c_k is allowed to transmit. The achievable average sum capacity
is given by

C_cCSI(ρ, c) = E( log(1 + ρ c_1 w_1) ).    (4.45)

For no CSI at the base, the most robust transmit strategy against the
worst-case user distribution is equal power allocation, and the ergodic
sum rate2 is given by

C_noCSI(ρ, c) = E( log(1 + ρ Σ_{k=1}^K c_k w_k) ).    (4.46)

Next, let us characterize the impact of the spread of the fading variances
on the ergodic sum capacity for the cases with perfect and covariance
CSI, and on the ergodic sum rate for the case with no CSI at the base.
2 Since the optimal transmit strategy for no CSI is motivated by a compound channel
approach, we cannot talk about the sum capacity. Instead we use the term sum rate.

Theorem 4.13. Assume perfect CSI at the mobiles. For perfect CSI at
the base, the ergodic sum capacity in (4.44) is a Schur-convex function
w.r.t. the fading variance vector c. For a base which knows the fading
variances, the ergodic sum capacity in (4.45) is a Schur-convex function
w.r.t. the fading variance vector c. For an uninformed base station, the
ergodic sum rate in (4.46) is a Schur-concave function w.r.t. the fading
variance vector c.

The proof and illustrations can be found in [72]. The proof is based
on Lemma 2.16 and Theorem 2.17.

4.2.4 Sum Rate Related Measures


Consider the instantaneous sum rate with scheduling policy p(h):

C(α) = C(h, ρ) = log( 1 + ρ Σ_{k=1}^K p_k(h) ||h_k||² ).    (4.47)

The instantaneous sum rate depends on the deterministic SNR and
on the channel, which is a random variable. This means the instantaneous
sum rate is also a random variable (indicated by α). In the block-fading
model, the channel is constant for the coherence time T. It is
assumed that the coherence time T is large enough to code over many
blocks in order to almost achieve the mutual information. Then the
mutual information in (4.47) has its usual meaning as the instantaneous
capacity [10].
Since the scheduling policy depends on the channel state, it could
also vary randomly from fading block to fading block. As a result, the
instantaneous capacity itself is a random variable and has a pdf pC (α).
The average of the random variable,

E[C(α)] = ∫_0^∞ α p_C(α) dα,

is the average sum rate. For single-user systems with perfect CSI, it is
called ergodic capacity [95]. In multiuser systems with perfect CSI we
can call it ergodic sum capacity and it describes the overall performance

of the system on average. For a finer analysis, the cdf of C is important.
It is the outage probability of the channel, i.e.,

Pr[C(α) < R] = ∫_0^R p_C(α) dα.
The outage probability gives the probability that a certain sum rate
R cannot be achieved for a channel state. If the system is in an outage,
we cannot guarantee the successful delivery of information at
the sum rate R during this channel state.
The feasible delay can be exploited either by increasing the length of
one codeword or by introducing some kind of automatic repeat request
(ARQ). If the block length of the codeword is increased, the outage
probability for this codeword is reduced. Here, following [2], we con-
sider the “Maximum Zero-Outage Throughput.” The receiver requests
a retransmission as long as outages occur until the codeword is success-
fully decoded. Therefore, the complete information is reliably transmit-
ted. The maximum throughput for this simple retransmission scheme
is given by

T(SNR) = max_{R≥0} R ( 1 − Pr[C(h, SNR) ≤ R] ).    (4.48)

In [2] the quantity in (4.48) is called the “Maximum Zero-Outage
Throughput” (compare also to [7]).
For the delay-constrained analysis, the ergodic capacity as well as the
outage probability and the maximum throughput are not suitable:
neither approach guarantees the successful transmission of information
in a finite number of blocks. Therefore, we restrict the delay to one
fading block and fix the outage probability to some ε, i.e., Pr[C(α) <
R] = ε. Then we solve this for R. In order to avoid outages entirely, we
set ε = 0 and obtain the zero-outage sum rate, or delay-limited sum
rate, R* with

Pr[C(α) < R*] = 0.

4.2.4.1 Outage Sum Rate


If the users are not equally distributed and this information is not
available at the base station, it can be shown by a compound channel
approach that equal power allocation across all users is optimal.
Furthermore, the impact of the user distribution on the outage sum
rate is characterized in the following theorem.

Theorem 4.14. Assume that the base station is uninformed and the
user distribution is according to c. For fixed transmission rate R and
for SNR ρ < (2^R − 1)/2, the sum outage probability is a Schur-concave
function of the user distribution c_1, ..., c_K, i.e., a less equal distribution
of users decreases the sum outage probability. For SNR ρ > 2^R − 1,
the sum outage probability is a Schur-convex function of the user
distribution c_1, ..., c_K, i.e., a less equal distribution of users increases the
sum outage probability.

The proof follows from Theorem 2.19.

Theorem 4.15. With perfect CSI at the base, the optimal scheduling
is TDMA and the outage probability is given by

Pr[ max(||h_1||², ..., ||h_K||²) ≤ z ].    (4.49)

For fixed sum rate R and SNR

ρ ≤ ξ̂ = (2^R − 1)/K    (4.50)

the sum outage probability is Schur-concave with respect to c, and for
SNR

ρ ≥ ξ̄ = (2^R − 1) / min_{1≤k≤K} c_k    (4.51)

the sum outage probability is Schur-convex.

Proof. In order to verify Schur's condition, note that the outage probability
can be written as

P_out^pCSI(c) = ∏_{k=1}^K ( 1 − exp(−z/c_k) ),

which is obviously a symmetric function with respect to c. The difference
of the first derivatives of P_out^pCSI(c) with respect to c_1 and c_2 is
given by

Δ(c) = ∏_{k=3}^K (1 − e^{−z/c_k}) [ (1 − e^{−z/c_2}) (−z/c_1²) e^{−z/c_1} − (1 − e^{−z/c_1}) (−z/c_2²) e^{−z/c_2} ]
     = ∏_{k=1}^K (1 − e^{−z/c_k}) ( g(z, c_1) − g(z, c_2) )

with g(z, c) = −(z/c²) exp(−z/c) / (1 − exp(−z/c)). The sign of Δ(c)
depends on the monotonicity properties of the function g with respect
to c. The sign of the first derivative of g with respect to c depends on

sign[ (1 − exp(−z/c)) (c − z) ].

If z > max c_k, the first derivative of g with respect to c is negative
and the outage probability is Schur-concave. If z < min c_k, the first
derivative of g with respect to c is positive and the outage probability
is Schur-convex. Since max c_k ≤ K, the inequalities in (4.50) and (4.51)
follow.

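The two regimes of Theorem 4.15 can also be observed directly from the product form of the outage probability used in the proof. A small numeric sketch (assumptions of this example: K = 2, threshold values z chosen below min_k c_k and well above max_k c_k):

```python
import numpy as np

def p_out(c, z):
    """Outage probability with perfect CSI: Pr[max_k ||h_k||^2 <= z]."""
    c = np.asarray(c, dtype=float)
    return float(np.prod(-np.expm1(-z / c)))   # prod_k (1 - exp(-z/c_k))

c_bal, c_unbal = (1.0, 1.0), (1.5, 0.5)        # c_unbal majorizes c_bal

# small z (z < min_k c_k): Schur-convex regime, spreading raises the outage
print(p_out(c_bal, 0.4), p_out(c_unbal, 0.4))
# large z: Schur-concave regime, spreading lowers the outage
print(p_out(c_bal, 4.0), p_out(c_unbal, 4.0))
```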
4.2.4.2 Further Sum Rate Performance Measures


If stringent delay requirements are relaxed and an arbitrary delay
allowed, the delay can be exploited either by increasing the length of
one codeword or by introducing some kind of ARQ. If the block length
of the codeword is increased, the outage probability for this codeword is
reduced. Here, following [2], we focus on the “Maximum Zero-Outage
Throughput.” The receiver requests a retransmission as long as out-
ages occur until the codeword is successfully decoded. Therefore, the
complete information is reliably transmitted. The probability that a
codeword has to be transmitted s times is given by

Pr[ ∩_{i=1}^{s−1} out_i ] ( 1 − Pr[ out_s | ∩_{i=1}^{s−1} out_i ] ),    (4.52)

where out_s means an outage in retransmission s. If the receiver considers
only the actual packet for its decision, the probability of s
transmissions is given by

∏_{k=1}^{s−1} Pr[kth outage] · ( 1 − Pr[sth outage] )    (4.53)

under the assumption of an independent block-fading channel. The
maximum throughput is then defined to be

T(ρ, P) = sup_R R / E[S]

with the average service time E[S]. Using (4.53), the maximum throughput
for this simple retransmission scheme is given by

T^MZT(ρ, P) = max_{R≥0} R ( 1 − Pr[I(ρ, P, h) ≤ R] ).    (4.54)
In [2] the quantity in (4.48) is called the “Maximum Zero-Outage Throughput.”
Here, a transmission of a certain amount of data is not guaranteed
within a limited delay. The measure in (4.48) is contrary to the delay-limited
capacity in the last section. There are channel realizations in
which more bits are reliably transmitted than T^MZT(ρ, P) and there
are realizations in which fewer bits are transmitted. The probability that
T^MZT(ρ, P) bits (out of R bits) are transmitted without errors is given
by Pr[ I(ρ, P, h) ≤ T^MZT(ρ, P) ].

The connection between the delay-limited capacity and the maximum
throughput becomes clear for rates that are smaller than the
delay-limited capacity. In this case, the optimal power allocation which
minimizes the outage probability corresponds to the optimal power
allocation in the delay-limited case. Therefore, the achievable rates are
equal, too.
Even in the simplest setting the optimization problem in (4.54) leads
to a complicated solution containing again the Lambert-W function [7].
Table 4.1 shows the maximum throughput for equally distributed
users c = 1 for perfect CSI and no CSI. The multiuser diversity that
stems from the fact that the best user is exclusively scheduled can
clearly be observed.

Table 4.1 Scaling of maximum throughput with perfect and no CSI at the base at SNR
0 dB.
CSI\M 1 2 3 5 10 20
pCSI 0.264 0.523 0.753 1.132 1.785 2.5265
noCSI 0.264 0.417 0.521 0.664 0.861 1.0521

Next, the strict zero-outage or delay-limited sum rate is studied.
The delay-limited sum rate R_0 is defined by

Pr[ p(h) max_{1≤k≤K} ||h_k||² ≤ z ] = 0.    (4.55)

The delay-limited sum rate R_0 is given by

R_0 = log( 1 + ρ/β )

with β = E[ 1 / max_{1≤k≤K} ||h_k||² ]. The expectation as a function of the user
distribution c simplifies to

β(c) = ∫_0^∞ (1/t²) ∏_{k=1}^K ( 1 − exp(−t/c_k) ) dt.
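The integral expression for β(c) is straightforward to evaluate numerically. A minimal sketch (assumptions of this example: K = 2; for c = (1, 1) the integral equals 2 log 2, the constant that also appears in the proof of Corollary 4.7):

```python
import numpy as np

def beta(c, t_max=2000.0, n=400_000):
    """Numerical evaluation of beta(c) = int_0^inf t^-2 prod_k (1-exp(-t/c_k)) dt."""
    t = np.linspace(1e-9, t_max, n)
    f = np.ones_like(t)
    for ck in c:
        f *= -np.expm1(-t / ck)          # 1 - exp(-t/c_k), numerically stable
    f /= t ** 2
    dt = t[1] - t[0]
    return float(dt * (f.sum() - 0.5 * (f[0] + f[-1])))   # trapezoidal rule

def delay_limited_rate(c, rho=1.0):
    return float(np.log2(1.0 + rho / beta(c)))

print(beta((1.0, 1.0)))                                   # close to 2*log(2)
print(delay_limited_rate((1.0, 1.0)), delay_limited_rate((1.5, 0.5)))
```

The balanced distribution yields the smaller β and hence the larger delay-limited sum rate R_0.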

Corollary 4.7. The delay-limited sum rate R_0 is Schur-concave with
respect to c, i.e., the more spread out the users are, the higher is R_0.

Proof. The proof is an application of Theorem 2.17. The function
β(c) is the expectation of the convex and monotonically decreasing
function f(x) = 1/x applied to the maximum of a set of iid weighted standard
exponential random variables. The delay-limited sum rate R_0 depends
on the inverse of β; therefore it is Schur-concave whenever β(c) is Schur-convex.
Only the technical conditions in the theorem must be verified.
Note that P(x) = ∏_{k=1}^n (1 − exp(−x/μ_k)) and f(x) = 1/x. The first
limit exists because

lim_{x→0} (1/x)(1 − exp(−x)) = 1.

The second limit exists because

lim_{x→∞} (1/x)(1 − exp(−x)) = 0.

Finally, the third limit also exists because

∫_0^∞ −(1/x²)(1 − exp(−x))² dx = −2 log(2).

Remark 4.5. From the proof of Corollary 4.7, it follows that there
must be at least two users in the cell; otherwise the delay-limited sum
rate is zero.
5 Application of Matrix-Monotone Functions in Wireless Communications

5.1 Generalized Multiple Antenna Performance Measures


Multiple-antennas can improve the spectral efficiency and reliability in
wireless communications systems. In recent years, it was discovered that
MIMO systems have the ability to reach higher transmission rates than
one-sided array links [137, 158]. First, we review recent results for MISO
systems since this case has recently gained much attention. In [157], the
potential of multiple antenna systems was pointed out. The capacity of
a MISO system with imperfect feedback was first analyzed in [149] and
[99, 100]. In [55, 63], the optimum transmission strategy with covariance
knowledge at the transmit array with respect to the ergodic capacity
was analyzed. In [21, 111, 121], the downlink beamforming problem
in MISO systems was solved. In [54], the ergodic capacity in
the non-coherent transmission scenario with only covariance knowledge
at the transmitter and the receiver is studied. Many results regarding
the capacity of MISO and MIMO systems under different levels of CSI
and the corresponding transmission strategies have recently been published
[40].
It has been shown that even partial CSI at the transmitter can
increase the capacity of a MISO system. Recently, transmission schemes


for optimizing capacity in MISO mean-feedback and covariance-feedback
systems were derived in [149, 99]. The capacity can be
achieved by Gaussian distributed transmit signals with a particular
covariance matrix. In a block-fading model, the general signal process-
ing structure which achieves capacity independent of the type of CSI
consists of a Gaussian codebook, a number of beamformers and a power
allocation entity [9, 149]. Additionally, it was proved that the optimal
transmit covariance matrix in the covariance feedback case has the same
eigenvectors as the known channel covariance matrix. The complete
characterization of the impact of correlation on the ergodic capacity in
MISO systems can be found in [68]. In addition to the capacity other
performance metrics like the MMSE were analyzed in the literature,
e.g., [117, 120]. The multiuser MIMO system optimization is performed
with respect to the sum MSE in [129] and with per-user MMSE requirements
in [128].
The analysis and design methodology of single-antenna and beam-
forming systems was extended and generalized to multiantenna sys-
tems. Many novel approaches and techniques were developed, and an
unmanageable bulk of papers, reports, and books was produced. How-
ever, some ideas occurred inherently as persistent concepts in many
works. The main goal of this section is to detect these main concepts
and express them on a meta-level by constructing a unified framework.
In order not to have different statements for different performance
metrics, we present here a unifying framework in which a class of func-
tions serves as the performance metric. The underlying mathematical
structure is described by Löwner's representation of matrix-monotone
functions. Let us start with some motivating and illustrative
examples.

5.1.1 Examples in Single-User MIMO Systems


Consider the quasi-static block-flat fading MIMO system in Figure 5.1.
The transmit signals are complex Gaussian distributed random vec-
tors with zero mean and transmit covariance matrix Q. The transmitter
structure that corresponds to this type of signaling is described as
follows. The transmit covariance matrix is given by Q = E[x x^H].


Fig. 5.1 Single-user MIMO system.

Using the eigenvalue decomposition Q = U_Q Λ_Q U_Q^H, it becomes
obvious how one can construct a particular transmit covariance matrix.
The input data stream d(k) is split into m parallel data streams
d_1(k), ..., d_m(k). Each parallel data stream is multiplied by a factor
√p_1, ..., √p_m and then weighted by a beamforming vector u_1, ..., u_m,
respectively. The number of parallel data streams is less than or equal to
the number of transmit antennas (m ≤ n_T). The beamforming vectors
have size n_T × 1, with n_T the number of transmit antennas. The
weighted data streams x^i(k) = d_i(k) · √p_i · u_i are added up,
x(k) = Σ_{i=1}^m x^i(k), and sent. Omitting the time index k
for convenience, we obtain in front of the transmit antennas

x = Σ_{l=1}^m d_l · √p_l · u_l.    (5.1)

The transmit signal x has a covariance matrix Q with eigenvalues
p_1, ..., p_m, 0, ..., 0 and eigenvectors u_1, ..., u_m. In order to construct a
transmit signal with a given covariance matrix, two signal processing
steps are necessary: the power control p_1, ..., p_m and the beamformers
u_1, ..., u_m. The sum transmit power Σ_{k=1}^{n_T} p_k is constrained, i.e.,
Σ_{k=1}^{n_T} p_k = P.
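The two signal processing steps can be sketched as follows (illustrative values, not from the text: n_T = 4, m = 2, orthonormal beamformers from a QR decomposition, powers p = (0.7, 0.3) with P = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
nT, m = 4, 2
p = np.array([0.7, 0.3])          # power allocation with sum P = 1

# Orthonormal beamformers u_1, ..., u_m from a QR decomposition
A = rng.standard_normal((nT, m)) + 1j * rng.standard_normal((nT, m))
U, _ = np.linalg.qr(A)

# Transmit covariance Q = sum_i p_i u_i u_i^H
Q = sum(p[i] * np.outer(U[:, i], U[:, i].conj()) for i in range(m))

eig = np.sort(np.linalg.eigvalsh(Q))[::-1]
print(np.round(eig, 6))           # eigenvalues p_1, ..., p_m, 0, ..., 0
print(round(np.trace(Q).real, 6)) # sum transmit power P
```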
The first performance measure is the ergodic capacity [10], i.e., the
rate that can be transmitted reliably over ergodically (infinitely) many
channel realizations by codes with very long (infinite) block length.1 For
1 Since we fix the transmit covariance matrix, the term capacity may cause confusion. Usually, the
capacity is the ultimate rate that is achieved by optimization of the transmitter including
the covariance matrix. However, we think of the linear precoding matrix Q as fixed and
talk about the capacity for this fixed precoding Q.

the system in Figure 5.1 with n_T transmit and n_R receive antennas
and n = min(n_T, n_R), m = max(n_T, n_R), it is given by

C(ρ, Q) = E_H log det( I + ρ H Q H^H )
        = E_H log ∏_{k=1}^n ( 1 + ρ λ_k(H Q H^H) )
        = E_H Σ_{k=1}^n log( 1 + ρ λ_k(H Q H^H) )
        = E_H tr log( I + ρ H Q H^H )    (5.2)

with SNR ρ = 1/σ_n², channel matrix H, and expectation with respect to
H. The channel matrix is zero-mean iid Rayleigh distributed. A very
important property that has been used many times is the invariance
of the channel statistics under left and right multiplication
of H with a unitary matrix U [94]. Furthermore, the function
in (5.2) is obviously monotonically increasing in ρ, in tr Q for fixed
Q/tr Q, and also in Q, i.e., the function inside the trace is matrix-monotone
with respect to Q. Moreover, the function is concave with
respect to Q.
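The matrix-monotonicity in Q can be spot-checked numerically: if Q₂ − Q₁ is positive semidefinite, then the rate with Q₂ is at least the rate with Q₁. A minimal sketch (assumption of this example: a single 3 × 3 iid complex Gaussian channel draw):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 3, 1.0

def cap(H, Q):
    """tr log(I + rho H Q H^H) = log det(I + rho H Q H^H)."""
    M = np.eye(n) + rho * H @ Q @ H.conj().T
    return float(np.log(np.linalg.det(M)).real)

def random_psd(k):
    A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return A @ A.conj().T

H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
Q1 = random_psd(n)
Q2 = Q1 + random_psd(n)            # Q1 <= Q2 in the positive semidefinite order

print(cap(H, Q1), cap(H, Q2))      # the second value is never smaller
```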
Next, consider the slightly modified system in Figure 5.2. In addi-
tion to the additive white Gaussian noise, another additive noise
with colored covariance matrix is added. That can correspond to
either intra- or inter-cell interference or to any other type of jamming
signal.

Fig. 5.2 Single-user MIMO system with interference.



The ergodic capacity for the system in Figure 5.2 is given by

C(ρ, Q, Z) = E[ log det( (1/ρ)I + Z + H Q H^H ) ] − log det Z̃
           = E[ tr log( I + Z̃^{−1/2} H Q H^H Z̃^{−1/2} ) ],    (5.3)

where Z̃ = (1/ρ)I + Z. In (5.3), Z̃ is the positive definite noise-plus-interference
matrix. Note that the function in (5.3) is concave in Q and convex in Z.
Next, consider a completely different performance measure, the
normalized minimum mean-square error (MMSE). The linear MMSE
receiver reduces the computational complexity at the receiver side. The
MSE can be evaluated for each fading state and each symbol. The
average MSE describes the quality of the data transmission. If we apply
the linear MMSE receiver, the performance metric changes from the
average mutual information to the normalized MSE [150]. In general,
the Wiener filter for a linear system y = ax + n can be described as
w = R_xy R_yy^{−1}. The reason why we speak about the normalized MMSE
is that there are two cases for deriving the MMSE. In the first case,
the actual transmit signal x is considered. In the second case, the
source signal before linear precoding is considered. In the first case, the
resulting weight w or the resulting MSE expression must be normalized
with the transmit covariance matrix. However, both approaches
lead to the same result. We consider the first approach: the linear
MMSE receiver weights the received signal vector y by the Wiener
filter

x̂ = ρ Q H^H ( Z̃ + ρ H Q H^H )^{−1} y.    (5.4)

The covariance matrix of the estimation error R is given by

R = E[ (x̂ − x)(x̂ − x)^H ]
  = Q − ρ Q H^H ( Z̃ + ρ H Q H^H )^{−1} H Q.    (5.5)

The average normalized sum MSE is defined as the trace error covari-
ance matrix of the estimation error in (5.5) [49, 65]

MSE(σn2 , Q, Z) = EH tr Q−1/2 R Q−1/2



−1 
= nT − EH tr ρHQH H Z̃ + ρHQH H

 −1/2 −1/2
= nT − EH tr ρZ̃ HQH H Z̃
−1/2 −1/2 −1

HQH H Z̃

· I + ρZ̃ (5.6)

and its average over channel realizations is called average sum MSE.
Note, that the MSE is convex in Q and concave in Z.
In SISO systems, the relationship between the rate, the SNR, and
the MSE is quite simple, i.e., C = log(1 + SNR) = log(1/MSE). In
MIMO systems, the connection is more complicated due to the spatial
dimension, e.g., the pairwise error probability (PEP) between X and
X̂ can be upper bounded by P(X → X̂) ≤ exp(−ρ ||H(X − X̂)||²).
The connection between the performance measure mutual information
and MSE is highlighted in the next subsection.
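The scalar relationship above can be checked directly. A minimal sketch (function names ours), assuming a unit-power signal and unit-power noise so that MSE = 1/(1 + SNR):

```python
import math

def siso_mmse(snr):
    """Normalized MMSE of a scalar Gaussian channel y = sqrt(snr)*x + n.

    For unit-power signal and noise, the Wiener filter w = R_xy / R_yy
    yields MSE = 1/(1 + snr).
    """
    return 1.0 / (1.0 + snr)

def siso_capacity(snr):
    return math.log(1.0 + snr)

snr = 7.0
mse = siso_mmse(snr)
# C = log(1 + SNR) = log(1/MSE): the two expressions coincide.
assert abs(siso_capacity(snr) - math.log(1.0 / mse)) < 1e-12
```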

5.1.2 Relationship between MSE and Mutual Information


There are at least three connections between the MSE and the capacity
that provide intuitive insights into the meaning of these performance
metrics. The first is a function theoretic relationship, the second is
a direct relationship by linear algebra, and the third is an analytical
relationship following from the second one.

5.1.2.1 Function Theoretic Relationship


Let us compare the capacity and the normalized sum MSE. First, we
rewrite the capacity as

    C = tr log(I + ρ Z^{-1/2} H Q H^H Z^{-1/2}).

The capacity expression is the trace of a matrix-valued function. Let us
denote the matrix-valued function as Φ_1(X) = log(I + X). Then the
capacity can be written as

    C = tr Φ_1(ρ Z^{-1/2} H Q H^H Z^{-1/2}).                        (5.7)

Let us turn to the sum MSE expression. It can be written using the
matrix-valued function Φ_2(X) = I − X[I + X]^{-1} = [I + X]^{-1} as

    MSE = tr Φ_2(ρ Z^{-1/2} H Q H^H Z^{-1/2}).                      (5.8)

This leads directly to the next theorem.

Theorem 5.1. Both performance metrics, the instantaneous mutual
information and the MSE, can be written in the following generalized
form

    Φ(Z, Q, H) = tr φ(ρ Z^{-1/2} H Q H^H Z^{-1/2})                  (5.9)

with a matrix-monotone function φ.

Proof. It has already been shown in Section 3.2 that log(1 + x) and
1/(1 + x) are matrix-monotone functions.

5.1.2.2 Linear Algebraic Relationship


The normalized MSE matrix is defined in (5.5). Denote the receive
covariance matrix as R_y = E[y y^H] = Z + H Q H^H and the cross-
covariance matrix R_xy = E[x y^H] = Q H^H.
Compute the inverse of the error covariance matrix of the MSE in
(5.5) using the Matrix Inversion Lemma in (6.1) to obtain [132]

    R^{-1} = [Q − R_xy R_y^{-1} R_xy^H]^{-1}
           = Q^{-1} + Q^{-1} R_xy [R_y − R_xy^H Q^{-1} R_xy]^{-1} R_xy^H Q^{-1}
           = Q^{-1} + H^H Z^{-1} H.

The capacity can be written as

    C = log det(I + Z^{-1} H Q H^H) = log det(I + H^H Z^{-1} H Q)
      = log det Q + log det(Q^{-1} + H^H Z^{-1} H) = log(det Q / det R).
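As a sanity check (ours, not from the text), the chain above can be verified in the scalar case, where all matrices reduce to numbers and det Q / det R becomes q/R:

```python
import math

# Scalar instance: channel h, transmit power q, noise z (hypothetical values).
h, q, z = 1.7, 2.0, 0.5

# Error variance, the scalar version of (5.5) with rho = 1:
R = q - (q * h) / (z + h * q * h) * (h * q)

# Capacity two ways: log(1 + z^{-1} h q h) and log(det Q / det R) = log(q / R).
C_direct = math.log(1.0 + h * q * h / z)
C_via_R = math.log(q / R)
assert abs(C_direct - C_via_R) < 1e-9

# R^{-1} = Q^{-1} + H^H Z^{-1} H reduces to 1/R = 1/q + h^2/z in the scalar case.
assert abs(1.0 / R - (1.0 / q + h * h / z)) < 1e-9
```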
110 Application of Matrix-Monotone Functions in Wireless Communications

5.1.2.3 Analytical Relationship


It is easily observed that by taking the first derivative of the capacity
with respect to ρ, the following expression closely related to the
normalized sum MSE is obtained, i.e.,

    ∂C(ρ)/∂ρ = ∂ log det(Z + ρ H Q H^H)/∂ρ
             = tr[Q^{1/2} H^H (Z + ρ H Q H^H)^{-1} H Q^{1/2}].

This observation is generalized into the following important result. Note
that the MMSE estimator in [43] tries to estimate H x instead of only x.
Therefore, the expression differs from the MMSE in (5.6).

Theorem 5.2 (Theorem 2 in [43]). Denote the MMSE as
mmse(ρ) = E[||H x − H x̂||²] where x̂ is the conditional mean esti-
mate. Then, with independent AWGN vector n,

    d/dρ I(x; √ρ H x + n) = mmse(ρ).                                (5.10)

This holds for all random variables x with finite variance.

Note that the gradient of the mutual information with respect to


different parameters in linear vector Gaussian channels is studied in
[108].
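The scalar special case of Theorem 5.2 can be verified numerically; for a unit channel with unit-power Gaussian input, I(ρ) = log(1 + ρ) and mmse(ρ) = 1/(1 + ρ). A sketch with our own function names:

```python
import math

def mutual_information(rho):
    # Scalar Gaussian input, unit channel: I(x; sqrt(rho)*x + n) = log(1 + rho).
    return math.log(1.0 + rho)

def mmse(rho):
    # Conditional-mean estimation error for the same channel: 1/(1 + rho).
    return 1.0 / (1.0 + rho)

rho, eps = 3.0, 1e-6
# Central difference of the mutual information matches the MMSE (5.10).
deriv = (mutual_information(rho + eps) - mutual_information(rho - eps)) / (2 * eps)
assert abs(deriv - mmse(rho)) < 1e-8
```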

5.1.3 Examples in Multi-User MIMO Systems


5.1.3.1 Introduction and Motivation
In this subsection, an overview of recent results is given. It is
impossible to collect and refer to all relevant work in the multiuser
multi-antenna research area. Therefore, only a fraction of the interest-
ing work is cited and the interested reader is referred to the references
in these papers to obtain more information.
In multiuser scenarios, the use of new services which create elastic
traffic is on the rise. Therefore, it is necessary to study the complete
point-to-multi-point (downlink) and multi-point-to-point
(uplink) transmission system. The MIMO MAC model appears in the
uplink transmission from multiple users to the multi-antenna base sta-
tion. Each user is equipped with multiple antennas.
One important performance metric analyzed is the throughput of
such multiple-user systems under either individual power constraints
in the uplink or a sum power constraint for the downlink transmis-
sion. In [112], the ergodic sum capacity of a multi-antenna Gaussian
MAC is defined, and the impact of the number of transmit and receive
antennas, as well as the number of users is analyzed. The special case
in which covariance information is only available at the mobiles is con-
sidered in [56]. In [113], the ergodic capacity, the ergodic sum rate
region, the outage capacity, and algorithms for the vector MAC are
analyzed. In [13, 164], the authors maximize the ergodic sum capac-
ity of the MIMO MAC for fixed individual power constraints for the
transmit covariance matrices. It was shown that the optimal trans-
mit covariance matrices are characterized by an iterative water-filling
solution which treats the other users like noise under individual power
constraints.
The system model, which is dual to the MIMO MAC, is the mul-
tiuser MIMO downlink transmission. It leads to the BC [31, 32], the
multi-antenna Gaussian non-degraded BC [26], or vector broadcast
channel [163]. Recently, the sum capacity and achievable region of the
multiuser MIMO BC was studied in [147, 146] and [152]. An upper
bound on the capacity region of the BC was derived in [119]. The
bound is found by computing the capacity of the cooperative sys-
tem under worst case noise. The structure of the worst case noise
for the MIMO BC is analyzed in [161]. In [22], [140], [146], and
[162], the duality between the multiuser uplink and downlink chan-
nel was studied. It was shown that the achievable capacity region of
the downlink transmission coincides with the capacity region of the
uplink. In addition to this, the maximum sum rate point on the enve-
lope of the capacity region can be characterized by the capacity of
the equivalent cooperative MIMO system with the worst case noise
[119]. The capacity region of the non-degraded Gaussian BC is not
an open research problem anymore. In [148], the authors show that
all points below their proposed upper bound are achievable under the

assumption that Gaussian code books are optimal. This assumption


has been verified in [153] and it has been shown that the capac-
ity region of the Gaussian MIMO BC equals that of the Gaussian
MIMO MAC.
In addition to the throughput another relevant performance metric
is the MMSE in the uplink transmission when the base station applies
a linear MMSE receiver. The resulting sum MSE has been studied
in [67]. Both sum performance measures are discussed individually in
detail in [60]. Some preliminary results can be also found in [59, 126].
The solution to the sum transmit power minimization with per-user
MMSE requirements is provided in [123, 127, 128].

5.1.3.2 MIMO MAC


Consider the MIMO MAC. The communication channel between each
user and the base station is modeled by a block-flat fading MIMO chan-
nel. We have K mobiles with nT antennas each. We can easily extend
the results to the case in which every mobile has a different number of
transmit antennas. The base station owns nR receive antennas. In the
discrete time model, the received vector y at any one time at the base
station can be described by

    y = Σ_{k=1}^K H_k x_k + n                                       (5.11)

with the receiver noise n ∈ C^{n_R×1} which is AWGN, flat fading channel
matrices H_k ∈ C^{n_R×n_T}, and transmit signals x_k ∈ C^{n_T×1}. We assume
uncorrelated noise with covariance σ_n^2 I_{n_R}. The inverse noise power
is denoted by ρ = 1/σ_n^2. Equation (5.11) can be rewritten in compact
form as

y = Hx + n (5.12)

with H = [H_1, H_2, . . . , H_K] and x = [x_1^T, . . . , x_K^T]^T. We collect the
transmit covariance matrices in a large block-diagonal matrix Q =
Diag(Q_1, Q_2, . . . , Q_K). The sum capacity of the MIMO MAC with SIC
applied at the base station is given by [163]

    C(Q, H, ρ) = log det(I + ρ Σ_{k=1}^K H_k Q_k H_k^H)
               = tr log(I + ρ Σ_{k=1}^K H_k Q_k H_k^H).             (5.13)
Next, consider a linear multiuser MMSE receiver and follow the deriva-
tion in the single-user case closely (see Section 5.1.1). Define the nor-
malized MSE as

    MSE(Q, H, ρ) = tr(Q^{-1/2} R Q^{-1/2}) = K n_T − Σ_{i=1}^{n_R} μ_i/(σ_n^2 + μ_i)
                 = K n_T − n_R + σ_n^2 Σ_{i=1}^{n_R} 1/(σ_n^2 + μ_i)
                 = K n_T − n_R + σ_n^2 tr(A^{-1})                   (5.14)

with R as the error covariance matrix of the estimation error, μ_i as the
eigenvalues of H Q H^H, and the matrix A as

    A = σ_n^2 I + Σ_{k=1}^K H_k Q_k H_k^H.                          (5.15)

The MSE is reduced by minimizing the sum in the RHS of (5.14). It is
worth mentioning that the term Σ_{i=1}^{n_R} 1/(σ_n^2 + μ_i) is a Schur-convex
function with respect to the μ_i [92]. Therefore, the term is minimized
if all eigenvalues μ_i are equal.
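This Schur-convexity claim is easy to probe numerically: among eigenvalue profiles with a fixed sum, the equal profile minimizes the term. A small stdlib-Python sketch (variable names ours):

```python
import random

def interference_term(mus, sigma2=1.0):
    # The Schur-convex term sum_i 1/(sigma_n^2 + mu_i) from (5.14).
    return sum(1.0 / (sigma2 + mu) for mu in mus)

n, total = 4, 8.0
equal = [total / n] * n       # equal eigenvalues, majorized by every profile
random.seed(1)
for _ in range(1000):
    # Random nonnegative eigenvalue profiles with the same sum.
    cuts = sorted(random.uniform(0, total) for _ in range(n - 1))
    mus = [b - a for a, b in zip([0.0] + cuts, cuts + [total])]
    assert interference_term(equal) <= interference_term(mus) + 1e-12
```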

5.1.3.3 MIMO BC
Next, we study the downlink transmission from the base station to the
mobiles. The base station is equipped with nT transmit antennas and
each mobile has nR antennas. The channel matrices in the downlink
transmission correspond to the Hermitian channel matrices from the
uplink, i.e., H_i^dl = H_i^H (reciprocity).
The received vector y_k at each mobile k can be written as

    y_k = H_k^H x_k + Σ_{l=1, l≠k}^K H_k^H x_l + n_k                (5.16)
with the flat-fading channel matrices H_k, the AWGN receiver noises n_k,
and the transmit signal x_k which is intended for mobile k. The noise
at the mobiles is assumed uncorrelated and independent identically
distributed. In Equation (5.16) the first term is the signal for user k,
the second term is the interference from the signals for the other users,
and the last term is the noise.
The sum capacity of the MIMO BC with Costa Precoding at the
base station is equal to the sum capacity of the MIMO MAC and it is
given by [147]

    C(Q, H, ρ) = log det(I + ρ Σ_{k=1}^K H_k Q_k H_k^H).            (5.17)

5.1.4 Generalized Sum Performance Measure


Both sum performance measures, (5.13) and (5.14) for the uplink and
(5.17) for the downlink, can be written as

    Φ(Q, H, ρ) = tr φ(ρ Σ_{k=1}^K H_k Q_k H_k^H).                   (5.18)

The function in (5.18) is jointly concave in the set of transmit covari-
ance matrices Q_1, . . . , Q_K. This follows from the fact that the function
depends on the weighted sum of covariance matrices

    tr φ(Σ_{k=1}^K H_k ((1 − λ)Q̃_k^{(1)} + λQ̃_k^{(2)}) H_k^H)
      = tr φ((1 − λ) Σ_{k=1}^K H_k Q̃_k^{(1)} H_k^H + λ Σ_{k=1}^K H_k Q̃_k^{(2)} H_k^H)
      = tr φ((1 − λ)A + λB)                                         (5.19)

with 0 ≤ λ ≤ 1 and fixed A and B. The function is concave, since φ is
matrix-monotone.
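The concavity argument can be illustrated numerically for φ = log, where tr log of a positive definite matrix equals log det. The sketch below (2 × 2 real matrices, all values hypothetical) checks (5.19) along a segment of covariance pairs:

```python
import math

def mat2_det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def f(Q1, Q2, H1, H2):
    """tr log(I + H1 Q1 H1^T + H2 Q2 H2^T) = log det(I + ...) for phi = log."""
    S = [[1.0, 0.0], [0.0, 1.0]]
    for H, Q in ((H1, Q1), (H2, Q2)):
        T = matmul(matmul(H, Q), transpose(H))
        S = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
    return math.log(mat2_det(S))

def mix(A, B, lam):
    return [[(1 - lam) * A[i][j] + lam * B[i][j] for j in range(2)]
            for i in range(2)]

H1 = [[1.0, 0.3], [0.0, 0.8]]
H2 = [[0.5, 0.1], [0.2, 1.1]]
QA = ([[2.0, 0.0], [0.0, 0.1]], [[0.5, 0.0], [0.0, 1.5]])
QB = ([[0.2, 0.0], [0.0, 1.8]], [[1.0, 0.0], [0.0, 0.3]])
for lam in (0.25, 0.5, 0.75):
    Qm1, Qm2 = mix(QA[0], QB[0], lam), mix(QA[1], QB[1], lam)
    # Concavity along the segment, as in (5.19).
    assert f(Qm1, Qm2, H1, H2) >= (1 - lam) * f(*QA, H1, H2) + lam * f(*QB, H1, H2) - 1e-12
```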

5.2 Optimization of Matrix-Monotone Functions


The performance measures derived and described in the last section can
be represented as the trace of some matrix-monotone function which
depends on either a single quadratic form ρ Z^{-1/2} H Q H^H Z^{-1/2} or a
sum of quadratic forms ρ Σ_{k=1}^K H_k Q_k H_k^H. Depending on the type
of optimization problem, the expectation is taken with respect to H or
H_1, . . . , H_K, where H is distributed according to the Kronecker model
from Section 4.1.1.

5.2.1 Single Quadratic Form


At first, we list some properties of the performance function. Let U be
a unitary matrix and A, B two positive semidefinite matrices, both of
dimension n × n. Furthermore, H is in general a rectangular matrix of
dimension n × m.

    tr φ(U A U^H) = tr φ(A)                                         (5.20)
    tr φ(H H^H) = tr φ(H^H H)                                       (5.21)
    tr φ(A^{1/2} B A^{1/2}) = tr φ(B^{1/2} A B^{1/2}).              (5.22)

Property (5.20) follows from the definition of the matrix-valued func-
tion φ. (5.21) follows from the fact that φ acts only on the eigen-
values and the eigenvalues of H H^H are equal to the eigenvalues
of H^H H. The last property (5.22) follows from tr φ(B^{1/2} A B^{1/2}) =
tr φ(B^{1/2} A^{1/2} A^{1/2} B^{1/2}) = tr φ(A^{1/2} B A^{1/2}) by property (5.21).
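Property (5.21) can be checked numerically for φ(x) = log(1 + x), which vanishes at zero, so the identity also holds for a rectangular H; the check below uses Sylvester's determinant identity det(I + HH^T) = det(I + H^T H) with hypothetical values:

```python
import math

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det(M):
    # Cofactor expansion; fine for tiny matrices.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def plus_identity(M):
    return [[M[i][j] + (1.0 if i == j else 0.0) for j in range(len(M))]
            for i in range(len(M))]

H = [[1.0, 0.5, -0.2],
     [0.3, 2.0,  0.7]]                        # rectangular 2 x 3 channel
Ht = [[H[i][j] for i in range(2)] for j in range(3)]

# tr log(I + H H^T) = log det(I + H H^T), and likewise for H^T H.
lhs = math.log(det(plus_identity(matmul(H, Ht))))
rhs = math.log(det(plus_identity(matmul(Ht, H))))
assert abs(lhs - rhs) < 1e-9
```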

Theorem 5.3. Consider the following optimization problem with diag-
onal positive Γ = diag[γ_1, . . . , γ_n]

    max_{Q⪰0} tr Γ φ(ρ H Q H^H)   s.t.  tr Q ≤ P.                   (5.23)

Decompose H according to the SVD as H = U D V^H. Denote the
eigenvalues of H^H H by d_1, . . . , d_n. Then the optimal Q has the eigenvalue
decomposition Q = V Λ V^H with eigenvalues Λ = diag(p_1, . . . , p_n) char-
acterized by

    p_k = [ (1/(ρ d_k)) φ̃'( μ/(ρ d_k γ_k) ) ]^+

with x^+ = max(x, 0) and φ̃' as the inverse function of the first derivative
of φ. μ is chosen such that Σ_{k=1}^n p_k = P.
P

The first part of Theorem 5.3 follows from Theorem 3.16. The
second part uses the optimality conditions that are described in
Appendix 6.2.3. The complete and detailed proof and illustrations for
the case Γ = I can be found in [20].
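For a concrete feel for Theorem 5.3, the sketch below evaluates the eigenvalue characterization for φ(x) = log(1 + x), where φ'(x) = 1/(1 + x) and hence φ̃'(y) = 1/y − 1, recovering classical water-filling. The bisection on μ and all names are our own:

```python
import math

def waterfill(d, gamma, rho, P):
    """Eigenvalues p_k from Theorem 5.3 for phi(x) = log(1 + x).

    With phi_tilde'(y) = 1/y - 1 this gives
    p_k = [gamma_k/mu - 1/(rho d_k)]^+ and mu fixed by sum(p) = P.
    """
    def powers(mu):
        return [max(g / mu - 1.0 / (rho * dk), 0.0) for dk, g in zip(d, gamma)]

    lo, hi = 1e-12, 1e12               # geometric bisection bracket for mu
    for _ in range(200):
        mu = math.sqrt(lo * hi)
        if sum(powers(mu)) > P:
            lo = mu                     # too much power -> raise the level mu
        else:
            hi = mu
    return powers(math.sqrt(lo * hi))

d, gamma, rho, P = [2.0, 1.0, 0.1], [1.0, 1.0, 1.0], 1.0, 2.0
p = waterfill(d, gamma, rho, P)
assert abs(sum(p) - P) < 1e-6
# KKT check: gamma_k rho d_k phi'(rho d_k p_k) is equal on all active modes.
marg = [g * rho * dk / (1.0 + rho * dk * pk) for dk, g, pk in zip(d, gamma, p) if pk > 0]
assert max(marg) - min(marg) < 1e-6
```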

Theorem 5.4. Consider the following optimization problem

    max_{Q⪰0} E tr φ(ρ H Q H^H)   s.t.  tr Q ≤ P                    (5.24)

under the assumption that H is distributed according to the model
in Section 4.1.1, i.e., H ∼ R_R^{1/2} W R_T^{1/2}. Q is to be fixed for different
realizations of W but may depend on R_R and R_T. Denote and order
the eigenvalues of R_T by λ_1^T ≥ · · · ≥ λ_n^T ≥ 0. The optimal covariance
matrix Q has eigenvectors which commute with the eigenvectors of
R_T and eigenvalues p = [p_1, . . . , p_n] that are characterized for all 1 ≤
k ≤ n by

    ρ λ_k^T E[ w_k^H R_R^{1/2} φ'( ρ Σ_{l=1}^n p_l λ_l^T R_R^{1/2} w_l w_l^H R_R^{1/2} ) R_R^{1/2} w_k ]
        { = μ   if p_k > 0
        { ≤ μ   if p_k = 0                                          (5.25)

for some μ > 0 such that Σ_{k=1}^n p_k = P.

One proof can be found in [71]. The first part follows from invari-
ance properties of W and the second part follows from the optimality
conditions that are described in Appendix 6.2.3.

Remark 5.1. On the one hand, there is a closed form solution for
the optimal eigenvectors that solve (5.24); on the other hand, the
optimal eigenvalues that solve (5.24) are only given in an indirect
form. However, the properties of the optimal eigenvalues can be analyzed
based on the implicit characterization. Depending on the parameter ρ
in (5.24), the number of eigenvalues greater than zero is determined,
i.e., the higher ρ, the more eigenvalues of Q are greater than zero. This
leads to the question where the transitions are, e.g., for which ρ there
is only one eigenvalue of Q greater than zero, i.e., Q has rank one.

Corollary 5.1. The optimization problem (5.24) has a rank-one solu-
tion if and only if

    λ_2 E[ tr( R_R φ'( R_R^{1/2} w_1 w_1^H R_R^{1/2} ) ) ]
      ≤ λ_1 E[ w_1^H R_R^{1/2} φ'( R_R^{1/2} w_1 w_1^H R_R^{1/2} ) R_R^{1/2} w_1 ].

This follows directly from (5.25).

5.2.2 Min–max Problems


In this section, three different min–max problems with objective
function

    tr φ(Z^{-1/2} H Q H^H Z^{-1/2})

are considered. The constraint on the covariance matrix Q will always
be Q ⪰ 0 and tr Q ≤ P. The constraint set of the covariance matrix
Z will vary. The results are stated in [18] without proof. Therefore, we
give the complete proofs.
In the analysis of vector channels, minimax problems of the following
type are often studied

    min_{Z∈Z} max_{Q∈Q} tr φ(Z^{-1/2} H Q H^H Z^{-1/2})             (5.26)

in which the noise Z is positive semidefinite and in some set of admis-


sible noise plus interference Z and the transmit strategy Q is positive
semidefinite and belongs to some set of admissible transmit strategies
Q. The matrix H with singular value decomposition H = U ΛV H is
fixed and depends on the transmission medium. If the AWGN chan-
nel is studied, H = I. Otherwise, the matrix H can be a flat-fading
MIMO channel matrix, or a frequency-selective channel matrix, etc.
In general, we assume that the transmit strategy is power limited,
i.e., Q = {Q : tr (Q) ≤ P }. However, the noise Z can be created by
a variety of effects, e.g., thermal noise, intercell, intracell, inter-symbol,
or inter-carrier-interference. Z has full rank. The concrete structure
of Z depends on the application and on the transmit and receive strategies. In
[14, 145], expressions like (5.26) in which φ was the mutual informa-
tion were studied under different admissible sets Z and Q. For mutual
information [106] studies a general game-theoretic framework for
min–max optimization.

5.2.2.1 Min–max Problem with a Trace Constraint


In this section, the worst case matrix Z with a trace constraint is
characterized. The optimization problem is given by

    Φ_I = inf_{tr(Z)≤σ²n} max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2}).  (5.27)

The following Theorem 5.5 characterizes the value of the minimax prob-
lem in (5.27).

Theorem 5.5. The value of the minimax problem in (5.27) is given by

    Φ_I = tr φ( (P/(σ²n)) H H^H ).                                  (5.28)

Remark 5.2. Note that the argument of the RHS in (5.28) is inde-
pendent of the function φ.

Proof. First, we prove that the minimax performance equals the mini-
max performance of the expression

    Φ_I^D = inf_{tr Λ_Z≤nσ²} max_{tr Λ_Q≤P} tr φ( Λ_Z^{-1/2} Λ_H^{1/2} Λ_Q Λ_H^{1/2} Λ_Z^{-1/2} ).  (5.29)

The singular value decomposition of H is given by H = U_H Λ_H^{1/2} V_H^H.
At first, to show that Φ_I ≤ Φ_I^D, we have

    max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2})
      = max_{tr(Q)≤P} tr φ(Z^{-1/2} U_H Λ_H^{1/2} V_H^H Q V_H Λ_H^{1/2} U_H^H Z^{-1/2})
      = max_{tr(V_H^H Q V_H)≤P} tr φ(Z^{-1/2} U_H Λ_H^{1/2} Q Λ_H^{1/2} U_H^H Z^{-1/2})
      = max_{tr(Q)≤P} tr φ(Z^{-1/2} U_H Λ_H^{1/2} Q Λ_H^{1/2} U_H^H Z^{-1/2}).
tr (Q)≤P
Now, we choose Ẑ = U_H Λ_Z U_H^H fixed; then it directly follows

    Φ_I = inf_{tr(Z)≤σ²n} max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2})
        ≤ inf_{tr(Z)≤σ²n} max_{tr(Q)≤P} tr φ(Λ_Z^{-1/2} Λ_H^{1/2} Q Λ_H^{1/2} Λ_Z^{-1/2})
        = Φ_I^D.                                                    (5.30)

Next, we use Theorem 3.16 to show that Φ_I ≥ Φ_I^D. With Theorem 3.16
we have

    tr φ(Λ_Z^{-1/2} Λ_H^{1/2} Q Λ_H^{1/2} Λ_Z^{-1/2})
      ≥ min_π Σ_{i=1}^m φ( λ_{π_i}^{-1}(Z) λ_i(H Q H^H) ).          (5.31)

The maximum over Q of the term in (5.31) is greater than or equal to
the term with the choice of U_Q = U_H, i.e.,

    max_{tr(Q)≤P} tr φ(Λ_Z^{-1/2} Λ_H^{1/2} Q Λ_H^{1/2} Λ_Z^{-1/2})
      ≥ min_π Σ_{i=1}^m φ( λ_{π_i}^{-1}(Z) λ_i(H) λ_i(Q̂) ).        (5.32)

Inequality (5.32) is valid for all Z. Therefore, we have

    inf_{tr(Z)≤nσ²} max_{tr(Q)≤P} tr φ(Λ_Z^{-1/2} Λ_H^{1/2} Q Λ_H^{1/2} Λ_Z^{-1/2})
      ≥ inf_{tr(Z)≤nσ²} max_{tr(Q)≤P} Σ_{i=1}^m φ( λ_i^{-1}(Z) λ_i(H) λ_i(Q) ).  (5.33)

From (5.33) it follows that

    Φ_I ≥ Φ_I^D.                                                    (5.34)

From (5.30) and (5.34), it follows that Φ_I = Φ_I^D.
I .
Next, the minimax problem

    inf_{tr(Z)≤σ²n} max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2})

yields two different Lagrangians: one for the maximization with respect
to Q and one for the minimization with respect to Z. We start with the
Lagrangian of the minimization problem

    L(Λ̂_Z, ξ_k, μ) = Σ_{i=1}^ν φ( λ_i(H) λ_i(Q) / λ_i(Z) )
                     + μ( Σ_{l=1}^ν λ_l(Z) − nσ² ) + Σ_{k=1}^ν ξ_k λ_k(Z).  (5.35)

The objective function φ is monotonically increasing. Therefore, the
Lagrangian multipliers ξ_k, which ensure that the eigenvalues of the
matrix Z are greater than or equal to zero, are all equal to zero, because
λ_k(Z) > 0 for all 1 ≤ k ≤ ν. Since the optimization problem is convex
with respect to the eigenvalues of Z, we have the necessary and suffi-
cient Karush–Kuhn–Tucker (KKT) condition from (5.35)

    ∂L(Λ̂_Z, μ)/∂λ_i(Z) = − λ_i(H) λ_i(Q) φ'( λ_i(H) λ_i(Q)/λ_i(Z) ) / λ_i(Z)²
                         + μ = 0.                                   (5.36)

We express (5.36) as

    φ'( λ_i(H) λ_i(Q) / λ_i*(Z) ) = μ λ_i*(Z)² / ( λ_i(H) λ_i(Q) ).  (5.37)

The Lagrangian multiplier μ has to be chosen such that Σ_{i=1}^ν λ_i*(Z) = nσ².
The derivation for the maximization with respect to Q follows along
similar lines.
lines.
In the second part of the proof, we further characterize the worst
case matrix Z eigenvalues. Denote ρ = P/(nσ²).
We denote the eigenvalues of the optimal matrix Q* with λ_i(Q*)
and the worst case matrix Z* eigenvalues with λ_i(Z*). The so-called
"water-filling" solution for the eigenvalues of the matrix Q is given for
all λ_i(Q*) > 0 as

    φ'( λ_k(H) λ_k(Q*) / λ_k(Z) ) = μ̃ λ_k(Z) / λ_k(H)              (5.38)

with μ̃ ≥ 0. We show that the choice λ_i(Z*) = (1/ρ) λ_i(Q*) fulfills both
optimality conditions (5.37) and (5.38) simultaneously. This result is
derived by computing the Lagrangian multipliers for (5.38) and (5.37)
and by noting that

    μ̃ λ_k(Q*) = μ λ_k(Z*).                                         (5.39)

Finally, the two Lagrangian multipliers are related from (5.39) by

    μ̃ = (nσ²/P) μ.

As a result, the eigenvalues of the worst case matrix Z in (5.37)
correspond to the weighted eigenvalues of the optimal matrix Q

    λ_i*(Z) = (1/ρ) λ_i*(Q).

Finally, we insert λ_k(Q*) into the performance function and obtain
(5.28), which completes the proof.

Remark 5.3. If m > ν = rank(Q), we have to allocate ε to the eigen-
values ν + 1, . . . , m of Z. By letting ε → 0, the minimax value of the
Theorem is achieved. If m ≤ ν, we can replace the infimum by the
minimum in the minimization problem.
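Theorem 5.5 can be probed by brute force in a two-dimensional diagonal instance: grid-search the inner maximization and the outer minimization and compare with the closed form tr φ((P/(σ²n)) H H^H). A sketch with φ(x) = log(1 + x) and hypothetical numbers:

```python
import math

def inner_max(z, d, P, grid=400):
    # max over q1 in [0, P], q2 = P - q1, of phi(d1 q1/z1) + phi(d2 q2/z2).
    return max(math.log(1 + d[0] * (P * i / grid) / z[0])
               + math.log(1 + d[1] * (P - P * i / grid) / z[1])
               for i in range(grid + 1))

d = [3.0, 1.0]                        # eigenvalues of H H^H
P, sigma2, n = 2.0, 1.0, 2
rho = P / (n * sigma2)

grid = 400
# Outer minimization over diagonal Z with tr(Z) = n sigma^2.
outer = min(inner_max([z1, n * sigma2 - z1], d, P)
            for z1 in (n * sigma2 * t / grid for t in range(1, grid)))
closed_form = sum(math.log(1 + rho * di) for di in d)   # value from (5.28)
assert abs(outer - closed_form) < 1e-2
```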

5.2.2.2 Min–max Problem with Eigenvalue Constraints


We assume that the eigenvalues of Z are fixed and ordered, i.e.,
λ_1(Z) ≥ λ_2(Z) ≥ · · · ≥ λ_n(Z). Here, we study the impact of the uni-
tary matrix U_Z. We write the set of unitary n × n matrices as U(n).
Let us define the optimization problem as

    Φ_II = min_{W∈U(n)} max_{tr(Q)≤P} tr φ(Λ_Z^{-1/2} W^H H Q H^H W Λ_Z^{-1/2}).  (5.40)

Using Theorem 3.16, one can easily prove the following corollary.

Corollary 5.2. The solution to the minimization problem in (5.40) is
given by a permutation matrix P, i.e.,

    Φ_II = tr φ(Λ_Z^{-1/2} P^H Λ_H Λ_Q Λ_H^H P Λ_Z^{-1/2}).

5.2.2.3 Min–max Optimization with Diagonal Constraints


We define the value of the minimax problem as

    Φ_III = min_{Z∈Z_3} max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2}).  (5.41)

The set Z_3 denotes the set of positive semi-definite matrices which
have fixed diagonal entries σ_n². Furthermore, we define the following
programming problem

    Φ_III^D = max_{Σ_{k=1}^n p_k = P, p_k ≥ 0} tr φ(H^H diag([p_1, . . . , p_n]) H).  (5.42)

The minimax problem in (5.41) was studied in [147, Sec. III.c] in
the context of uplink and downlink duality. Furthermore, the tutorial
paper [91] applies Lagrange duality theory for convex optimization
problems to show the uplink-downlink duality and analyzes an
optimization problem similar to (5.41).
Remember Lemma 3.17 and (3.19). In the context of wireless com-
munications, the result in (3.19) is called reciprocity of uplink and
downlink transmission. This result is needed in the next theorem.

Theorem 5.6. The value of (5.41) equals the value of (5.42), i.e.,
Φ_III = Φ_III^D.

Proof. We start with Φ_III in (5.41). By the reciprocity result from the
last lemma, we have

    Φ_III = min_{Z∈Z_3} max_{tr(Q)≤P} tr φ(Z^{-1/2} H Q H^H Z^{-1/2})
          = min_{Z∈Z_3} max_{tr(S)≤P} tr φ(H^H Z^{-1/2} S Z^{-1/2} H)
          = min_{Z∈Z_3} max_{tr(SZ)≤P} tr φ(H^H S H).               (5.43)

Next, we define the maximization problem

    Φ_IIIa(Z) = max_{tr(SZ)≤P} tr φ(H^H S H).                       (5.44)

We obtain the upper bound for (5.43) by choosing a concrete Z in
(5.44). This leads to the upper bound on Φ_III. As a result, it holds that

    Φ_III^D ≤ Φ_III ≤ Φ_IIIa(Z).                                    (5.45)
For the lower bound in (5.45), we choose S to be diagonal. Then the
minimization with respect to Z has no impact on the diagonal S,
because the diagonal entries of Z are constrained to be less than
or equal to σ².
Note that the objective functions in (5.44) and (5.42) are equal,
namely tr φ(H^H S H). Next, we study the optimality conditions for the
optimization problems in (5.44) and (5.42). The necessary and sufficient
condition for optimality of S* in (5.44) is given by the KKT conditions
(6.5) in Appendix 6.2.3. The Lagrangian of the optimization problem
in (5.44) is given by

    L(S, λ, Ψ) = tr φ(H^H S H) − λ (tr(SZ) − P) + tr(ΨS)

with Lagrangian multipliers λ ≥ 0 for the trace constraint and Ψ ⪰ 0
for the positive semi-definiteness constraint. As a result, the KKT con-
ditions for the problem (5.44) are given by

    tr(SZ) ≤ P
    tr(ΨS) = 0
    ∂L(S, λ)/∂S |_{S=S*} = H φ'(H^H S* H) H^H − λZ + Ψ = 0.         (5.46)

The KKT conditions (see (6.5) in Appendix 6.2.3) for the optimization
in (5.42) are derived via the Lagrangian

    L̄(p, ν, ψ) = tr φ(H^H diag(p) H) − ν( Σ_{k=1}^n p_k − P ) + Σ_{k=1}^n p_k ψ_k

as

    tr diag(p) ≤ P
    tr( diag(p) diag(ψ) ) = 0
    h_i φ'(H^H diag(p*) H) h_i^H − ν + ψ_i = 0.                     (5.47)
The row vector h_i is the ith row vector of H. Observe that the first
KKT condition in (5.47) is also fulfilled for all Z with fixed diagonal
entries. The KKT conditions in (5.47) correspond to the KKT condi-
tions in (5.46). This means that the value of Φ_III^D is equal to the value
of Φ_IIIa(Z) for

    Z* = (1/λ) H φ'(H^H S* H) H^H + (1/λ) Ψ.                        (5.48)
Note that the matrix Z* has full rank regardless of the rank of S.
Furthermore, note that the matrix Z* and the Lagrangian multiplier Ψ,
which ensures positive semi-definiteness of S, fulfill

    tr(S* Ψ) = 0
    tr(Z* S*) = P.

The Lagrangian multiplier λ is derived from (5.42) as

    λ = h_k φ'(H^H diag(p) H) h_k^H                                 (5.49)

for all k for which p_k > 0. The columns k of the channel H that corre-
spond to p_k = 0 can be omitted. For the minimax problem in Φ_III,
this means that the effective channel can be reduced by canceling
all those columns k of H. Therefore, the rows of Z^{-1/2} that corre-
spond to these k do not influence the value of Φ_III. In particular, the
kth eigenvalue of Z can be chosen arbitrarily according to the pos-
itiveness constraint and to the diagonal constraint. As a result, the
kth diagonal entry of Z can be smaller than σ² without increasing
the value of the minimax problem Φ_III. Finally, these kth diagonal
entries of Z can be "filled up" to σ² without decreasing the value
of Φ_III.

5.2.3 MSE Based Performance Measures


Let us start with the weighted sum of MSEs for single-user MIMO
systems [117]

    Σ_{k=1}^n λ_k [ (I + Q^{1/2} H^H H Q^{1/2})^{-1} ]_{kk}
      = tr( Λ [ I + Q^{1/2} H^H H Q^{1/2} ]^{-1} )

with diagonal matrix Λ = diag(λ_1, . . . , λ_n). Next, consider a matrix-
convex performance function φ that is based on the individual MSEs
and a scalar mapping f, e.g., f(x) = − log(x) for mutual informa-
tion, where φ(x) = f(1/x). Then the following inequality holds for the
weighted sum performance

    Σ_{k=1}^n λ_k f(MSE_k) = Σ_{k=1}^n λ_k f( [ (I + Q^{1/2} H^H H Q^{1/2})^{-1} ]_{kk} )
      ≤ Σ_{k=1}^n λ_k [ φ( I + Q^{1/2} H^H H Q^{1/2} ) ]_{kk}
      = tr Λ φ( I + Q^{1/2} H^H H Q^{1/2} )
      = J(λ, Q).                                                    (5.50)


The inequality in (5.50) follows from Jensen's inequality. Note that the
inequality is fulfilled with equality if the MSE matrix is a diagonal
matrix. There are many cases in which the MSE matrix is actually
diagonal, e.g., single-user MIMO with perfect CSI [105], single-user
(and also multi-user) MIMO with covariance or mean feedback [167],
and single-user MIMO with equal power allocation and isotropically
distributed channel [78]. A counterexample is the multiuser MIMO
case with perfect CSI, where the MSE matrix is not diagonal. However,
in such cases, the function J(λ, Q) provides an upper bound to the
performance.

Example 5.1. This example is an easy extension of the case consid-
ered in [117] and serves merely for illustration. Consider the single-user
MIMO case with perfect CSI at transmitter and receiver. The optimal
transmit strategy is to send in the direction of the right eigenvectors of
the channel matrix; the resulting power allocation problem reads

    J(λ) = max_{p_k≥0: Σ_k p_k≤P} Σ_{k=1}^n λ_k φ(γ_k p_k)

with eigenvalues γ_1, . . . , γ_n of H^H H. The optimal power allocation is
easily found as

    p_k* = [ (1/γ_k) φ̃'( μ/(λ_k γ_k) ) ]^+                         (5.51)

with φ̃' as the inverse of the first derivative of φ and μ such that
Σ_{k=1}^n p_k* = P. Also compare (5.51) to [90].
The optimal power allocation specializes directly to p_k* = [λ_k/μ −
1/γ_k]^+ for mutual information and p_k* = [ √(λ_k/(μ γ_k)) − 1/γ_k ]^+
for the MSE (see [117, (21)]). The resulting function J(λ) is thus given by

    J(λ) = Σ_{k=1}^n λ_k ψ( μ/(λ_k γ_k) )

with concatenated function ψ(x) = φ(φ̃'(x)).
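The two specializations above can be exercised numerically; the water-level search below is a generic bisection (our own helper, not from [117]), and the weights and eigenvalues are hypothetical:

```python
import math

def waterlevel(solve_p, P, lo=1e-9, hi=1e9, iters=200):
    """Find mu with sum(solve_p(mu)) = P; solve_p must be decreasing in mu."""
    for _ in range(iters):
        mu = math.sqrt(lo * hi)
        lo, hi = (mu, hi) if sum(solve_p(mu)) > P else (lo, mu)
    return math.sqrt(lo * hi)

lam = [2.0, 1.0, 0.5]        # weights lambda_k
gam = [3.0, 1.5, 0.4]        # eigenvalues gamma_k of H^H H
P = 2.0

# Mutual information, phi(x) = log(1+x):  p_k = [lambda_k/mu - 1/gamma_k]^+
p_mi = lambda mu: [max(l / mu - 1.0 / g, 0.0) for l, g in zip(lam, gam)]
# MSE case:  p_k = [sqrt(lambda_k/(mu gamma_k)) - 1/gamma_k]^+
p_mse = lambda mu: [max(math.sqrt(l / (mu * g)) - 1.0 / g, 0.0) for l, g in zip(lam, gam)]

for solver in (p_mi, p_mse):
    p = solver(waterlevel(solver, P))
    assert abs(sum(p) - P) < 1e-6 and all(pk >= 0 for pk in p)

# KKT check for the mutual information case: lambda_k gamma_k/(1 + gamma_k p_k)
# is constant over the active modes.
p = p_mi(waterlevel(p_mi, P))
marg = [l * g / (1 + g * pk) for l, g, pk in zip(lam, gam, p) if pk > 1e-9]
assert max(marg) - min(marg) < 1e-5
```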

If the inequality in (5.50) is not tight or should not be used, it is also
possible to define a class of optimization problems based on the diagonal
elements of the MSE matrix as f_0(d(M)) with diagonal operator d
and MSE matrix M = [I + Q^{1/2} H^H H Q^{1/2}]^{-1}. The following class
of MSE based optimization problems has been studied in [105, Thm. 1]

    min_{Q⪰0} f_0( d( [I + Q^{1/2} H^H H Q^{1/2}]^{-1} ) )   s.t.  tr Q ≤ P.  (5.52)

The solution to (5.52) was completely characterized for Schur-convex
and Schur-concave functions f_0. For these two classes, the optimal Q
diagonalizes H^H H as in Theorem 3.16.

5.2.4 Sum of Quadratic Hermitian Forms


From the discussion in Section 5.1.4, the sum performance in (5.18)
of the MIMO MAC and BC was obtained: The sum capacity of the
MIMO MAC with SIC and the MIMO BC with Costa precoding, as well
as the average normalized sum MSE for the MIMO MAC, can be written
in the following generalized representation using an arbitrary matrix-
monotone performance function φ(X)

    Φ(ρ, Q, H) = tr φ( ρ Σ_{k=1}^K H_k Q_k H_k^H ).                 (5.53)

5.2.4.1 Individual Power Constraints


In order to maximize the sum performance in (5.53) for fixed and known
channel realizations H_k, find the optimal transmit covariance matrices
Q_1*, . . . , Q_K*, i.e., solve

    max_{Q_1,...,Q_K} tr φ( ρ Σ_{k=1}^K H_k Q_k H_k^H )
    subject to tr Q_k ≤ p_k and Q_k ⪰ 0,  1 ≤ k ≤ K.                (5.54)

The next result shows that the optimal covariance matrices can
be found by iterative single-user performance optimization with col-
ored noise. This approach corresponds to the iterative water-filling
approach in [164] for sum capacity optimization, in which single-user
water-filling is iteratively performed treating the other users as noise.
On the one hand, this approach provides insight into the structure of
the optimum transmit covariance matrices; on the other hand, under
specific conditions this approach is computationally more efficient than
the joint optimization of the transmit covariance matrices. If the num-
ber of users is large in comparison to the number of transmit antennas
of the users, the joint optimization is computationally more complex
than the iterative optimization of each user separately.

Theorem 5.7. The optimal transmit covariance matrices solve the
optimization problem in (5.54) if and only if they fulfill the following
set of conditions for all 1 ≤ i ≤ K

    ρ H_i^H φ'( ρ Σ_{k=1}^K H_k Q_k H_k^H ) H_i = ν_i I − Ψ_i,
    tr(Ψ_i Q_i) = 0,  Ψ_i ⪰ 0,  Q_i ⪰ 0,  ν_i ≥ 0,
    p_i − tr(Q_i) ≥ 0,  ν_i (p_i − tr(Q_i)) = 0.                    (5.55)

The proof is based on the optimality conditions in (6.5) and can be


found in [20]. The following interpretation of the optimality result in
Theorem 5.7 leads to an elegant solution of the optimization problem.
Based on the optimality conditions in Theorem 5.7, it can be shown
that a certain kind of iterative single-user optimization solves (5.54).
For the kth user, we fix the transmit covariance matrices of all other
users and write the normalized noise plus interference as

    Z_k = I + ρ Σ_{l=1, l≠k}^K H_l Q_l H_l^H.                       (5.56)

Next, the following optimization is performed iteratively for all users
1, . . . , K

    max_{Q_k} tr φ( Z_k + ρ H_k Q_k H_k^H )
    subject to tr Q_k ≤ p_k and Q_k ⪰ 0,  1 ≤ k ≤ K.                (5.57)
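The iteration (5.56)–(5.57) can be sketched for the special case of diagonal channels, where each single-user step reduces to a scalar water-filling over parallel dimensions and the sum rate is easy to evaluate. Everything below (channel numbers, helper names) is our own illustration:

```python
import math

def waterfill(gains, P):
    """Single-user water-filling over parallel gains: p_i = [1/mu - 1/g_i]^+."""
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mu = math.sqrt(lo * hi)
        p = [max(1.0 / mu - 1.0 / g, 0.0) for g in gains]
        lo, hi = (mu, hi) if sum(p) > P else (lo, mu)
    mu = math.sqrt(lo * hi)
    return [max(1.0 / mu - 1.0 / g, 0.0) for g in gains]

# K users with diagonal channels sharing n parallel receive dimensions.
H2 = [[2.0, 0.3, 1.0],       # |h_{k,i}|^2 for user k, dimension i
      [0.5, 1.8, 0.7]]
Pk = [1.5, 1.5]              # individual power constraints
rho = 1.0
K, n = len(H2), len(H2[0])

p = [[Pk[k] / n] * n for k in range(K)]     # start from uniform allocation

def sum_rate(p):
    return sum(math.log(1 + rho * sum(H2[k][i] * p[k][i] for k in range(K)))
               for i in range(n))

rates = [sum_rate(p)]
for _ in range(30):                          # iterate single-user optimizations
    for k in range(K):
        z = [1 + rho * sum(H2[l][i] * p[l][i] for l in range(K) if l != k)
             for i in range(n)]              # noise plus interference (5.56)
        p[k] = waterfill([rho * H2[k][i] / z[i] for i in range(n)], Pk[k])
        rates.append(sum_rate(p))
# Each single-user step cannot decrease the sum performance.
assert all(b >= a - 1e-9 for a, b in zip(rates, rates[1:]))
```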

5.2.4.2 Power Allocation Under Sum Power Constraint


The channel realizations H_k of all users k are assumed to be known.
Keep the transmit covariance matrices Q_1^0, Q_2^0, . . . , Q_K^0 fixed. Distribute
a fixed amount of transmit power P across the mobiles, i.e., solve

    max_{p_1,...,p_K} tr φ( ρ Σ_{k=1}^K p_k H_k Q_k^0 H_k^H )
    subject to Σ_{k=1}^K p_k ≤ P and p_k ≥ 0,  1 ≤ k ≤ K.           (5.58)

Theorem 5.8. The optimal power allocation p_1, . . . , p_K solves the opti-
mization problem in (5.58) if and only if it fulfills for all 1 ≤ i ≤ K

    tr( ρ H_i Q_i^0 H_i^H φ'( ρ Σ_{k=1}^K p_k H_k Q_k^0 H_k^H ) ) = μ − λ_i,
    λ_k p_k = 0,  λ_k ≥ 0,  μ ≥ 0,  p_k ≥ 0,
    P − Σ_{k=1}^K p_k ≥ 0,  μ( P − Σ_{k=1}^K p_k ) = 0.             (5.59)

The proof is based again on the optimality conditions in (6.5) and can
be found in [20].
These conditions in (5.59) are fulfilled by the optimum power
allocation vector for fixed covariance matrices. Observe that for all
active users l the Lagrange multiplier λ_l is equal to zero. There-
fore, the condition tr(ρ H_i Q_i^0 H_i^H φ′(ρ Σ_{k=1}^K p_k^0 H_k Q_k^0 H_k^H)) = µ is
fulfilled. This directly leads to the following characterization of the
SNR range in which single-user power allocation (or TDMA) is
optimal.
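For the sum-capacity choice φ(x) = log(1 + x), the equal-derivative condition for active users can be verified numerically. The sketch below is our own illustration: it assumes K = 2, hypothetical fixed covariances Q_k^0 = I/2, SNR ρ = 1, sum power P = 10, and the example channels given in (5.63); it locates the optimum of (5.58) by a grid search and checks that the per-user derivatives tr(ρ H_i Q_i^0 H_i^H φ′(·)) coincide:

```python
import numpy as np

# Example channels from (5.63); Q_k^0 = I/2 is a hypothetical fixed choice.
H1 = np.array([[1.0, 0.5], [0.8, 2.0]])
H2 = np.array([[0.2, 2.0], [1.0, 0.5]])
A = [0.5 * H1 @ H1.T, 0.5 * H2 @ H2.T]   # A_k = H_k Q_k^0 H_k^H
rho, P = 1.0, 10.0

def f(p1):
    """Objective tr phi(rho sum_k p_k A_k) with phi(x) = log(1 + x)."""
    M = np.eye(2) + rho * (p1 * A[0] + (P - p1) * A[1])
    return np.linalg.slogdet(M)[1]

def grads(p1):
    """Per-user derivatives tr(rho A_i (I + rho sum_k p_k A_k)^{-1})."""
    Minv = np.linalg.inv(np.eye(2) + rho * (p1 * A[0] + (P - p1) * A[1]))
    return [np.trace(rho * Ai @ Minv) for Ai in A]

p1_grid = np.linspace(0.0, P, 4001)
vals = np.array([f(p1) for p1 in p1_grid])
p1_opt = p1_grid[vals.argmax()]          # grid optimum of the power split
g1, g2 = grads(p1_opt)                   # should agree when both users are active
```

At this moderate SNR both users turn out to be active, so both derivatives equal the common water level µ, as (5.59) requires.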

Theorem 5.9. Let the users be ordered according to their maximum
channel eigenvalues in decreasing order. Let Q_1^s be the optimal transmit
covariance matrix for the single-user channel H_1 according to the gener-
alized single-user water-filling from Theorem 5.4. The optimal transmit
strategy is described by the following points:

(1) For fixed ρ̂, it is optimal to allocate the complete sum power
    to user one, i.e., the user with the largest maximum channel
    eigenvalue, if and only if the following condition is satisfied:

        λ_max(H_2^H φ′(ρ̂ H_1 Q_1^s(ρ̂) H_1^H) H_2)
            ≤ λ_max(H_1^H φ′(ρ̂ H_1 Q_1^s(ρ̂) H_1^H) H_1);          (5.60)

    then the optimal set of covariance matrices is Q_1(ρ̂) = Q_1^s(ρ̂)
    and Q_2(ρ̂) = Q_3(ρ̂) = · · · = Q_K(ρ̂) = 0.
(2) If (5.60) is fulfilled for ρ̂, then (5.60) holds for all ρ ≤ ρ̂, too.
    Therefore, the single-user optimality range is exactly charac-
    terized by choosing the largest ρ̂ for which (5.60) holds.
(3) If λ_max(H_1 H_1^H) > λ_max(H_j H_j^H) for all 2 ≤ j ≤ K, then user
    one is supported first.

The proof can be found in [20].

5.2.4.3 Sum Performance Optimization Under Sum


Power Constraints
Combining these two steps — power allocation and transmit covariance
matrix optimization — one arrives at the general problem of sum
performance optimization under a sum power constraint


    max_{Q̃_1,...,Q̃_K}  tr φ(ρ Σ_{k=1}^K H_k Q̃_k H_k^H)
    subject to  Σ_{k=1}^K tr Q̃_k ≤ P and Q̃_k ⪰ 0,  1 ≤ k ≤ K.     (5.61)
k=1

Theorem 5.10. The optimal transmit covariance matrices solve (5.61)
if and only if they fulfill the following set of conditions for all 1 ≤ i ≤ K:

    ρ H_i^H φ′(ρ Σ_{k=1}^K H_k Q̃_k H_k^H) H_i = µ I − Ψ_i,
    tr(Q̃_i Ψ_i) = 0,  Ψ_i ⪰ 0,  Q̃_i ⪰ 0,  µ ≥ 0,
    P − Σ_{k=1}^K tr(Q̃_k) ≥ 0,  µ (P − Σ_{k=1}^K tr(Q̃_k)) = 0.    (5.62)

The proof is based on the optimality conditions (6.5) and can be found
in [20]. Furthermore, an algorithm based on alternating optimization is
developed in [20], and both the optimality of its fixed point and its
convergence are proved.

5.2.4.4 Illustration of Single-user Optimality Range


For comparison purposes, we consider the same two-user MIMO MAC
as in Example 1 in [147], i.e., two transmit antennas per user and
two receive antennas at the base station with the following real-valued
channel matrices for user one and two, respectively:

    H_1 = ( 1.0  0.5        H_2 = ( 0.2  2.0
            0.8  2.0 ),             1.0  0.5 ).                     (5.63)
Channel matrix H_1 has the larger maximum eigenvalue. This means that
user one is the best user and is supported first. In Figure 5.3(a), we
show the single-user power level and the maximum eigenvalue of the
effective channel which user two experiences. The noise variance is fixed
to σ_n² = 1.
Fig. 5.3 Sum capacity optimization for the MIMO MAC: Optimal power allocation and
comparison with a suboptimal transmit strategy. (a) Single-user power level for user 1 and
maximum eigenvalue of the channel matrix for user 2 as a function of the maximum transmit
sum power. (b) Sum capacity of the optimal transmit strategy compared to the single-user-
only strategy for the multiple antenna MAC with two users and the channels in (5.63).

In the example illustrated in Figure 5.3(a), the single-user range
extends up to an SNR of about −12 dB. From this point on, the optimal
transmit strategy supports more than one user. Hence, the convenient
scheme that supports only one user at a time (or time-division multiple
access, TDMA) is not optimal for higher SNR values. In [80], it was
shown that in the single-antenna MAC the optimal transmit strategy
is to support only one user at a time. That led to the notion of multiuser
diversity and the development of scheduling algorithms based on this
information-theoretic result. In multiple antenna systems, this single-
user optimality does not hold [16]. The necessary and sufficient
optimality condition is given in (5.60).
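Condition (5.60) can be evaluated numerically for the example channels in (5.63). The sketch below is our own illustration for the sum-capacity choice φ(x) = log(1 + x), i.e., φ′(X) = (I + X)^{-1}, with a hypothetical sum power P = 1; it bisects for the largest SNR for which (5.60) holds. The threshold found this way need not match the figure exactly, since the power normalization there may differ:

```python
import numpy as np

H1 = np.array([[1.0, 0.5], [0.8, 2.0]])   # channels from (5.63)
H2 = np.array([[0.2, 2.0], [1.0, 0.5]])
P = 1.0                                    # hypothetical sum power budget

def waterfill(g, p):
    """Single-user water-filling over eigenmode gains g_i."""
    idx = np.argsort(-g)
    q = np.zeros_like(g)
    for m in range(len(g), 0, -1):
        mu = (p + (1.0 / g[idx[:m]]).sum()) / m
        if mu - 1.0 / g[idx[m - 1]] > 0:
            q[idx[:m]] = mu - 1.0 / g[idx[:m]]
            break
    return q

def single_user_optimal(rho):
    """Check condition (5.60) with phi'(X) = (I + X)^{-1} (capacity case)."""
    lam, U = np.linalg.eigh(H1.T @ H1)
    Q1 = U @ np.diag(waterfill(rho * lam, P)) @ U.T       # Q_1^s(rho)
    M = np.linalg.inv(np.eye(2) + rho * H1 @ Q1 @ H1.T)   # phi'(rho H1 Q1 H1^H)
    lhs = np.linalg.eigvalsh(H2.T @ M @ H2).max()
    rhs = np.linalg.eigvalsh(H1.T @ M @ H1).max()
    return lhs <= rhs

# bisection for the largest rho for which (5.60) still holds
lo, hi = 1e-4, 1e4
for _ in range(60):
    mid = np.sqrt(lo * hi)
    lo, hi = (mid, hi) if single_user_optimal(mid) else (lo, mid)
print("single-user transmission optimal up to about %.1f dB" % (10 * np.log10(lo)))
```

At very low SNR the condition reduces to λ_max(H_2^H H_2) ≤ λ_max(H_1^H H_1), which holds for these channels, while at high SNR it fails, so the bisection is well posed by Theorem 5.9(2).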
6
Appendix

6.1 Linear Algebra


Most of the material can be found in [50, 51]. Since this section contains
only basic results, the proofs are omitted.

Proposition 6.1 (Singular value decomposition). Every square


matrix A of dimension n can be decomposed into

    A = V Σ W^H,

where V , W are unitary square matrices of size n × n and Σ is a diag-
onal matrix with nonnegative main diagonal entries, and the rank of Σ
is the same as the rank of A.

Proposition 6.2 (Eigenvalue decomposition). Let A be a square


matrix of dimension n. Then A is Hermitian if and only if there is a
square unitary matrix U of size n and a real diagonal matrix Λ of size
n × n such that

    A = U Λ U^H.
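Both decompositions are available in standard numerical libraries; the following quick check is our own illustration (real-valued, so that (·)^H reduces to transposition) of Propositions 6.1 and 6.2 on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Proposition 6.1: A = V Sigma W^H with V, W unitary
A = rng.standard_normal((n, n))
V, sigma, Wh = np.linalg.svd(A)          # numpy returns W^H directly
assert np.allclose(A, V @ np.diag(sigma) @ Wh)
assert np.allclose(V @ V.T, np.eye(n))
assert np.allclose(Wh @ Wh.T, np.eye(n))
assert np.linalg.matrix_rank(A) == np.sum(sigma > 1e-10)  # rank(Sigma) = rank(A)

# Proposition 6.2: Hermitian B = U Lambda U^H with real diagonal Lambda
B = A + A.T                              # symmetric = Hermitian in the real case
lam, U = np.linalg.eigh(B)
assert np.allclose(B, U @ np.diag(lam) @ U.T)
assert np.allclose(U @ U.T, np.eye(n))
```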


Lemma 6.3 (Matrix inversion lemma). Suppose A, B are square
and invertible matrices. Then

    (A + X B X^H)^{-1} = A^{-1} − A^{-1} X (B^{-1} + X^H A^{-1} X)^{-1} X^H A^{-1}.    (6.1)
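Identity (6.1) is convenient for updating an inverse after a low-rank modification, e.g., when a single user's covariance changes in an interference term like (5.56). A quick numerical check, our own illustration with real-valued matrices (so (·)^H is transposition):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
R = rng.standard_normal((n, n))
S = rng.standard_normal((m, m))
A = R @ R.T + n * np.eye(n)              # symmetric positive definite, invertible
B = S @ S.T + m * np.eye(m)
X = rng.standard_normal((n, m))

Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(A + X @ B @ X.T)     # direct n x n inversion
rhs = Ainv - Ainv @ X @ np.linalg.inv(Binv + X.T @ Ainv @ X) @ X.T @ Ainv
assert np.allclose(lhs, rhs)             # identity (6.1)
```

The right-hand side only inverts an m × m matrix once A^{-1} is known, which is cheaper when m ≪ n.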

6.2 Convex Optimization


The book [24] provides a very good overview of convex optimization.
However, we review certain results that are useful for immediate appli-
cation in the current work. The material is mainly from [24, Ch. 5].

6.2.1 Lagrange Dual Function


Consider the following optimization problem given in standard form:

    p* = min f_0(x)   s.t.  f_i(x) ≤ 0 for i = 1, . . . , m
                      and   h_i(x) = 0 for i = 1, . . . , p         (6.2)

for x ∈ X .

Definition 6.1 (Lagrangian). The Lagrangian L : X × R^m × R^p → R
for problem (6.2) is

    L(x, λ, ν) = f_0(x) + Σ_{k=1}^m λ_k f_k(x) + Σ_{k=1}^p ν_k h_k(x).

The Lagrange multiplier λ_i is associated with the ith inequality
constraint, whereas ν_i is associated with the ith equality constraint.
The vectors λ and ν are called the dual variables or Lagrange multiplier
vectors for the problem (6.2).
The Lagrange dual function g : R^m × R^p → R is the minimum value
of the Lagrangian over x:

    g(λ, ν) = inf_{x∈X} L(x, λ, ν).

Remark 6.1. Since the dual function is the pointwise infimum of a


family of affine functions of (λ, ν), it is concave, even if the problem in
(6.2) is not convex.

Lemma 6.4 (Section 5.1.3 [24]). The dual function yields lower
bounds on the optimal value of the problem (6.2): for any λ ⪰ 0 and
any ν, we have

    g(λ, ν) ≤ p*.

Proof. Suppose x̄ is a feasible point for the problem (6.2), i.e., f_i(x̄) ≤ 0
and h_i(x̄) = 0, and λ ⪰ 0. Obviously,

    Σ_{k=1}^m λ_k f_k(x̄) + Σ_{k=1}^p ν_k h_k(x̄) ≤ 0,

since the terms in the first sum are nonpositive and the terms in the
second sum are all zero. As a result,

    g(λ, ν) = inf_{x∈X} L(x, λ, ν) ≤ L(x̄, λ, ν) ≤ f_0(x̄).

This lower bound holds for all feasible points x̄ ∈ X .

Remark 6.2. In the Lagrangian, a constraint violation is weighted
linearly by its corresponding multiplier. Since the constraints must be
fulfilled in any case, a first approach would be to define the objective
as ∞ outside the feasible set. In the Lagrangian L, this hard penalty
for constraint violation is replaced by a linear one.

Definition 6.2 (Lagrange dual problem). A natural question
regarding the Lagrange dual function is about the best lower bound
that can be obtained from it:

    d* = max g(λ, ν)   s.t.  λ ⪰ 0.                                 (6.3)

This is called the Lagrange dual problem associated with the
problem (6.2).

Remark 6.3. The Lagrange dual problem (6.3) is always a convex
optimization problem, since the objective to be maximized is concave
and the constraint is convex. This holds regardless of whether the primal
problem in (6.2) is convex. Furthermore, the optimal value of the dual
problem is always smaller than or equal to the value of the primal
problem:

    d* ≤ p*.

The difference p* − d* between these two values is the optimal duality
gap, which is always nonnegative.

6.2.2 Strong Duality

Definition 6.3 (Strong duality). Strong duality holds if the duality


gap p∗ − d∗ = 0. This means that the best bound that can be obtained
from the Lagrange dual function is tight.

Remark 6.4. Strong duality does not, in general, hold. Even convexity
of the primal problem is not sufficient. The conditions under which
strong duality holds are called constraint qualifications. One simple
constraint qualification is Slater’s condition: there is an x ∈ X s.t.
f_i(x) < 0 for all i = 1, . . . , m and h_i(x) = 0 for all i = 1, . . . , p. Slater’s
theorem says that strong duality holds if Slater’s condition holds (and
the problem is convex).

Consider in the following the optimization problem with only
inequality constraints:

    max f_0(x)   s.t.  f_i(x) ≤ 0 for i = 1, . . . , m.

Denote the Lagrange multipliers for the inequality constraints by λ_k.
The connection to max–min problems is explained by the following
representations: weak duality can be expressed as the inequality

    sup_{λ⪰0} inf_x L(x, λ) ≤ inf_x sup_{λ⪰0} L(x, λ)

and strong duality as the equality

    sup_{λ⪰0} inf_x L(x, λ) = inf_x sup_{λ⪰0} L(x, λ).

This can be generalized to arbitrary functions f(x, y). In order to decide
whether the min–max expressions satisfy the saddle-point property

    min_{x∈X} max_{y∈Y} f(x, y) = max_{y∈Y} min_{x∈X} f(x, y)      (6.4)

we use Theorem 1 in [35]. One result in [35, Thm. 1] states that (6.4)
is fulfilled if X and Y are two compact Hausdorff spaces, f(·, y) is
lower semi-continuous for every y ∈ Y, f(x, ·) is upper semi-continuous
for every x ∈ X, f is convex in x and concave in y, and the sets X
and Y are convex, too. The following saddle-point interpretation can
also be applied:

    f(x*, y) ≤ f(x*, y*) ≤ f(x, y*)

for all x ∈ X and y ∈ Y . In other words, x* minimizes f(x, y*) and y*
maximizes f(x*, y):

    f(x*, y*) = inf_{x∈X} f(x, y*)   and   f(x*, y*) = sup_{y∈Y} f(x*, y).

This also implies that the strong max–min property holds (and there-
fore the strong duality).
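A toy instance of the saddle-point property (6.4), our own illustration: f(x, y) = x² − y² on X = Y = [−1, 1] is convex in x, concave in y, and continuous on compact convex sets, so by the minimax theorem of [35] the min–max and max–min values coincide. A grid-based numerical check:

```python
import numpy as np

# f(x, y) = x^2 - y^2: convex in x, concave in y, on compact convex sets
x = np.linspace(-1.0, 1.0, 401)
y = np.linspace(-1.0, 1.0, 401)
F = x[:, None] ** 2 - y[None, :] ** 2    # F[i, j] = f(x_i, y_j)

minmax = F.max(axis=1).min()             # min_x max_y f(x, y)
maxmin = F.min(axis=0).max()             # max_y min_x f(x, y)
assert abs(minmax - maxmin) < 1e-12      # saddle-point property (6.4)
assert abs(minmax) < 1e-12               # saddle value f(0, 0) = 0
```

The saddle point is (x*, y*) = (0, 0), and one verifies directly that f(0, y) ≤ f(0, 0) ≤ f(x, 0) for all x, y in the sets.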

Example 6.1. Consider the following convex programming problem:

    min x²   s.t.  x ≤ −3.

In standard form (6.2) we have f_0(x) = x² and f_1(x) = x + 3. The
Lagrangian is given by

    L(x, λ) = x² + λ(x + 3).

The dual function is given by

    g(λ) = −(1/4)λ² + 3λ.

The maximum of the dual function is attained at λ* = 6 and yields
the value of the dual problem d* = 9. The solution of the primal prob-
lem is then x* = −λ*/2 = −3 and the value of the primal problem is
p* = 9. Since the problem is convex and Slater’s condition is satisfied
(e.g., by x = −4), strong duality holds, i.e., p* = d*.
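The numbers in Example 6.1 can be verified by brute force. The sketch below, our own illustration, evaluates the primal objective over the feasible set and the dual function g(λ) = −λ²/4 + 3λ on grids, and confirms both weak and strong duality:

```python
import numpy as np

# primal: min x^2 s.t. x <= -3, brute force over the feasible set
x = np.linspace(-10.0, -3.0, 100001)
p_star = (x ** 2).min()                  # attained at the boundary x = -3

# dual function g(lam) = inf_x [x^2 + lam (x + 3)] = -lam^2/4 + 3 lam
lam = np.linspace(0.0, 20.0, 100001)
g = -lam ** 2 / 4.0 + 3.0 * lam
d_star = g.max()

assert np.all(g <= p_star + 1e-9)        # weak duality: g(lam) <= p*
assert abs(d_star - p_star) < 1e-9       # strong duality: d* = p* = 9
assert abs(lam[g.argmax()] - 6.0) < 1e-3 # maximizer lam* = 6
```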

6.2.3 Optimality Conditions


Assume that strong duality holds. Let x* be a primal optimal and
(λ*, ν*) be a dual optimal point. This means that

    f_0(x*) = g(λ*, ν*) = inf_x ( f_0(x) + Σ_{i=1}^m λ_i* f_i(x) + Σ_{i=1}^p ν_i* h_i(x) )
            ≤ f_0(x*) + Σ_{i=1}^m λ_i* f_i(x*) + Σ_{i=1}^p ν_i* h_i(x*) ≤ f_0(x*).

The two inequalities in this chain hold with equality. Therefore, it fol-
lows that the minimizer of the Lagrangian with respect to x is x*. In
addition, Σ_{i=1}^m λ_i* f_i(x*) = 0. This yields the so-called complemen-
tary slackness condition

    λ_i* f_i(x*) = 0   for all i = 1, . . . , m.

This means that the ith optimal Lagrange multiplier is zero unless
the ith constraint is active at the optimum. Finally, we arrive at the
Karush–Kuhn–Tucker optimality conditions (KKT conditions).

Definition 6.4 (KKT conditions). The KKT conditions for the
optimization problem in (6.2) are given by

    ∇f_0(x*) + Σ_{i=1}^m λ_i* ∇f_i(x*) + Σ_{i=1}^p ν_i* ∇h_i(x*) = 0,
    f_i(x*) ≤ 0,   i = 1, . . . , m,
    h_i(x*) = 0,   i = 1, . . . , p,
    λ_i* ≥ 0,      i = 1, . . . , m,
    λ_i* f_i(x*) = 0,   i = 1, . . . , m.                           (6.5)

Lemma 6.5 (KKT optimality conditions). The KKT conditions


are necessary optimality conditions for any nonlinear optimization
problem under the assumption that the constraints are regular (see,
e.g., [114]). If the primal problem is convex, the KKT conditions are also
sufficient. If a convex optimization problem with differentiable objec-
tive and constraint functions satisfies Slater’s condition, then the KKT
conditions provide necessary and sufficient conditions for optimality.
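For the convex problem of Example 6.1, the candidate point x* = −3 with multiplier λ* = 6 can be checked against each KKT condition in (6.5) directly; a minimal sketch (our own illustration):

```python
f0 = lambda x: x ** 2          # objective
f1 = lambda x: x + 3           # inequality constraint f1(x) <= 0
df0 = lambda x: 2.0 * x        # gradients (here: scalar derivatives)
df1 = lambda x: 1.0

x_star, lam_star = -3.0, 6.0   # candidate primal/dual optimum from Example 6.1

assert abs(df0(x_star) + lam_star * df1(x_star)) < 1e-12  # stationarity
assert f1(x_star) <= 1e-12                                # primal feasibility
assert lam_star >= 0.0                                    # dual feasibility
assert abs(lam_star * f1(x_star)) < 1e-12                 # complementary slackness

# feasible perturbations of x_star do not decrease the objective (sufficiency)
for dx in (-0.5, -0.1, 0.1):
    x = x_star + dx
    if f1(x) <= 0:
        assert f0(x) >= f0(x_star)
```

Since the problem is convex and Slater's condition holds, satisfying (6.5) certifies global optimality of x*.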
7
Acknowledgments

Part of the content of this book was presented during lectures at the
Technische Universität Berlin, Germany, within the course “Applied
Information Theory” from 2005–2007, at the Royal Institute of Technology,
Stockholm, Sweden, within the course “Advanced Digital Communications,”
and at Beihang University, Beijing, China, within the
course “Advanced Digital Communications” in 2007.
This work has been supported in part by the Bundesministerium
für Bildung und Forschung (BMBF) under Grant BU150 and in part
by the Swedish Research Foundation (Vetenskapsrådet) under Grant
623-2005-5359.

References

[1] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. Dover


Publications, 1970.
[2] N. Ahmed and R. G. Baraniuk, “Throughput measures for delay-constrained
communications in fading channels,” Allerton Conference on Communication,
Control and Computing, October 2003.
[3] S. M. Alamouti, “A simple transmit diversity technique for wireless communi-
cations,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8,
pp. 1451–1458, October 1998.
[4] T. Ando, “On some operator inequalities,” Mathematische Annalen, vol. 279,
pp. 157–159, 1987.
[5] T. Ando and X. Zhan, “Norm inequalities related to operator monotone func-
tions,” Mathematische Annalen, vol. 315, pp. 771–780, 1999.
[6] G. Bauch and J. Hagenauer, “Smart versus dumb antennas-capacities and
FEC performance,” IEEE Communication Letters, vol. 6, no. 2, pp. 55–57,
February 2002.
[7] I. Bettesh and S. Shamai, “Optimal power and rate control for minimal aver-
age delay: The single-user case,” IEEE Transactions on Information Theory,
vol. 52, no. 9, pp. 4115–4141, September 2006.
[8] R. Bhatia, Matrix Analysis. Springer-Verlag, 1997.
[9] E. Biglieri, G. Caire, and G. Taricco, “Limiting performance of block-fading
channels with multiple antennas,” IEEE Transactions on Information Theory,
vol. 47, no. 4, pp. 1273–1289, May 2001.
[10] E. Biglieri, J. Proakis, and S. Shamai (Shitz), “Fading channels: Information-
theoretic and communications aspects,” IEEE Transactions on Information
Theory, vol. 44, no. 6, pp. 2619–2692, October 1998.


[11] I. Bjelakovic and H. Boche, “Structure of optimal input covariance matrices


for MIMO systems with covariance feedback under general correlated fading,”
Proceedings of IEEE International Symposium on Information Theory (ISIT
2006), July 2006.
[12] H. Boche, “Advanced network calculus for interference functions,” in Plenary
Talk at IEEE SPAWC, 2007.
[13] H. Boche and E. A. Jorswieck, “Sum capacity optimization of the MIMO
Gaussian MAC,” Proceedings of 5th International Symposium on Wireless
Personal Multimedia Communications, invited paper, vol. 1, pp. 130–134,
October 2002.
[14] H. Boche and E. A. Jorswieck, “Multiuser MIMO systems: Worst case
noise and transmitter cooperation,” Proceedings of ISSSPIT, invited paper,
2003.
[15] H. Boche and E. A. Jorswieck, “Uplink sumrate maximization with different
types of channel state information at the transmitters,” Proceedings of IEEE
ISSPIT, 2003.
[16] H. Boche and E. A. Jorswieck, “Multiple antenna multiple user channels:
Optimisation in low SNR,” Proceedings of IEEE Wireless Communications
and Networking Conference, 2004.
[17] H. Boche and E. A. Jorswieck, “Optimization of matrix monotone func-
tions: Saddle-point, worst case noise analysis, and applications,” Proceedings
of IEEE ISIT, 2004.
[18] H. Boche and E. A. Jorswieck, “Optimization of matrix monotone func-
tions: Saddle-point, worst case noise analysis, and applications,” Proceedings
of IEEE ISIT, 2004.
[19] H. Boche and E. A. Jorswieck, “Outage probability of multiple antenna sys-
tems: Optimal transmission and impact of correlation,” Proceedings of IEEE
IZS, pp. 116–119, 2004.
[20] H. Boche and E. A. Jorswieck, “On the performance optimization in multiuser
MIMO systems,” European Transactions on Telecommunications, vol. 18,
no. 3, pp. 217–233, April 2007.
[21] H. Boche and M. Schubert, “A new approach to power adjustment for spa-
tial covariance based downlink beamforming,” Proceedings of IEEE ICASSP,
vol. 5, pp. 2957–2960, May 2001.
[22] H. Boche and M. Schubert, “A general theory for uplink and downlink beam-
forming,” Proceedings of IEEE 56th Vehicular Technology Conference, vol. 1,
pp. 87–91, September 2002.
[23] H. Bölcskei and A. J. Paulraj, “Performance of space-time codes in the pres-
ence of spatial fading correlation,” in Proceedings of Asilomar Conference,
2000.
[24] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University
Press, 2004.
[25] J. Brinkhuis, Z. Q. Luo, and S. Zhang, “Matrix convex functions with appli-
cations to weighted centers for semidefinite programming,” Technical Report
SEEM2005-06, 2005.
[26] G. Caire and S. Shamai (Shitz), “On the multiple-antenna broadcast channel,”
35th Asilomar Conference on Signals, Systems and Computers, 2001.

[27] G. Caire, G. Taricco, and E. Biglieri, “Optimum power control over fading
channels,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1468–
1489, July 1999.
[28] D. Chizhik, G. J. Foschini, M. J. Gans, and R. A. Valenzuela, “Keyholes,
correlations, and capacities of multielement transmit and receive antennas,”
IEEE Transactions on Wireless Communications, vol. 1, no. 2, pp. 361–368,
April 2002.
[29] D. Chizhik, G. J. Foschini, and R. A. Valenzuela, “Capacities of multielement
transmit and receive antennas,” IEE Electronics Letters, vol. 36, pp. 1099–
1100, June 2000.
[30] C.-N. Chuah, D. N. C. Tse, and J. M. Kahn, “Capacity scaling in MIMO
wireless systems under correlated fading,” IEEE Transactions on Information
Theory, vol. 48, no. 3, pp. 637–650, March 2002.
[31] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information The-
ory, vol. 18, no. 1, pp. 2–14, January 1972.
[32] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on
Information Theory, vol. 44, no. 6, pp. 2524–2530, October 1998.
[33] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley
& Sons, 1991.
[34] W. F. Donoghue Jr, Monotone Matrix Functions and Analytic Continuation.
Springer-Verlag, 1974.
[35] K. Fan, “Minimax theorems,” Proceedings National Academic Society, vol. 39,
pp. 42–47, 1953.
[36] M. Fiedler, “Bounds for the determinant of the sum of Hermitian matrices,”
Proceedings of the American Mathematical Society, vol. 30, no. 1, pp. 27–31,
September 1971.
[37] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a
fading environment when using multiple antennas,” Wireless Personal Com-
munications, vol. 6, pp. 311–335, 1998.
[38] D. Gerlach and A. Paulraj, “Adaptive transmitting antenna methods for mul-
tipath environments,” Global Telecommunications Conference, vol. 1, pp. 425–
429, December 1994.
[39] A. Goel and A. Meyerson, “Simultaneous optimization via approximate
majorization for concave profits or convex costs,” Algorithmica, vol. 44,
pp. 301–323, 2006.
[40] A. J. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits
of MIMO channels,” IEEE Journal on Selected Areas in Communications,
vol. 21, no. 5, pp. 684–702, June 2003.
[41] J.-C. Guey, M. P. Fitz, M. R. Bell, and W.-Y. Kuo, “Signal design for
transmitter diversity wireless communication systems over Rayleigh fad-
ing channels,” in 1996 IEEE Vehicular Technology Conference, pp. 136–140,
Atlanta, GA, 1996.
[42] D. Guo, Gaussian Channels: Information, Estimation and Multiuser Detec-
tion. PhD thesis, Princeton University, 2004.
[43] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum
mean-square error in Gaussian channels,” IEEE Transactions on Information
Theory, vol. 51, no. 4, pp. 1261–1282, April 2005.

[44] S. Hanly and D. Tse, “Multiaccess fading channels: Part II: Delay-limited
capacities,” IEEE Transactions on Information Theory, vol. 44, no. 7,
pp. 2816–2831, November 1998.
[45] F. Hansen, G. Ji, and J. Tomiyama, “Gaps between classes of matrix monotone
functions,” Bulletin of the London Mathematical Society, vol. 36, no. 1, pp. 53–58,
2004.
[46] F. Hansen and G. K. Pedersen, “Jensen’s inequality for operators and Löwner’s
theorem,” Mathematische Annalen, vol. 258, pp. 229–241, 1982.
[47] F. Hansen and G. K. Pedersen, “Perturbation formulas for traces on c*-
algebras,” Publication of the Research Institute for Mathematical Sciences,
Kyoto University, vol. 31, pp. 169–178, 1995.
[48] G. Hardy, J. E. Littlewood, and G. Pólya, Inequalities. Cambridge Mathemat-
ical Library, Second ed., 1952.
[49] T. Haustein and H. Boche, “On optimal power allocation and bit-loading
strategies for the MIMO transmitter with channel knowledge,” Proceedings of
IEEE ICASSP 2003, vol. IV, pp. 405–409, 2003.
[50] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press,
1985.
[51] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge Uni-
versity Press, 1991.
[52] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states:
Necessary and sufficient conditions,” Physics Letters A, vol. 223, pp. 1–8, 1996.
[53] M. T. Ivrlac and J. A. Nossek, “Quantifying diversity and correlation in
Rayleigh fading MIMO communication systems,” Proceedings of IEEE ISSPIT,
vol. 1, pp. 158–161, 2003.
[54] S. Jafar and A. Goldsmith, “Multiple antenna capacity in correlated Rayleigh
fading with channel covariance information,” IEEE Transactions on Wireless
Communication, vol. 4, no. 3, pp. 990–997, May 2005.
[55] S. A. Jafar and A. Goldsmith, “On optimality of beamforming for multi-
ple antenna systems with imperfect feedback,” International Symposium on
Information Theory, p. 321, 2001.
[56] S. A. Jafar and A. Goldsmith, “Vector MAC capacity region with covari-
ance feedback,” IEEE International Symposium on Information Theory, p. 54,
2001.
[57] G. Jöngren, M. Skoglund, and B. Ottersten, “Combining beamforming and
orthogonal space-time block coding,” IEEE Transactions on Information The-
ory, vol. 48, no. 3, pp. 611–627, March 2002.
[58] E. Jorswieck, B. Ottersten, A. Sezgin, and A. Paulraj, “Guaranteed perfor-
mance region in fading orthogonal space-time coded broadcast channels,” Pro-
ceedings of IEEE ISIT, 2007.
[59] E. A. Jorswieck, Unified approach for optimisation of single-user and
multi-user multiple-input multiple-output wireless systems. PhD thesis,
Technical University of Berlin, Germany, September 2004. Available online:
http://edocs.tu-berlin.de/diss/2004/jorswieck eduard.htm.
[60] E. A. Jorswieck, Transmission strategies for the MIMO MAC, ch. 21, pp. 423–
442, Hindawi Publishing Corporation, 2005.

[61] E. A. Jorswieck, “Uplink throughput maximization with combined sum and


individual power constraints,” IEEE Communications Letters, vol. 10, no. 12,
pp. 816–818, December 2007.
[62] E. A. Jorswieck, M. Bengtsson, and B. Ottersten, “On the interplay between
scheduling, user distribution, CSI, and performance measures in cellular down-
link,” in EUSIPCO, invited paper, 2006.
[63] E. A. Jorswieck and H. Boche, “On transmit diversity with imperfect channel
state information,” Proceedings of IEEE ICASSP, vol. 3, pp. III-2181–III-2184,
May 2002.
[64] E. A. Jorswieck and H. Boche, “Behavior of outage probability in MISO sys-
tems with no channel state information at the transmitter,” Proceedings of
IEEE Information Theory Workshop, pp. 353–356, 2003.
[65] E. A. Jorswieck and H. Boche, “On the optimal transmission strategy for
the MIMO MAC with MMSE receiver,” Proceedings of IEEE International
Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 109–112,
April 2003.
[66] E. A. Jorswieck and H. Boche, “Optimal transmission with imperfect channel
state information at the transmit antenna array,” Wireless Personal Commu-
nications, vol. 27, no. 1, pp. 33–56, 2003.
[67] E. A. Jorswieck and H. Boche, “Transmission strategies for the MIMO MAC
with MMSE receiver: Average MSE optimization and achievable individ-
ual MSE region,” IEEE Transactions on Signal Processing, vol. 51, no. 11,
pp. 2872–2881, November 2003.
[68] E. A. Jorswieck and H. Boche, “Optimal transmission strategies and impact
of correlation in multi-antenna systems with different types of channel state
information,” IEEE Transactions on Signal Processing, vol. 52, no. 12,
pp. 3440–3453, December 2004.
[69] E. A. Jorswieck and H. Boche, “Delay-limited capacity of parallel fading chan-
nels,” Proceedings of IEEE SPAWC, 2005.
[70] E. A. Jorswieck and H. Boche, “Multiple-antenna capacity in the low-power
regime: Channel knowledge and correlation,” Proceedings of IEEE ICASSP,
2005.
[71] E. A. Jorswieck and H. Boche, “Performance analysis of MIMO systems
in spatially correlated fading using matrix-monotone functions,” IEICE
Transactions on Fundamentals, vol. E98-A, no. 5, pp. 1454–1472, May
2006.
[72] E. A. Jorswieck and H. Boche, “Throughput analysis of cellular downlink
with different types of channel state information,” in Proceedings of IEEE
ICC, 2006.
[73] E. A. Jorswieck and H. Boche, “Outage probability in multiple antenna sys-
tems,” European Transactions on Telecommunications, vol. 18, no. 3, pp. 217–
233, April 2007.
[74] E. A. Jorswieck, H. Boche, and A. Sezgin, “Delay-limited capacity and maxi-
mum throughput of spatially correlated multiple antenna systems under aver-
age and peak-power constraints,” Proceedings of IEEE Information Theory
Workshop, 2004.

[75] E. A. Jorswieck, T. Oechtering, and H. Boche, “Performance analysis of com-


bining techniques with correlated diversity,” Proceedings of IEEE WCNC,
vol. 2, pp. 849–854, March 2005.
[76] E. A. Jorswieck and A. Sezgin, “Impact of spatial correlation on the perfor-
mance of orthogonal space-time block codes,” IEEE Communications Letters,
vol. 8, no. 1, pp. 21–23, January 2004.
[77] E. A. Jorswieck, A. Sezgin, and H. Boche, “Outage probability of OSTBC:
Optimal transmit strategy and suboptimality of odd number of transmit
antennas,” Proceedings of IEEE ICASSP, vol. 4, pp. IV-177–IV-180, 2006.
[78] A. Jovicic, P. Viswanath, and S. R. Kulkarni, “Upper bounds to transport
capacity of wireless networks,” IEEE Transactions on Information Theory,
vol. 50, no. 11, pp. 2555–2565, November 2004.
[79] M. Kiessling, Statistical Analysis and Transmit Processing for MIMO Wire-
less Systems in Correlated Fading Environments. PhD thesis, University of
Stuttgart, 2004.
[80] R. Knopp and P. A. Humblet, “Information capacity and power control in
single-cell multiuser communications,” Proceedings IEEE ICC, vol. 1, pp. 331–
335, June 1995.
[81] R. Knopp and P. A. Humblet, “On coding for block fading channels,” IEEE
Transactions on Information Theory, vol. 46, no. 1, pp. 189–205, 2000.
[82] A. Knutson and T. Tao, “Honeycombs and sums of Hermitian matrices,”
Notices of the American Mathematical Society, February 2001.
[83] F. Kraus, “Über konvexe Matrixfunktionen,” Mathematische Zeitschrift,
vol. 41, pp. 18–42, 1936.
[84] F. Kubo and T. Ando, “Means of positive linear operators,” Mathematische
Annalen, vol. 246, pp. 205–224, 1980.
[85] T. A. Lamahewa, R. A. Kennedy, T. D. Abhayapala, and T. Betlehem,
“MIMO channel correlation in general scattering environments,” Proceedings
Australian Communication Theory Workshop, 2006.
[86] E. G. Larsson and P. Stoica, Space-Time Block Coding for Wireless Commu-
nications. Cambridge University Press, 2003.
[87] X. B. Liang, “Orthogonal designs with maximal rates,” IEEE Transactions
on Information Theory, vol. 49, no. 10, pp. 2468–2503, October 2003.
[88] K. T. Löwner, “Über monotone Matrixfunktionen,” Mathematische
Zeitschrift, vol. 38, pp. 177–216, 1934.
[89] A. Lozano, A. M. Tulino, and S. Verdú, “Multiple-antenna capacity in the
low-power regime,” IEEE Transactions on Information Theory, vol. 49, no. 10,
pp. 2527–2544, October 2003.
[90] A. Lozano, A. M. Tulino, and S. Verdu, “Optimum power allocation for paral-
lel Gaussian channels with arbitrary input distributions,” IEEE Transactions
on Information Theory, vol. 52, no. 7, pp. 3033–3051, July 2006.
[91] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for commu-
nications and signal processing,” IEEE Journal on selected areas in commu-
nications, vol. 24, no. 8, pp. 1426–1438, August 2006.
[92] A. W. Marshall and I. Olkin, “Inequalities: Theory of Majorization and Its
Application,” in Mathematics in Science and Engineering, Academic Press,
Inc. (London) Ltd., 1979.

[93] A. W. Marshall and F. Proschan, “An inequality for convex functions involving
majorization,” Journal of Mathematical Analysis and Applications, 1965.
[94] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna
communication link in Rayleigh flat fading,” IEEE Transactions on Informa-
tion Theory, vol. 45, no. 1, pp. 139–157, January 1999.
[95] R. J. McEliece and W. E. Stark, “Channels with block interference,” IEEE
Transactions on Information Theory, vol. 30, no. 1, pp. 44–53, January
1984.
[96] R. U. Nabar, H. Bölcskei, V. Erceg, D. Gesbert, and A. J. Paulraj, “Perfor-
mance of multiantenna signaling techniques in the presence of polarization
diversity,” IEEE Transactions on Signal Processing, vol. 50, pp. 2553–2562,
2002.
[97] R. U. Nabar, H. Bölcskei, and A. J. Paulraj, “Outage properties of space-time
block codes in correlated Rayleigh or Ricean fading environments,” IEEE
ICASSP, pp. 2381–2384, May 2002.
[98] R. U. Nabar, H. Bölcskei, and A. J. Paulraj, “Diversity and outage perfor-
mance in Ricean MIMO channels,” IEEE Transactions Wireless Communica-
tions, vol. 4, no. 5, pp. 2519–2532, September 2005.
[99] A. Narula, M. J. Lopez, M. D. Trott, and G. W. Wornell, “Efficient use of
side information in multiple-antenna data transmission over fading channels,”
IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1423–
1436, October 1998.
[100] A. Narula, M. J. Trott, and G. W. Wornell, “Performance limits of coded
diversity methods for transmitter antenna arrays,” IEEE Transactions on
Information Theory, vol. 45, no. 7, pp. 2418–2433, November 1999.
[101] M. A. Nielsen, “Conditions for a class of entanglement transformations,” Phys-
ical Review Letters, vol. 83, pp. 436–439, 1999.
[102] H. Özcelik, Indoor MIMO Channel Models. PhD thesis, Technische Universität
Wien, 2004.
[103] H. Özcelik, M. Herdin, W. Weichselberger, J. Wallace, and E. Bonek, “Defi-
ciencies of the Kronecker MIMO radio channel model,” IEE Electronics Letters,
vol. 39, no. 16, pp. 1209–1210, 2003.
[104] D. P. Palomar, A unified framework for communications through MIMO chan-
nels. PhD thesis, Universitat Politécnica de Catalunya, 2003.
[105] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint TX-RX beamforming
design for multicarrier MIMO channels: A unified framework for convex opti-
mization,” IEEE Transactions on Signal Processing, vol. 51, no. 9, pp. 2381–
2401, September 2003.
[106] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Uniform power allocation in
MIMO channels: A game-theoretic approach,” IEEE Transactions on Infor-
mation Theory, vol. 49, no. 7, pp. 1707–1727, July 2003.
[107] D. P. Palomar and Y. Jiang, “MIMO transceiver design via majorization the-
ory,” Foundations and Trends in Communications and Information Theory,
vol. 3, no. 4–5, pp. 331–551, 2007.
[108] D. P. Palomar and S. Verdú, “Gradient of mutual information in linear vector
Gaussian channels,” IEEE Transactions on Information Theory, vol. 52, no. 1,
pp. 141–154, January 2006.

[109] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Com-


munications. Cambridge University Press, 2003.
[110] J. G. Proakis, Digital Communications. New York: McGraw-Hill, Fourth ed.,
2000.
[111] F. Rashid-Farrokhi, K. J. R. Liu, and L. Tassiulas, “Transmit beamforming
and power control for cellular wireless systems,” IEEE Journal on Selected
Areas in Communications, vol. 16, no. 8, pp. 1437–1450, October 1998.
[112] W. Rhee and J. M. Cioffi, “Ergodic capacity of multi-antenna Gaussian
multiple-access channels,” Proceedings of the 35th Asilomar Conference on
Signals, Systems and Computers, vol. 1, pp. 507–512, November 2001.
[113] W. Rhee and J. M. Cioffi, “On the capacity of multiuser wireless systems
with multiple antennas,” IEEE Transactions on Information Theory, vol. 49,
no. 10, pp. 2580–2595, October 2003.
[114] S. M. Robinson, “First order conditions for general nonlinear optimization,”
SIAM Journal of Applied Mathematics, vol. 30, no. 4, pp. 597–607, June 1976.
[115] A. R. Hammons Jr. and H. El Gamal, “On the theory of space-time codes
for PSK modulation,” IEEE Transactions on Information Theory, vol. 46,
no. 2, pp. 524–542, March 2000.
[116] M. Sagae and K. Tanabe, “Upper and lower bounds for the arithmetic-
geometric-harmonic means of positive definite matrices,” Linear and Multi-
linear Algebra, vol. 37, pp. 279–282, 1994.
[117] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and
decoder design for MIMO channels using the weighted MMSE criterion,” IEEE
Transactions on Communications, vol. 49, no. 12, pp. 2198–2206, December
2001.
[118] S. Sandhu and A. J. Paulraj, “Space-time block codes: A capacity perspec-
tive,” IEEE Communications Letters, vol. 4, no. 12, pp. 384–386, December
2000.
[119] H. Sato, “An outer bound to the capacity region of broadcast channels,”
IEEE Transaction on Information Theory, vol. 24, no. 3, pp. 374–376, May
1978.
[120] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath,
“Optimal designs for space-time linear precoders and decoders,” IEEE Trans-
actions on Signal Processing, vol. 50, no. 5, pp. 1051–1064, May 2002.
[121] M. Schubert and H. Boche, “Solvability of coupled downlink beamform-
ing problems,” Global Telecommunications Conference, vol. 1, pp. 614–618,
November 2001.
[122] M. Schubert and H. Boche, “QoS-based resource allocation and transceiver
optimization,” in Foundations and Trends in Communications and Informa-
tion Theory, vol. 2, Now publishers, 2005.
[123] M. Schubert, S. Shi, and H. Boche, “Transceiver optimization for linear mul-
tiuser MIMO channels: Sum power minimization with per-user MMSE require-
ments,” in Proceedings of EUSIPCO, 2006.
[124] A. Sezgin, E. A. Jorswieck, and H. Boche, “Performance optimization of open-
loop MIMO systems with orthogonal space-time block codes,” IEEE Signal
Processing Letters, vol. 14, no. 1, pp. 13–16, January 2007.
[125] A. Sezgin and T. J. Oechtering, “On the outage probability of quasi-orthogonal
space-time codes,” in Proceedings of IEEE Information Theory Workshop
2004, San Antonio, TX, October 2004.
[126] S. Shi and M. Schubert, “MMSE transmit optimization for multiuser multi-
antenna systems,” in Proceedings of IEEE ICASSP, 2005.
[127] S. Shi, M. Schubert, and H. Boche, “Computationally efficient transceiver opti-
mization for multiuser MIMO systems: Power minimization with user-MMSE
requirements,” in Proceedings of Asilomar, 2006.
[128] S. Shi, M. Schubert, and H. Boche, “Downlink MMSE transceiver optimiza-
tion for multiuser MIMO systems: Duality and sum-MSE minimization,” IEEE
Transactions on Signal Processing, (will appear), 2007.
[129] S. Shi, M. Schubert, E. A. Jorswieck, and H. Boche, “Downlink sum-MSE
transceiver optimization for linear multi-user MIMO systems,” in Proceedings
of Asilomar Conference on Signal, Systems, and Computers, 2005.
[130] H. Shin and J. H. Lee, “Exact symbol error probability of orthogonal
space-time block codes,” in Proceedings of IEEE Globecom, pp. 1547–1552,
November 2002.
[131] D. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and
its effect on the capacity of multi-element antenna systems,” IEEE Transac-
tions on Communications, vol. 48, no. 3, pp. 502–513, March 2000.
[132] P. Stoica, Y. Jiang, and J. Li, “On MIMO channel capacity: An intu-
itive discussion,” IEEE Signal Processing Magazine, vol. 5, pp. 83–84, May
2005.
[133] G. J. Székely and N. K. Bakirov, “Extremal probabilities for Gaussian
quadratic forms,” Probability Theory and Related Fields, vol. 126, pp. 184–202,
April 2003.
[134] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes
from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45,
no. 5, pp. 1456–1467, July 1999.
[135] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block coding
for wireless communications: Performance results,” IEEE Journal on Selected
Areas in Communications, vol. 17, no. 3, pp. 451–460, March 1999.
[136] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high
data rate wireless communication: Performance criterion and code construc-
tion,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765,
March 1998.
[137] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans-
actions on Telecommunications, vol. 10, no. 6, pp. 585–595, November/
December 1999.
[138] O. Tirkkonen and A. Hottinen, “Square-matrix embeddable space-time block
codes for complex signal constellations,” IEEE Transactions on Information
Theory, vol. 48, no. 2, pp. 1122–1126, February 2002.
[139] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cam-
bridge University Press, 2005.
[140] D. N. Tse and P. Viswanath, “Downlink-uplink duality and effective band-
widths,” IEEE International Symposium on Information Theory, p. 52, 2002.
[141] A. Tulino, A. Lozano, and S. Verdú, “Impact of antenna correlation on the
capacity of multiantenna channels,” IEEE Transactions on Information The-
ory, vol. 51, no. 7, pp. 2491–2509, July 2005.
[142] A. M. Tulino, A. Lozano, and S. Verdú, “Bandwidth-power tradeoff of multi-
antenna systems in the low-power regime,” in Multiantenna Channels: Capac-
ity, Coding and Signal Processing, DIMACS Series in Discrete Mathematics
and Theoretical Computer Science, 2003.
[143] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Transactions
on Information Theory, vol. 48, no. 6, pp. 1319–1343, June 2002.
[144] A. Vielmon, Y. Li, and J. R. Barry, “Performance of Alamouti transmit diver-
sity over time-varying Rayleigh-fading channels,” Proceedings of IEEE WCNC,
2005.
[145] S. Vishwanath, S. Boyd, and A. Goldsmith, “Worst-case capacity of Gaussian
vector channels,” in Proceedings of IEEE Canadian Workshop on Information
Theory, 2003.
[146] S. Vishwanath, N. Jindal, and A. Goldsmith, “On the capacity of multiple
input multiple output broadcast channels,” Proceedings of International Con-
ference on Communications, vol. 3, pp. 1444–1450, April 2002.
[147] S. Vishwanath, N. Jindal, and A. Goldsmith, “Duality, achievable rates and
sum-rate capacity of Gaussian MIMO broadcast channels,” IEEE Transac-
tions on Information Theory, vol. 49, no. 10, pp. 2658–2668, October 2003.
[148] S. Vishwanath, G. Kramer, S. Shamai (Shitz), S. Jafar, and A. Goldsmith,
“Capacity bounds for Gaussian vector broadcast channels,” DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, 2003.
[149] E. Visotsky and U. Madhow, “Space-time transmit precoding with imperfect
feedback,” IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2632–
2639, September 2001.
[150] P. Viswanath, V. Anantharam, and D. N. C. Tse, “Optimal sequences, power
control and user capacity of synchronous CDMA systems with linear MMSE
multiuser receivers,” IEEE Transactions on Information Theory, vol. 45, no. 6,
pp. 1968–1983, September 1999.
[151] P. Viswanath, D. Tse, and R. Laroia, “Opportunistic beamforming using dumb
antennas,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1277–
1294, June 2002.
[152] P. Viswanath and D. N. C. Tse, “Sum capacity of the vector Gaussian broad-
cast channel and uplink-downlink duality,” IEEE Transactions on Information
Theory, vol. 49, no. 8, pp. 1912–1921, August 2003.
[153] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “The capacity region
of the Gaussian MIMO broadcast channel,” Proceedings of Conference on
Information Sciences and Systems, pp. 7–12, 2004.
[154] K. Werner, M. Jansson, and P. Stoica, “On estimation of Kronecker structured
covariance matrices,” IEEE Transactions on Signal Processing, (will appear),
2006.
[155] D. V. Widder, The Laplace Transform. Princeton University Press, 1941.
[156] E. P. Wigner and J. von Neumann, “Significance of Löwner's theorem in the
quantum theory of collisions,” Annals of Mathematics, vol. 59, pp. 418–433,
1954.
[157] J. H. Winters, “The diversity gain of transmit diversity in wireless systems
with Rayleigh fading,” IEEE Transactions on Vehicular Technology, vol. 47,
no. 1, pp. 119–123, February 1998.
[158] P. Wolniansky, G. Foschini, G. Golden, and R. Valenzuela, “V-BLAST: An
architecture for realizing very high data rates over the rich-scattering wireless
channel,” in Proceedings URSI International Symposium on Signals, Systems,
and Electronics, (IEEE, New York, NY), pp. 295–300, 1998.
[159] G. Wornell, “Linear diversity techniques for fading channels,” in Wireless
Communications, Signal Processing Perspectives, (V. Poor and G. Wornell,
eds.), Chapter 1, 1998.
[160] K. Yu, Multiple-Input Multiple-Output Radio Propagation Channels: Charac-
teristics and Models. PhD thesis, Royal Institute of Technology, Stockholm,
Sweden, 2005.
[161] W. Yu, “The structure of the worst-noise in Gaussian vector broadcast chan-
nels,” American Mathematical Society – Discrete Mathematics and Theoretical
Computer Science (DIMACS) series on Network Information Theory, 2003.
[162] W. Yu, “Uplink-downlink duality via minimax duality,” IEEE Transactions
on Information Theory, vol. 52, no. 2, pp. 361–374, February 2006.
[163] W. Yu and J. M. Cioffi, “Sum capacity of Gaussian vector broadcast channels,”
IEEE Transactions on Information Theory, (will appear), November 2004.
[164] W. Yu, W. Rhee, S. Boyd, and J. M. Cioffi, “Iterative water-filling for Gaussian
vector multiple-access channels,” IEEE Transactions on Information Theory,
vol. 50, no. 1, pp. 145–151, January 2004.
[165] P. Zetterberg, Mobile Cellular Communications with Base Station Antenna
Arrays: Spectrum Efficiency, Algorithms and Propagation Models. PhD thesis,
Royal Institute of Technology, Stockholm, 1997.
[166] X. Zhan, Matrix Inequalities, vol. 1790 of Lecture Notes in Mathematics.
Springer-Verlag Berlin Heidelberg, 2002.
[167] X. Zhang, E. Jorswieck, B. Ottersten, and A. Paulraj, “MSE based optimiza-
tion of multiuser MIMO MAC with partial CSI,” in Proceedings of ASILO-
MAR CSSC, 2006.
[168] X.-M. Zhang, “Optimization of Schur-convex functions,” Mathematical
Inequalities and Applications, vol. 1, no. 3, pp. 319–330, 1998.
[169] S. Zhou and G. B. Giannakis, “Optimal transmitter eigen-beamforming and
space-time block coding based on channel mean feedback,” IEEE Transactions
on Signal Processing, vol. 50, pp. 2599–2613, October 2002.
[170] S. Zhou and G. B. Giannakis, “Adaptive modulation for multi-antenna trans-
missions with channel mean feedback,” IEEE Transactions on Wireless Com-
munications, vol. 3, no. 5, pp. 1626–1636, 2004.