
Appendix A: Matrix Algebra

A matrix is a two-dimensional array, quadratic or rectangular. First, we review matrix algebra with respect to two internal and one external relation, namely multiplication of a matrix by a scalar, addition of matrices of the same order, and matrix multiplication of Cayley, Kronecker-Zehfuss, Khatri-Rao and Hadamard type. Second, we introduce special matrices of symmetric, antisymmetric, diagonal, unity, null, idempotent, normal, orthogonal and orthonormal type (facts on the representation of a 2×2 orthonormal matrix and of a general n×n orthonormal matrix, the Helmert representation of an orthonormal matrix with examples, facts on the representation of a Hankel matrix with examples, the definition of a Vandermonde matrix), the permutation matrix and the commutation matrix. Third, we treat scalar measures such as rank, determinant, trace and norm. Fourth, in detail, we review the Inverse Partitioned Matrix /IPM/ and the Cayley inverse of the sum of two matrices, summarize the notion of a division algebra, and devote a special paragraph to vector-valued matrix forms such as vec, vech and veck. Fifth, we introduce the notion of the eigenvalue-eigenvector decomposition (analysis versus synthesis) and the singular value decomposition. Sixth, we give details of generalized inverses, namely the g-inverse, reflexive g-inverse, reflexive symmetric g-inverse, pseudo-inverse, the Zlobec formula, the Bjerhammar formula, rank factorization, left and right inverses, projections, bordering, the singular value representation and the theory of solving linear equations.

A1 Matrix-Algebra
A matrix is a rectangular or a quadratic array of numbers,

$$
\mathbf A := [a_{ij}] =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1,m-1} & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2,m-1} & a_{2m}\\
\vdots & \vdots &        & \vdots    & \vdots\\
a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,m-1} & a_{n-1,m}\\
a_{n1} & a_{n2} & \cdots & a_{n,m-1} & a_{nm}
\end{bmatrix},
\qquad a_{ij}\in\mathbb R,\ [a_{ij}]\in\mathbb R^{n\times m}.
$$

The format or "order" of $\mathbf A$ is given by the number $n$ of rows and the number $m$ of columns,
$$ O(\mathbf A) := n\times m. $$

Fact:
Two matrices are identical if they have identical format and if at each place $(i,j)$ there are identical numbers, namely
$$ \mathbf A=\mathbf B \;\Leftrightarrow\; a_{ij}=b_{ij}\quad \forall\ i\in\{1,\dots,n\},\ j\in\{1,\dots,m\}. $$

With permission from the author: Extract from "Erik W. Grafarend: Linear and
nonlinear models. Fixed effects, random effects, and mixed models"

Besides the identity of two matrices, the transpose of an $n\times m$ matrix $\mathbf A=[a_{ij}]$ is the $m\times n$ matrix $\mathbf A'=[a_{ji}]$ whose $ij$ element is $a_{ji}$.

Fact:
$$ (\mathbf A')'=\mathbf A. $$

A matrix algebra is defined by the following operations:

- multiplication of a matrix by a scalar (external relation)
- addition of two matrices of the same order (internal relation)
- multiplication of two matrices (internal relation)

Definition (matrix additions and multiplications):

(1) Multiplication by a scalar
$$ \mathbf A=[a_{ij}],\ \alpha\in\mathbb R \;\Rightarrow\; \alpha\mathbf A=\mathbf A\alpha=[\alpha a_{ij}]. $$

(2) Addition of two matrices of the same order
$$ \mathbf A=[a_{ij}],\ \mathbf B=[b_{ij}] \;\Rightarrow\; \mathbf A+\mathbf B:=[a_{ij}+b_{ij}] $$
$$ \mathbf A+\mathbf B=\mathbf B+\mathbf A \quad\text{(commutativity)} $$
$$ (\mathbf A+\mathbf B)+\mathbf C=\mathbf A+(\mathbf B+\mathbf C) \quad\text{(associativity)} $$
$$ \mathbf A-\mathbf B=\mathbf A+(-1)\mathbf B \quad\text{(inverse addition)}. $$

Compatibility and distributivity:
$$ (\alpha+\beta)\mathbf A=\alpha\mathbf A+\beta\mathbf A, \qquad \alpha(\mathbf A+\mathbf B)=\alpha\mathbf A+\alpha\mathbf B, \qquad (\mathbf A+\mathbf B)'=\mathbf A'+\mathbf B'. $$

(3) Multiplication of matrices


3(i) "Cayley-product" ("matrix-product")
$$ \mathbf A=[a_{ij}],\ O(\mathbf A)=n\times l, \qquad \mathbf B=[b_{ij}],\ O(\mathbf B)=l\times m $$
$$ \Rightarrow\ \mathbf C:=\mathbf A\cdot\mathbf B=[c_{ij}],\quad c_{ij}:=\sum_{k=1}^{l}a_{ik}b_{kj},\quad O(\mathbf C)=n\times m. $$

3(ii) "Kronecker-Zehfuss-product"
$$ \mathbf A=[a_{ij}],\ O(\mathbf A)=n\times m, \qquad \mathbf B=[b_{ij}],\ O(\mathbf B)=k\times l $$
$$ \Rightarrow\ \mathbf C:=\mathbf B\otimes\mathbf A:=[b_{ij}\mathbf A],\quad O(\mathbf C)=O(\mathbf B\otimes\mathbf A)=kn\times lm. $$

3(iii) "Khatri-Rao-product"
(of two rectangular matrices with identical column number)
$$ \mathbf A=[\mathbf a_1,\dots,\mathbf a_m],\ O(\mathbf A)=n\times m, \qquad \mathbf B=[\mathbf b_1,\dots,\mathbf b_m],\ O(\mathbf B)=k\times m $$
$$ \Rightarrow\ \mathbf C:=\mathbf B\odot\mathbf A:=[\mathbf b_1\otimes\mathbf a_1,\dots,\mathbf b_m\otimes\mathbf a_m],\quad O(\mathbf C)=kn\times m. $$

3(iv) "Hadamard-product"
(of two rectangular matrices of the same order; elementwise product)
$$ \mathbf G=[g_{ij}],\ O(\mathbf G)=n\times m, \qquad \mathbf H=[h_{ij}],\ O(\mathbf H)=n\times m $$
$$ \Rightarrow\ \mathbf K:=\mathbf G*\mathbf H=[k_{ij}],\quad k_{ij}:=g_{ij}h_{ij},\quad O(\mathbf K)=n\times m. $$

The existence of the product $\mathbf A\cdot\mathbf B$ does not imply the existence of the product $\mathbf B\cdot\mathbf A$. If both products exist, they are in general not equal. Two quadratic matrices $\mathbf A$ and $\mathbf B$ for which $\mathbf A\cdot\mathbf B=\mathbf B\cdot\mathbf A$ holds are called commutative.
Laws
(i) (Cayley product)
$$ (\mathbf A\cdot\mathbf B)\cdot\mathbf C=\mathbf A\cdot(\mathbf B\cdot\mathbf C) $$
$$ \mathbf A\cdot(\mathbf B+\mathbf C)=\mathbf A\cdot\mathbf B+\mathbf A\cdot\mathbf C $$
$$ (\mathbf A+\mathbf B)\cdot\mathbf C=\mathbf A\cdot\mathbf C+\mathbf B\cdot\mathbf C $$
$$ (\mathbf A\cdot\mathbf B)'=\mathbf B'\cdot\mathbf A'. $$

(ii) (Kronecker-Zehfuss product)
$$ (\mathbf A\otimes\mathbf B)\otimes\mathbf C=\mathbf A\otimes(\mathbf B\otimes\mathbf C)=\mathbf A\otimes\mathbf B\otimes\mathbf C $$
$$ (\mathbf A+\mathbf B)\otimes\mathbf C=(\mathbf A\otimes\mathbf C)+(\mathbf B\otimes\mathbf C) $$
$$ \mathbf A\otimes(\mathbf B+\mathbf C)=(\mathbf A\otimes\mathbf B)+(\mathbf A\otimes\mathbf C) $$
$$ (\mathbf A\otimes\mathbf B)\cdot(\mathbf C\otimes\mathbf D)=(\mathbf A\cdot\mathbf C)\otimes(\mathbf B\cdot\mathbf D) $$
$$ (\mathbf A\otimes\mathbf B)'=\mathbf A'\otimes\mathbf B'. $$

(iii) (Khatri-Rao product)
$$ (\mathbf A\odot\mathbf B)\odot\mathbf C=\mathbf A\odot(\mathbf B\odot\mathbf C)=\mathbf A\odot\mathbf B\odot\mathbf C $$
$$ (\mathbf A+\mathbf B)\odot\mathbf C=(\mathbf A\odot\mathbf C)+(\mathbf B\odot\mathbf C) $$
$$ \mathbf A\odot(\mathbf B+\mathbf C)=(\mathbf A\odot\mathbf B)+(\mathbf A\odot\mathbf C) $$
$$ (\mathbf A\otimes\mathbf C)\cdot(\mathbf B\odot\mathbf D)=(\mathbf A\cdot\mathbf B)\odot(\mathbf C\cdot\mathbf D) $$
$$ \mathbf A\odot(\mathbf B\cdot\mathbf D)=(\mathbf A\odot\mathbf B)\cdot\mathbf D,\ \text{if } d_{ij}=0 \text{ for } i\neq j. $$


The transposed Khatri-Rao product generates a row product, which we do not pursue here.

(iv) A B  B A
( A  B )  C  A  ( B  C)  A  B  C
( A  B )  C  ( A  C)  ( B  C)
( A1  B1  C1 )  ( A 2  B 2  C2 )  ( A1  A 2 )  ( B1  B 2 )  (C1  C2 )
(D  A)  (B  D)  D  ( A  B)  D, if dij  0 for i  j
( A  B)  A  B.
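The mixed-product law of (ii), $(\mathbf A\otimes\mathbf B)\cdot(\mathbf C\otimes\mathbf D)=(\mathbf A\cdot\mathbf C)\otimes(\mathbf B\cdot\mathbf D)$, lends itself to a quick numerical check; a minimal sketch with random matrices (an addition, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A, C = rng.standard_normal((2, 3)), rng.standard_normal((3, 2))
B, D = rng.standard_normal((4, 5)), rng.standard_normal((5, 4))

lhs = np.kron(A, B) @ np.kron(C, D)   # (A (x) B) . (C (x) D)
rhs = np.kron(A @ C, B @ D)           # (A . C) (x) (B . D)
assert np.allclose(lhs, rhs)          # mixed-product law holds
```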

A2 Special Matrices
We collect special matrices of symmetric, antisymmetric, diagonal, unity, zero, idempotent, normal, orthogonal, orthonormal, positive-definite and positive-semidefinite type, as well as special orthonormal matrices, for instance of Helmert or of Hankel type.

Definitions (special matrices):


A quadratic matrix $\mathbf A=[a_{ij}]$ of the order $O(\mathbf A)=n\times n$ is called

symmetric $\Leftrightarrow a_{ij}=a_{ji}\ \forall\ i,j\in\{1,\dots,n\}$ : $\mathbf A=\mathbf A'$

antisymmetric $\Leftrightarrow a_{ij}=-a_{ji}\ \forall\ i,j\in\{1,\dots,n\}$ : $\mathbf A=-\mathbf A'$

diagonal $\Leftrightarrow a_{ij}=0\ \forall\ i\neq j$ : $\mathbf A=\mathrm{Diag}[a_{11},\dots,a_{nn}]$

unity $\mathbf I_{n\times n} \Leftrightarrow a_{ij}=0\ \forall\ i\neq j$ and $a_{ij}=1\ \forall\ i=j$

zero matrix $\mathbf 0_{n\times n}$ : $a_{ij}=0\ \forall\ i,j\in\{1,\dots,n\}$

upper triangular $\Leftrightarrow a_{ij}=0\ \forall\ i>j$; lower triangular $\Leftrightarrow a_{ij}=0\ \forall\ i<j$

idempotent if and only if $\mathbf A\cdot\mathbf A=\mathbf A$

normal if and only if $\mathbf A\cdot\mathbf A'=\mathbf A'\cdot\mathbf A$.

Definition (orthogonal matrix):

The matrix $\mathbf A$ is called orthogonal if $\mathbf A\mathbf A'$ and $\mathbf A'\mathbf A$ are diagonal matrices.
(The rows and columns of $\mathbf A$ are orthogonal.)


Definition (orthonormal matrix):

The matrix $\mathbf A$ is called orthonormal if $\mathbf A\mathbf A'=\mathbf A'\mathbf A=\mathbf I$.
(The rows and columns of $\mathbf A$ are orthonormal.)
Facts (representation of a 2×2 orthonormal matrix $\mathbf X\in SO(2)$):

A 2×2 orthonormal matrix $\mathbf X\in SO(2)$ is an element of the special orthogonal group $SO(2)$ defined by
$$ SO(2):=\{\mathbf X\in\mathbb R^{2\times 2}\mid \mathbf X'\mathbf X=\mathbf I_2 \text{ and } \det\mathbf X=+1\} $$
$$ =\Bigl\{\mathbf X=\begin{bmatrix}x_1&x_2\\ x_3&x_4\end{bmatrix}\in\mathbb R^{2\times 2}\ \Bigm|\ x_1^2+x_2^2=1,\ x_3^2+x_4^2=1,\ x_1x_3+x_2x_4=0,\ x_1x_4-x_2x_3=1\Bigr\}. $$

(i)
$$ \mathbf X=\begin{bmatrix}\cos\phi&\sin\phi\\ -\sin\phi&\cos\phi\end{bmatrix}\in\mathbb R^{2\times 2},\qquad \phi\in[0,2\pi] $$
is a trigonometric representation of $\mathbf X\in SO(2)$.

(ii)
$$ \mathbf X=\begin{bmatrix}x&\sqrt{1-x^2}\\ -\sqrt{1-x^2}&x\end{bmatrix}\in\mathbb R^{2\times 2},\qquad x\in[-1,+1] $$
is an algebraic representation of $\mathbf X\in SO(2)$
$$ \bigl(x_{11}^2+x_{12}^2=1,\quad x_{11}x_{21}+x_{12}x_{22}=-x\sqrt{1-x^2}+\sqrt{1-x^2}\,x=0,\quad x_{21}^2+x_{22}^2=1\bigr). $$

(iii)
$$ \mathbf X=\frac{1}{1+x^2}\begin{bmatrix}1-x^2&2x\\ -2x&1-x^2\end{bmatrix}\in\mathbb R^{2\times 2},\qquad x\in\mathbb R $$
is called a stereographic projection of $\mathbf X$
(stereographic projection of $SO(2)\sim\mathbb S^1$ onto $\mathbb R^1$).

(iv)
$$ \mathbf X=(\mathbf I_2-\mathbf S)(\mathbf I_2+\mathbf S)^{-1},\qquad \mathbf S=\begin{bmatrix}0&-x\\ x&0\end{bmatrix}, $$
where $\mathbf S=-\mathbf S'$ is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of $\mathbf X\in SO(2)$.

(v) $SO(2)$ is a commutative group ("Abel")
(Example: $\mathbf X_1\in SO(2)$, $\mathbf X_2\in SO(2)$, then $\mathbf X_1\mathbf X_2=\mathbf X_2\mathbf X_1$)
($SO(2)$ is the only commutative group of this family; $SO(n)$ for $n>2$ is not "Abel").
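The trigonometric representation (i) and the Cayley-Lipschitz representation (iv) describe the same rotation when $x=\tan(\phi/2)$. The following check is an addition assuming NumPy; the sign of $x$ inside $\mathbf S$ is chosen so that both conventions agree:

```python
import numpy as np

phi = 0.3                            # an arbitrary rotation angle
X_trig = np.array([[np.cos(phi), np.sin(phi)],
                   [-np.sin(phi), np.cos(phi)]])   # representation (i)

x = np.tan(phi / 2.0)                # half-angle parameter
S = np.array([[0.0, -x], [x, 0.0]])  # skew matrix, S = -S'
I2 = np.eye(2)
X_cayley = (I2 - S) @ np.linalg.inv(I2 + S)        # representation (iv)

assert np.allclose(X_trig, X_cayley)               # same rotation
assert np.allclose(X_trig.T @ X_trig, I2)          # orthonormal
```

Both matrices are orthonormal with determinant $+1$, as required for $SO(2)$.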


Facts (representation of an n×n orthonormal matrix $\mathbf X\in SO(n)$):

An n×n orthonormal matrix $\mathbf X\in SO(n)$ is an element of the special orthogonal group $SO(n)$ defined by
$$ SO(n):=\{\mathbf X\in\mathbb R^{n\times n}\mid \mathbf X\mathbf X'=\mathbf I_n \text{ and } \det\mathbf X=+1\}. $$

As a differentiable manifold, $SO(n)$ inherits a Riemann structure from the ambient space $\mathbb R^{n^2}$ with a Euclidean metric ($\mathrm{vec}\,\mathbf X\in\mathbb R^{n^2}$, $\dim\mathrm{vec}\,\mathbf X=n^2$). Any atlas of the special orthogonal group $SO(n)$ has at least four distinct charts, and there is one with exactly four charts ("minimal atlas": Lusternik-Schnirelmann category).

(i) $\mathbf X=(\mathbf I_n-\mathbf S)(\mathbf I_n+\mathbf S)^{-1}$, where $\mathbf S=-\mathbf S'$ is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of $\mathbf X\in SO(n)$.
($n!/[2(n-2)!]=n(n-1)/2$ is the number of independent parameters/coordinates of $\mathbf X$.)

(ii) If each of the matrices $\mathbf R_1,\dots,\mathbf R_k$ is an n×n orthonormal matrix, then their product
$$ \mathbf R_1\mathbf R_2\cdots\mathbf R_{k-1}\mathbf R_k\in SO(n) $$
is an n×n orthonormal matrix.

Facts (orthonormal matrix: Helmert representation):

Let $\mathbf a=[a_1,\dots,a_n]$ represent any row vector whose elements are all nonzero, $a_i\neq 0\ (i\in\{1,\dots,n\})$. Suppose that we require an n×n orthonormal matrix one row of which is proportional to $\mathbf a$. In what follows one such matrix $\mathbf R$ is derived.

Let $\mathbf r_1,\dots,\mathbf r_n$ represent the rows of $\mathbf R$ and take the first row $\mathbf r_1$ to be the row of $\mathbf R$ that is proportional to $\mathbf a$. Take the second row $\mathbf r_2$ to be proportional to the n-dimensional row vector
$$ [a_1,\ -a_1^2/a_2,\ 0,\ 0,\dots,0], \tag{H2} $$
the third row $\mathbf r_3$ proportional to
$$ [a_1,\ a_2,\ -(a_1^2+a_2^2)/a_3,\ 0,\ 0,\dots,0] \tag{H3} $$
and, more generally, the kth row $\mathbf r_k$ proportional to
$$ \Bigl[a_1,\ a_2,\dots,a_{k-1},\ -\sum_{i=1}^{k-1}a_i^2\Big/a_k,\ 0,\ 0,\dots,0\Bigr] \tag{Hk} $$
for $k\in\{2,\dots,n\}$,


respectively. Confirm to yourself that the n-1 vectors (H2), …, (Hn) are orthogonal to each other and to the vector $\mathbf a$. In order to obtain explicit expressions for $\mathbf r_1,\dots,\mathbf r_n$ it remains to normalize $\mathbf a$ and the vectors (H2), …, (Hn). The Euclidean norm of the kth of these vectors is
$$ \Bigl\{\sum_{i=1}^{k-1}a_i^2+\Bigl(\sum_{i=1}^{k-1}a_i^2\Bigr)^{2}\Big/a_k^2\Bigr\}^{1/2}
 =\Bigl\{\Bigl(\sum_{i=1}^{k-1}a_i^2\Bigr)\Bigl(\sum_{i=1}^{k}a_i^2\Bigr)\Big/a_k^2\Bigr\}^{1/2}. $$

Accordingly, for the orthonormal vectors $\mathbf r_1,\dots,\mathbf r_n$ we finally find

(1st row)
$$ \mathbf r_1=\Bigl[\sum_{i=1}^{n}a_i^2\Bigr]^{-1/2}(a_1,\dots,a_n) $$

(kth row)
$$ \mathbf r_k=\Biggl[\frac{a_k^2}{\bigl(\sum_{i=1}^{k-1}a_i^2\bigr)\bigl(\sum_{i=1}^{k}a_i^2\bigr)}\Biggr]^{1/2}\Bigl(a_1,a_2,\dots,a_{k-1},\ -\sum_{i=1}^{k-1}\frac{a_i^2}{a_k},\ 0,0,\dots,0\Bigr) $$

(nth row)
$$ \mathbf r_n=\Biggl[\frac{a_n^2}{\bigl(\sum_{i=1}^{n-1}a_i^2\bigr)\bigl(\sum_{i=1}^{n}a_i^2\bigr)}\Biggr]^{1/2}\Bigl[a_1,a_2,\dots,a_{n-1},\ -\sum_{i=1}^{n-1}\frac{a_i^2}{a_n}\Bigr]. $$

The recipe is complicated in general. When $\mathbf a=[1,1,\dots,1,1]$, the Helmert factors in the 1st row, …, kth row, …, nth row simplify to
$$ \mathbf r_1=n^{-1/2}[1,1,\dots,1,1]\in\mathbb R^n $$
$$ \mathbf r_k=[k(k-1)]^{-1/2}[1,1,\dots,1,\,1-k,\,0,0,\dots,0,0]\in\mathbb R^n $$
$$ \mathbf r_n=[n(n-1)]^{-1/2}[1,1,\dots,1,\,1-n]\in\mathbb R^n. $$

The orthonormal matrix
$$ \begin{bmatrix}\mathbf r_1\\ \mathbf r_2\\ \vdots\\ \mathbf r_{k-1}\\ \mathbf r_k\\ \vdots\\ \mathbf r_{n-1}\\ \mathbf r_n\end{bmatrix}\in SO(n) $$
is known as the Helmert matrix of order n. (Alternatively, the transpose of such a matrix is called the Helmert matrix.)


Example (Helmert matrix of order 3):
$$ \begin{bmatrix}1/\sqrt3&1/\sqrt3&1/\sqrt3\\ 1/\sqrt2&-1/\sqrt2&0\\ 1/\sqrt6&1/\sqrt6&-2/\sqrt6\end{bmatrix}\in SO(3). $$
Check that the rows are orthogonal and normalized.

Example (Helmert matrix of order 4):
$$ \begin{bmatrix}1/2&1/2&1/2&1/2\\ 1/\sqrt2&-1/\sqrt2&0&0\\ 1/\sqrt6&1/\sqrt6&-2/\sqrt6&0\\ 1/\sqrt{12}&1/\sqrt{12}&1/\sqrt{12}&-3/\sqrt{12}\end{bmatrix}\in SO(4). $$
Check that the rows are orthogonal and normalized.


Example (Helmert matrix of order n):
$$ \begin{bmatrix}
1/\sqrt n&1/\sqrt n&1/\sqrt n&\cdots&1/\sqrt n&1/\sqrt n\\
1/\sqrt2&-1/\sqrt2&0&\cdots&0&0\\
1/\sqrt6&1/\sqrt6&-2/\sqrt6&\cdots&0&0\\
\vdots& & & & &\vdots\\
\frac{1}{\sqrt{(n-1)(n-2)}}&\frac{1}{\sqrt{(n-1)(n-2)}}&\cdots&\frac{1}{\sqrt{(n-1)(n-2)}}&\frac{-(n-2)}{\sqrt{(n-1)(n-2)}}&0\\[4pt]
\frac{1}{\sqrt{n(n-1)}}&\frac{1}{\sqrt{n(n-1)}}&\cdots&\frac{1}{\sqrt{n(n-1)}}&\frac{1}{\sqrt{n(n-1)}}&\frac{1-n}{\sqrt{n(n-1)}}
\end{bmatrix}\in SO(n). $$

Check that the rows are orthogonal and normalized. As an example take the nth row:
$$ \frac{(1-n)^2}{n(n-1)}+\frac{1}{n(n-1)}+\dots+\frac{1}{n(n-1)}
 =\frac{1-2n+n^2}{n(n-1)}+\frac{n-1}{n(n-1)}
 =\frac{n^2-n}{n(n-1)}=1, $$
where n-1 terms $1/[n(n-1)]$ have to be summed.
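The simplified rows for $\mathbf a=[1,\dots,1]$ can be generated for any order n. The sketch below is an addition to the text (assuming NumPy; `helmert` is an illustrative helper): it builds the Helmert matrix row by row exactly as derived above and checks the orthonormality of its rows.

```python
import numpy as np

def helmert(n: int) -> np.ndarray:
    """Helmert matrix of order n for a = [1, ..., 1], rows as above."""
    R = np.zeros((n, n))
    R[0, :] = 1.0 / np.sqrt(n)              # 1st row
    for k in range(2, n + 1):               # kth row, k = 2, ..., n
        R[k - 1, : k - 1] = 1.0             # k-1 leading ones
        R[k - 1, k - 1] = 1.0 - k           # entry 1 - k
        R[k - 1, :] /= np.sqrt(k * (k - 1)) # normalization factor
    return R

R4 = helmert(4)
assert np.allclose(R4 @ R4.T, np.eye(4))    # rows orthonormal
```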

Definition (Hankel matrix):

A rectangular matrix $\mathbf A=[a_{ij}]\in\mathbb R^{n\times m}$ is called "a Hankel matrix" if the $n+m-1$ distinct elements of $\mathbf A$,
$$ \begin{bmatrix}a_{11}&&&\\ a_{21}&&&\\ \vdots&&&\\ a_{n-1,1}&&&\\ a_{n1}&a_{n2}&\cdots&a_{nm}\end{bmatrix}, $$
only appear in the first column and in the last row (every antidiagonal of $\mathbf A$ is constant).

Example: Hankel matrix of power sums

Let $\mathbf A\in\mathbb R^{n\times m}$ be an n×m rectangular matrix ($n\le m$) whose entries are power sums,
$$ \mathbf A:=\begin{bmatrix}
\sum_{i=1}^{n}\alpha_i x_i&\sum_{i=1}^{n}\alpha_i x_i^2&\cdots&\sum_{i=1}^{n}\alpha_i x_i^{m}\\[4pt]
\sum_{i=1}^{n}\alpha_i x_i^2&\sum_{i=1}^{n}\alpha_i x_i^3&\cdots&\sum_{i=1}^{n}\alpha_i x_i^{m+1}\\
\vdots&\vdots& &\vdots\\
\sum_{i=1}^{n}\alpha_i x_i^{n}&\sum_{i=1}^{n}\alpha_i x_i^{n+1}&\cdots&\sum_{i=1}^{n}\alpha_i x_i^{n+m-1}
\end{bmatrix}. $$
$\mathbf A$ is a Hankel matrix.
Definition (Vandermonde matrix):

Vandermonde matrix: $\mathbf V\in\mathbb R^{n\times n}$,
$$ \mathbf V:=\begin{bmatrix}1&1&\cdots&1\\ x_1&x_2&\cdots&x_n\\ \vdots&\vdots& &\vdots\\ x_1^{n-1}&x_2^{n-1}&\cdots&x_n^{n-1}\end{bmatrix},
\qquad \det\mathbf V=\prod_{\substack{i,j=1\\ i>j}}^{n}(x_i-x_j). $$

Example: Vandermonde matrix $\mathbf V\in\mathbb R^{3\times 3}$
$$ \mathbf V:=\begin{bmatrix}1&1&1\\ x_1&x_2&x_3\\ x_1^2&x_2^2&x_3^2\end{bmatrix},
\qquad \det\mathbf V=(x_2-x_1)(x_3-x_2)(x_3-x_1). $$

Example: Submatrix of a Hankel matrix of power sums

Consider the submatrix $\mathbf P=[\mathbf a_1,\mathbf a_2,\dots,\mathbf a_n]$ formed by the first n columns of the Hankel matrix $\mathbf A\in\mathbb R^{n\times m}$ ($n\le m$) whose entries are power sums. The determinant of the power sums matrix $\mathbf P$ is
$$ \det\mathbf P=\Bigl(\prod_{i=1}^{n}\alpha_i\Bigr)\Bigl(\prod_{i=1}^{n}x_i\Bigr)(\det\mathbf V)^2, $$
where $\det\mathbf V$ is the Vandermonde determinant.
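This determinant identity follows from the factorization $\mathbf P=\mathbf V\,\mathrm{Diag}(\alpha_i x_i)\,\mathbf V'$ and can be checked numerically; a sketch with hypothetical weights and nodes (an addition, assuming NumPy):

```python
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])   # hypothetical weights alpha_i
x = np.array([1.0, 2.0, 4.0])       # hypothetical nodes x_i
n = len(x)

# P[i, j] = sum_l alpha_l * x_l**(i + j + 1)  (0-based i, j)
P = np.array([[np.sum(alpha * x ** (i + j + 1)) for j in range(n)]
              for i in range(n)])

V = np.vander(x, increasing=True).T   # V[i, l] = x_l**i (Vandermonde)
det_V = np.linalg.det(V)

lhs = np.linalg.det(P)
rhs = np.prod(alpha) * np.prod(x) * det_V ** 2
assert np.isclose(lhs, rhs)           # det P = (prod alpha)(prod x)(det V)^2
```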


Example: Submatrix $\mathbf P\in\mathbb R^{3\times 3}$ of a 3×4 Hankel matrix of power sums (n=3, m=4)
$$ \mathbf A=\begin{bmatrix}
\alpha_1x_1+\alpha_2x_2+\alpha_3x_3&\alpha_1x_1^2+\alpha_2x_2^2+\alpha_3x_3^2&\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3&\alpha_1x_1^4+\alpha_2x_2^4+\alpha_3x_3^4\\[2pt]
\alpha_1x_1^2+\alpha_2x_2^2+\alpha_3x_3^2&\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3&\alpha_1x_1^4+\alpha_2x_2^4+\alpha_3x_3^4&\alpha_1x_1^5+\alpha_2x_2^5+\alpha_3x_3^5\\[2pt]
\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3&\alpha_1x_1^4+\alpha_2x_2^4+\alpha_3x_3^4&\alpha_1x_1^5+\alpha_2x_2^5+\alpha_3x_3^5&\alpha_1x_1^6+\alpha_2x_2^6+\alpha_3x_3^6
\end{bmatrix} $$
$$ \mathbf P=[\mathbf a_1,\mathbf a_2,\mathbf a_3]=\begin{bmatrix}
\alpha_1x_1+\alpha_2x_2+\alpha_3x_3&\alpha_1x_1^2+\alpha_2x_2^2+\alpha_3x_3^2&\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3\\[2pt]
\alpha_1x_1^2+\alpha_2x_2^2+\alpha_3x_3^2&\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3&\alpha_1x_1^4+\alpha_2x_2^4+\alpha_3x_3^4\\[2pt]
\alpha_1x_1^3+\alpha_2x_2^3+\alpha_3x_3^3&\alpha_1x_1^4+\alpha_2x_2^4+\alpha_3x_3^4&\alpha_1x_1^5+\alpha_2x_2^5+\alpha_3x_3^5
\end{bmatrix}. $$

Definitions (positive definite and positive semidefinite matrices):

A matrix $\mathbf A$ is called positive definite if and only if
$$ \mathbf x'\mathbf A\mathbf x>0\quad \forall\ \mathbf x\in\mathbb R^n,\ \mathbf x\neq\mathbf 0. $$
A matrix $\mathbf A$ is called positive semidefinite if and only if
$$ \mathbf x'\mathbf A\mathbf x\ge 0\quad \forall\ \mathbf x\in\mathbb R^n. $$
An example follows.

Example (idempotence):
All symmetric idempotent matrices are positive semidefinite, as are $\mathbf B\mathbf B'$ and $\mathbf B'\mathbf B$ for an arbitrary matrix $\mathbf B$.

What are "permutation matrices" or "commutation matrices"? After their definitions we will give some applications.

Definitions (permutation matrix, commutation matrix):

A matrix $\mathbf A$ is called a permutation matrix if and only if each column and each row of $\mathbf A$ contains exactly one element $1$, all other elements being zero. There holds $\mathbf A\mathbf A'=\mathbf I$.

A matrix $\mathbf K$ is called a commutation matrix if and only if, for a matrix of the order $n^2\times n^2$, there holds
$$ \mathbf K=\mathbf K'\quad\text{and}\quad \mathbf K^2=\mathbf I_{n^2}. $$
The commutation matrix is symmetric and orthonormal.

Example (commutation matrix)
$$ n=2:\qquad \mathbf K_4=\begin{bmatrix}1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1\end{bmatrix}=\mathbf K_4'. $$

A general definition of commutation matrices $\mathbf K_{nm}$ of the order $nm\times nm$ with $n\neq m$ can be found in J. R. Magnus and H. Neudecker (1988, p. 46-48). This definition no longer leads to a symmetric matrix. Nevertheless, the transpose of a commutation matrix is again a commutation matrix, since $\mathbf K_{nm}'=\mathbf K_{mn}$ and $\mathbf K_{nm}\mathbf K_{mn}=\mathbf I_{nm}$.

Example (commutation matrix)
$$ n=2,\ m=3:\qquad \mathbf K_{23}=\begin{bmatrix}
1&0&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&0&1&0\\
0&1&0&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&0&1
\end{bmatrix} $$
$$ n=3,\ m=2:\qquad \mathbf K_{32}=\begin{bmatrix}
1&0&0&0&0&0\\
0&0&0&1&0&0\\
0&1&0&0&0&0\\
0&0&0&0&1&0\\
0&0&1&0&0&0\\
0&0&0&0&0&1
\end{bmatrix} $$
$$ \mathbf K_{32}\mathbf K_{23}=\mathbf I_6=\mathbf K_{23}\mathbf K_{32}. $$

A3 Scalar Measures and Inverse Matrices


We will refer to some scalar measures, also called scalar functions, of matrices. Beforehand we introduce some classical definitions of

- linear independence
- column and row rank
- rank identities.

Definitions (linear independence, column and row rank):

A set of vectors $\mathbf x_1,\dots,\mathbf x_n$ is called linearly independent if an arbitrary linear combination $\sum_{i=1}^{n}\lambda_i\mathbf x_i=\mathbf 0$ only holds when all scalars $\lambda_1,\dots,\lambda_n$ vanish, that is if $\lambda_1=\lambda_2=\dots=\lambda_{n-1}=\lambda_n=0$ holds.


Otherwise, the vectors $\mathbf x_1,\dots,\mathbf x_n$ are called linearly dependent.

Let $\mathbf A$ be a rectangular matrix of the order $O(\mathbf A)=n\times m$. The column rank of the matrix $\mathbf A$ is the largest number of linearly independent columns, while the row rank is the largest number of linearly independent rows. Actually, the column rank of the matrix $\mathbf A$ is identical to its row rank. The rank of a matrix is therefore simply written
$$ \mathrm{rk}\,\mathbf A. $$
Obviously,
$$ \mathrm{rk}\,\mathbf A\le\min\{n,m\}. $$

If $\mathrm{rk}\,\mathbf A=n$ holds, we say that the matrix $\mathbf A$ has full row rank. In contrast, if the rank identity $\mathrm{rk}\,\mathbf A=m$ holds, we say that the matrix $\mathbf A$ has full column rank.

We list the following important rank identities.


Facts (rank identities):
(i) $\mathrm{rk}\,\mathbf A=\mathrm{rk}\,\mathbf A'=\mathrm{rk}\,\mathbf A'\mathbf A=\mathrm{rk}\,\mathbf A\mathbf A'$
(ii) $\mathrm{rk}(\mathbf A+\mathbf B)\le\mathrm{rk}\,\mathbf A+\mathrm{rk}\,\mathbf B$
(iii) $\mathrm{rk}(\mathbf A\cdot\mathbf B)\le\min\{\mathrm{rk}\,\mathbf A,\mathrm{rk}\,\mathbf B\}$
(iv) $\mathrm{rk}(\mathbf A\cdot\mathbf B)=\mathrm{rk}\,\mathbf A$ if $\mathbf B$ has full row rank,
(v) $\mathrm{rk}(\mathbf A\cdot\mathbf B)=\mathrm{rk}\,\mathbf B$ if $\mathbf A$ has full column rank,
(vi) $\mathrm{rk}(\mathbf A\cdot\mathbf B\cdot\mathbf C)+\mathrm{rk}\,\mathbf B\ge\mathrm{rk}(\mathbf A\cdot\mathbf B)+\mathrm{rk}(\mathbf B\cdot\mathbf C)$
(vii) $\mathrm{rk}(\mathbf A\otimes\mathbf B)=(\mathrm{rk}\,\mathbf A)\cdot(\mathrm{rk}\,\mathbf B)$.

If $\mathbf A$ is a rectangular matrix of the order $O(\mathbf A)=n\times m$ and, in addition, $\mathbf A\mathbf x=\mathbf 0$ holds for a certain vector $\mathbf x\neq\mathbf 0$, then
$$ \mathrm{rk}\,\mathbf A\le m-1. $$
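Rank identities (i)-(iii) can be illustrated numerically. The sketch below is an addition (assuming NumPy); the random factors make $\mathrm{rk}\,\mathbf A=3$ hold generically:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rk A = 3
rk = np.linalg.matrix_rank

# fact (i): rk A = rk A' = rk A'A = rk AA'
assert rk(A) == rk(A.T) == rk(A.T @ A) == rk(A @ A.T) == 3

# facts (ii) and (iii) with a second matrix B
B = rng.standard_normal((5, 4))
assert rk(A + B) <= rk(A) + rk(B)
assert rk(A @ B.T) <= min(rk(A), rk(B))
```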
Let us define what a rank factorization, the column space and a singular matrix are and, especially, what a division algebra is.

Facts (rank factorization):
We call
$$ \mathbf A=\mathbf G\cdot\mathbf F $$
a rank factorization if $\mathrm{rk}\,\mathbf A=\mathrm{rk}\,\mathbf G=\mathrm{rk}\,\mathbf F$ holds for certain matrices $\mathbf G$ and $\mathbf F$ of the orders $O(\mathbf G)=n\times\mathrm{rk}\,\mathbf A$ and $O(\mathbf F)=\mathrm{rk}\,\mathbf A\times m$.


Facts:
A matrix $\mathbf A$ has the column space $\mathcal R(\mathbf A)$ spanned by its column vectors. The dimension of this vector space is $\dim\mathcal R(\mathbf A)=\mathrm{rk}\,\mathbf A$. In particular,
$$ \mathcal R(\mathbf A)=\mathcal R(\mathbf A\mathbf A') $$
holds.

Definition (non-singular matrix versus singular matrix):

Let a quadratic matrix $\mathbf A$ of the order $O(\mathbf A)=n\times n$ be given. $\mathbf A$ is called non-singular or regular if $\mathrm{rk}\,\mathbf A=n$ holds. In case $\mathrm{rk}\,\mathbf A<n$, the matrix $\mathbf A$ is called singular.

Definition (division algebra):

Let the matrices $\mathbf A,\mathbf B,\mathbf C$ be quadratic and non-singular of the order $O(\mathbf A)=O(\mathbf B)=O(\mathbf C)=n\times n$. In terms of the Cayley product an inner relation can be based on
$$ \mathbf A=[a_{ij}],\ \mathbf B=[b_{ij}],\ \mathbf C=[c_{ij}],\qquad O(\mathbf A)=O(\mathbf B)=O(\mathbf C)=n\times n $$

(i) $(\mathbf A\cdot\mathbf B)\cdot\mathbf C=\mathbf A\cdot(\mathbf B\cdot\mathbf C)$ (associativity)

(ii) $\mathbf A\cdot\mathbf I=\mathbf A$ (identity)

(iii) $\mathbf A\cdot\mathbf A^{-1}=\mathbf I$ (inverse).

The non-singular matrix $\mathbf A^{-1}=\mathbf B$ is called the Cayley inverse. The conditions
$$ \mathbf A\cdot\mathbf B=\mathbf I_n \quad\text{and}\quad \mathbf B\cdot\mathbf A=\mathbf I_n $$
are equivalent. The Cayley inverse $\mathbf A^{-1}$ is a left as well as a right inverse. The Cayley inverse is unique.

Fact: $(\mathbf A^{-1})'=(\mathbf A')^{-1}$: $\mathbf A$ is symmetric $\Leftrightarrow$ $\mathbf A^{-1}$ is symmetric.

Facts (Inverse Partitioned Matrix /IPM/ of a symmetric matrix):

Let the symmetric matrix $\mathbf A$ be partitioned as
$$ \mathbf A:=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix},\qquad \mathbf A_{11}=\mathbf A_{11}',\ \mathbf A_{22}=\mathbf A_{22}'. $$


Then its Cayley inverse $\mathbf A^{-1}$ is symmetric and can be partitioned as well:
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix}^{-1} $$
$$ =\begin{bmatrix}
[\mathbf I+\mathbf A_{11}^{-1}\mathbf A_{12}(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\mathbf A_{12}']\,\mathbf A_{11}^{-1} & -\mathbf A_{11}^{-1}\mathbf A_{12}(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\\[4pt]
-(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1} & (\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}
\end{bmatrix}, $$
if $\mathbf A_{11}^{-1}$ exists;
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix}^{-1} $$
$$ =\begin{bmatrix}
(\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}')^{-1} & -(\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}')^{-1}\mathbf A_{12}\mathbf A_{22}^{-1}\\[4pt]
-\mathbf A_{22}^{-1}\mathbf A_{12}'(\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}')^{-1} & [\mathbf I+\mathbf A_{22}^{-1}\mathbf A_{12}'(\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}')^{-1}\mathbf A_{12}]\,\mathbf A_{22}^{-1}
\end{bmatrix}, $$
if $\mathbf A_{22}^{-1}$ exists.

$$ \mathbf S_{11}:=\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12} \quad\text{and}\quad \mathbf S_{22}:=\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}' $$
are the minors determined by properly chosen rows and columns of the matrix $\mathbf A$, called "Schur complements", such that
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix}^{-1}
=\begin{bmatrix}(\mathbf I+\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf S_{11}^{-1}\mathbf A_{12}')\,\mathbf A_{11}^{-1} & -\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf S_{11}^{-1}\\[4pt]
-\mathbf S_{11}^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1} & \mathbf S_{11}^{-1}\end{bmatrix}, $$
if $\mathbf A_{11}^{-1}$ exists;
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix}^{-1}
=\begin{bmatrix}\mathbf S_{22}^{-1} & -\mathbf S_{22}^{-1}\mathbf A_{12}\mathbf A_{22}^{-1}\\[4pt]
-\mathbf A_{22}^{-1}\mathbf A_{12}'\mathbf S_{22}^{-1} & [\mathbf I+\mathbf A_{22}^{-1}\mathbf A_{12}'\mathbf S_{22}^{-1}\mathbf A_{12}]\,\mathbf A_{22}^{-1}\end{bmatrix}, $$
if $\mathbf A_{22}^{-1}$ exists,

are representations of the Cayley inverse of the partitioned matrix $\mathbf A$ in terms of
"Schur complements".
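The Schur-complement representation of the block inverse can be verified numerically. The sketch below is an addition (assuming NumPy) for a random symmetric positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
m, l = 3, 2
M = rng.standard_normal((m + l, m + l))
A = M @ M.T + (m + l) * np.eye(m + l)        # symmetric positive definite
A11, A12, A22 = A[:m, :m], A[:m, m:], A[m:, m:]

A11_inv = np.linalg.inv(A11)
S11 = A22 - A12.T @ A11_inv @ A12            # Schur complement of A11
S11_inv = np.linalg.inv(S11)

B11 = A11_inv + A11_inv @ A12 @ S11_inv @ A12.T @ A11_inv
B12 = -A11_inv @ A12 @ S11_inv
B22 = S11_inv
A_inv = np.block([[B11, B12], [B12.T, B22]])
assert np.allclose(A_inv, np.linalg.inv(A))  # matches the direct inverse
```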


The formulae $\mathbf S_{11}$ and $\mathbf S_{22}$ were first used by J. Schur (1917). The term "Schur complement" was introduced by E. Haynsworth (1968). A. Albert (1969) replaced the Cayley inverse $\mathbf A^{-1}$ by the Moore-Penrose inverse $\mathbf A^{+}$. For a survey we recommend R. W. Cottle (1974), D. V. Ouellette (1981) and D. Carlson (1986).

Proof:
For the proof of the "inverse partitioned matrix" $\mathbf A^{-1}$ (Cayley inverse) of the partitioned matrix $\mathbf A$ of full rank we apply Gauss elimination (without pivoting).

$$ \mathbf A\mathbf A^{-1}=\mathbf A^{-1}\mathbf A=\mathbf I $$
$$ \mathbf A=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix},\qquad \mathbf A_{11}=\mathbf A_{11}',\ \mathbf A_{22}=\mathbf A_{22}', $$
$$ \mathbf A_{11}\in\mathbb R^{m\times m},\ \mathbf A_{12}\in\mathbb R^{m\times l},\ \mathbf A_{12}'\in\mathbb R^{l\times m},\ \mathbf A_{22}\in\mathbb R^{l\times l}, $$
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf B_{11}&\mathbf B_{12}\\ \mathbf B_{12}'&\mathbf B_{22}\end{bmatrix},\qquad \mathbf B_{11}=\mathbf B_{11}',\ \mathbf B_{22}=\mathbf B_{22}', $$
$$ \mathbf B_{11}\in\mathbb R^{m\times m},\ \mathbf B_{12}\in\mathbb R^{m\times l},\ \mathbf B_{12}'\in\mathbb R^{l\times m},\ \mathbf B_{22}\in\mathbb R^{l\times l}. $$
$$ \mathbf A\mathbf A^{-1}=\mathbf A^{-1}\mathbf A=\mathbf I\ \Rightarrow $$
$$ \mathbf A_{11}\mathbf B_{11}+\mathbf A_{12}\mathbf B_{12}'=\mathbf B_{11}\mathbf A_{11}+\mathbf B_{12}\mathbf A_{12}'=\mathbf I_m \tag{1} $$
$$ \mathbf A_{11}\mathbf B_{12}+\mathbf A_{12}\mathbf B_{22}=\mathbf B_{11}\mathbf A_{12}+\mathbf B_{12}\mathbf A_{22}=\mathbf 0 \tag{2} $$
$$ \mathbf A_{12}'\mathbf B_{11}+\mathbf A_{22}\mathbf B_{12}'=\mathbf B_{12}'\mathbf A_{11}+\mathbf B_{22}\mathbf A_{12}'=\mathbf 0 \tag{3} $$
$$ \mathbf A_{12}'\mathbf B_{12}+\mathbf A_{22}\mathbf B_{22}=\mathbf B_{12}'\mathbf A_{12}+\mathbf B_{22}\mathbf A_{22}=\mathbf I_l \tag{4} $$
1
Case (i): $\mathbf A_{11}^{-1}$ exists

"forward step"
$$ \left.\begin{aligned}
&\mathbf A_{11}\mathbf B_{11}+\mathbf A_{12}\mathbf B_{12}'=\mathbf I_m\ \text{(first equation: multiply by }-\mathbf A_{12}'\mathbf A_{11}^{-1})\\
&\mathbf A_{12}'\mathbf B_{11}+\mathbf A_{22}\mathbf B_{12}'=\mathbf 0\ \text{(third equation)}
\end{aligned}\right\}\ \Rightarrow $$
$$ \left.\begin{aligned}
&-\mathbf A_{12}'\mathbf B_{11}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf B_{12}'=-\mathbf A_{12}'\mathbf A_{11}^{-1}\\
&\phantom{-}\mathbf A_{12}'\mathbf B_{11}+\mathbf A_{22}\mathbf B_{12}'=\mathbf 0
\end{aligned}\right\}\ \Rightarrow $$


$$ \left.\begin{aligned}
&\mathbf A_{11}\mathbf B_{11}+\mathbf A_{12}\mathbf B_{12}'=\mathbf I_m\\
&(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})\mathbf B_{12}'=-\mathbf A_{12}'\mathbf A_{11}^{-1}
\end{aligned}\right\}\ \Rightarrow $$
$$ \mathbf B_{12}'=-(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1} $$
$$ \mathbf B_{12}'=-\mathbf S_{11}^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1} $$
or
$$ \begin{bmatrix}\mathbf I_m&\mathbf 0\\ -\mathbf A_{12}'\mathbf A_{11}^{-1}&\mathbf I_l\end{bmatrix}
\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{12}'&\mathbf A_{22}\end{bmatrix}
=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf 0&\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12}\end{bmatrix}. $$
Note the "Schur complement" $\mathbf S_{11}:=\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12}$.

"backward step"
$$ \left.\begin{aligned}
&\mathbf A_{11}\mathbf B_{11}+\mathbf A_{12}\mathbf B_{12}'=\mathbf I_m\\
&\mathbf B_{12}'=-(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1}
\end{aligned}\right\}\ \Rightarrow $$
$$ \mathbf B_{11}=\mathbf A_{11}^{-1}(\mathbf I_m-\mathbf A_{12}\mathbf B_{12}')=(\mathbf I_m-\mathbf B_{12}\mathbf A_{12}')\mathbf A_{11}^{-1} $$
$$ \mathbf B_{11}=[\mathbf I_m+\mathbf A_{11}^{-1}\mathbf A_{12}(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1}\mathbf A_{12}']\,\mathbf A_{11}^{-1} $$
$$ \mathbf B_{11}=\mathbf A_{11}^{-1}+\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf S_{11}^{-1}\mathbf A_{12}'\mathbf A_{11}^{-1} $$
$$ \mathbf A_{11}\mathbf B_{12}+\mathbf A_{12}\mathbf B_{22}=\mathbf 0\ \text{(second equation)}\ \Rightarrow $$
$$ \mathbf B_{12}=-\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf B_{22}=-\mathbf A_{11}^{-1}\mathbf A_{12}(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1} $$
$$ \mathbf B_{22}=(\mathbf A_{22}-\mathbf A_{12}'\mathbf A_{11}^{-1}\mathbf A_{12})^{-1} $$
$$ \mathbf B_{22}=\mathbf S_{11}^{-1}. $$

Case (ii): $\mathbf A_{22}^{-1}$ exists

"forward step"
$$ \left.\begin{aligned}
&\mathbf A_{11}\mathbf B_{12}+\mathbf A_{12}\mathbf B_{22}=\mathbf 0\ \text{(second equation)}\\
&\mathbf A_{12}'\mathbf B_{12}+\mathbf A_{22}\mathbf B_{22}=\mathbf I_l\ \text{(fourth equation: multiply by }-\mathbf A_{12}\mathbf A_{22}^{-1})
\end{aligned}\right\}\ \Rightarrow $$
$$ \left.\begin{aligned}
&\phantom{-}\mathbf A_{11}\mathbf B_{12}+\mathbf A_{12}\mathbf B_{22}=\mathbf 0\\
&-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}'\mathbf B_{12}-\mathbf A_{12}\mathbf B_{22}=-\mathbf A_{12}\mathbf A_{22}^{-1}
\end{aligned}\right\}\ \Rightarrow $$
$$ (\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{12}')\mathbf B_{12}=-\mathbf A_{12}\mathbf A_{22}^{-1} $$


1
B 12  ( A 11  A 12 A 22  ) 1 A 12
A 12  A 221
1 1
B 12  S 22 A 12 A 22

or
I m 1
 A 12 A 22   A 11 A 12   A 11  A 12 A 22
1

A 12 0 
     .
0 Il   A 12 A 22   
A 12 A 22 
1
Note the “Schur complement” S 22 : A 11  A 12 A 22  .
A 12

“backward step”
 B12  A 22 B 22  I l
A 12 
 ) A 12 A
B12   ( A 11  A 12 A A 12 1 1 1  
22 22 
1
 B 22  A 22  B12
(I l  A 12  )  (I l  B12
 A 12 ) A 22
1

 ( A 11  A 12 A 221 A 12
B 22  [I l  A 221 A 12  ) 1 A 12 ]A 221
1 1
B 22  A 22  A 22 A 12  S 22 A 12 A 22
1 1

 B 11  A 22 B 12
A 12   0 ( third left equation ) 

   A 22
 B 12 1
 B11   A 22
A 12 1
 ( A 11  A 12 A 22
A 12 1
 ) 1
A 12


B 1 1  ( A 1 1  A 1 2 A 2 21 A 1 2 )  1
B 1 1  S 2 21 .


The representations $\{\mathbf B_{11},\mathbf B_{12},\mathbf B_{21}=\mathbf B_{12}',\mathbf B_{22}\}$ in terms of $\{\mathbf A_{11},\mathbf A_{12},\mathbf A_{21}=\mathbf A_{12}',\mathbf A_{22}\}$ have been derived by T. Banachiewicz (1937). For generalizations we refer to T. Ando (1979), R. A. Brualdi and H. Schneider (1963), F. Burns, D. Carlson, E. Haynsworth and T. Markham (1974), D. Carlson (1980), C. D. Meyer (1973), S. K. Mitra (1982), and C. K. Li and R. Mathias (2000).
We leave the proof of the following fact as an exercise.

Fact (Inverse Partitioned Matrix /IPM/ of a quadratic matrix):

Let the quadratic matrix $\mathbf A$ be partitioned as
$$ \mathbf A:=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{21}&\mathbf A_{22}\end{bmatrix}. $$
Then its Cayley inverse $\mathbf A^{-1}$ can be partitioned as well:
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{21}&\mathbf A_{22}\end{bmatrix}^{-1}
=\begin{bmatrix}\mathbf A_{11}^{-1}+\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf S_{11}^{-1}\mathbf A_{21}\mathbf A_{11}^{-1}&-\mathbf A_{11}^{-1}\mathbf A_{12}\mathbf S_{11}^{-1}\\[4pt]
-\mathbf S_{11}^{-1}\mathbf A_{21}\mathbf A_{11}^{-1}&\mathbf S_{11}^{-1}\end{bmatrix}, $$
if $\mathbf A_{11}^{-1}$ exists;
$$ \mathbf A^{-1}=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{21}&\mathbf A_{22}\end{bmatrix}^{-1}
=\begin{bmatrix}\mathbf S_{22}^{-1}&-\mathbf S_{22}^{-1}\mathbf A_{12}\mathbf A_{22}^{-1}\\[4pt]
-\mathbf A_{22}^{-1}\mathbf A_{21}\mathbf S_{22}^{-1}&\mathbf A_{22}^{-1}+\mathbf A_{22}^{-1}\mathbf A_{21}\mathbf S_{22}^{-1}\mathbf A_{12}\mathbf A_{22}^{-1}\end{bmatrix}, $$
if $\mathbf A_{22}^{-1}$ exists,
and the "Schur complements" are defined by
$$ \mathbf S_{11}:=\mathbf A_{22}-\mathbf A_{21}\mathbf A_{11}^{-1}\mathbf A_{12} \quad\text{and}\quad \mathbf S_{22}:=\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{21}. $$

Facts (Cayley inverse: sum of two matrices):

(s1) $(\mathbf A+\mathbf B)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}(\mathbf A^{-1}+\mathbf B^{-1})^{-1}\mathbf A^{-1}$
(s2) $(\mathbf A-\mathbf B)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}(\mathbf A^{-1}-\mathbf B^{-1})^{-1}\mathbf A^{-1}$
(s3) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}(\mathbf I+\mathbf C\mathbf B\mathbf D\mathbf A^{-1})^{-1}\mathbf C\mathbf B\mathbf D\mathbf A^{-1}$
(s4) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C(\mathbf I+\mathbf B\mathbf D\mathbf A^{-1}\mathbf C)^{-1}\mathbf B\mathbf D\mathbf A^{-1}$
(s5) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C\mathbf B(\mathbf I+\mathbf D\mathbf A^{-1}\mathbf C\mathbf B)^{-1}\mathbf D\mathbf A^{-1}$
(s6) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C\mathbf B\mathbf D(\mathbf I+\mathbf A^{-1}\mathbf C\mathbf B\mathbf D)^{-1}\mathbf A^{-1}$
(s7) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C\mathbf B\mathbf D\mathbf A^{-1}(\mathbf I+\mathbf C\mathbf B\mathbf D\mathbf A^{-1})^{-1}$
(s8) $(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C(\mathbf B^{-1}+\mathbf D\mathbf A^{-1}\mathbf C)^{-1}\mathbf D\mathbf A^{-1}$
(Sherman-Morrison-Woodbury matrix identity)
(s9) $\mathbf B(\mathbf A\mathbf B+\mathbf C)^{-1}=(\mathbf I+\mathbf B\mathbf C^{-1}\mathbf A)^{-1}\mathbf B\mathbf C^{-1}$
(s10) $\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=(\mathbf B^{-1}+\mathbf D\mathbf A^{-1}\mathbf C)^{-1}\mathbf D\mathbf A^{-1}$
(Duncan-Guttman matrix identity).
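Identity (s8), the Sherman-Morrison-Woodbury matrix identity, can be verified numerically; a sketch with random, well-conditioned matrices (an addition, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # kept well conditioned
B = rng.standard_normal((k, k)) + k * np.eye(k)
C = rng.standard_normal((n, k))
D = rng.standard_normal((k, n))

Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(A + C @ B @ D)
rhs = Ainv - Ainv @ C @ np.linalg.inv(Binv + D @ Ainv @ C) @ D @ Ainv
assert np.allclose(lhs, rhs)                       # (s8) holds
```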
W. J. Duncan (1944) calls (s8) the Sherman-Morrison-Woodbury matrix identity.
If the matrix A is singular consult H. V. Henderson and G. S. Searle (1981), D.
V. Ouellette (1981), W. M. Hager (1989), G. W. Stewart (1977) and K. S. Riedel


(1992). (s10) has been noted by W. J. Duncan (1944) and L. Guttman (1946):
The result is directly derived from the identity
$$ (\mathbf A+\mathbf C\mathbf B\mathbf D)(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf I\ \Rightarrow $$
$$ \mathbf A(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}+\mathbf C\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf I $$
$$ (\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}=\mathbf A^{-1}-\mathbf A^{-1}\mathbf C\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1} $$
$$ \mathbf A^{-1}=(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}+\mathbf A^{-1}\mathbf C\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1} $$
$$ \mathbf D\mathbf A^{-1}=\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}+\mathbf D\mathbf A^{-1}\mathbf C\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1} $$
$$ \mathbf D\mathbf A^{-1}=(\mathbf I+\mathbf D\mathbf A^{-1}\mathbf C\mathbf B)\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1} $$
$$ \mathbf D\mathbf A^{-1}=(\mathbf B^{-1}+\mathbf D\mathbf A^{-1}\mathbf C)\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1} $$
$$ (\mathbf B^{-1}+\mathbf D\mathbf A^{-1}\mathbf C)^{-1}\mathbf D\mathbf A^{-1}=\mathbf B\mathbf D(\mathbf A+\mathbf C\mathbf B\mathbf D)^{-1}. $$


Certain results follow directly from their definitions.

Facts (inverses):
(i) $(\mathbf A\cdot\mathbf B)^{-1}=\mathbf B^{-1}\cdot\mathbf A^{-1}$
(ii) $(\mathbf A\otimes\mathbf B)^{-1}=\mathbf A^{-1}\otimes\mathbf B^{-1}$
(iii) $\mathbf A$ positive definite $\Leftrightarrow$ $\mathbf A^{-1}$ positive definite
(iv) if $\mathbf A$ and $\mathbf B$ are positive definite, then $(\mathbf A\otimes\mathbf B)^{-1}$, $(\mathbf A*\mathbf B)^{-1}$ and $\mathbf A^{-1}*\mathbf B^{-1}$ are positive definite, $(\mathbf A^{-1}*\mathbf B^{-1})-(\mathbf A*\mathbf B)^{-1}$ is positive semidefinite, as well as $(\mathbf A^{-1}*\mathbf A)-\mathbf I$ and $\mathbf I-(\mathbf A^{-1}*\mathbf A)^{-1}$.

Facts (rank factorization):

(i) If the n×n matrix $\mathbf A$ is symmetric and positive semidefinite, then its rank factorization is
$$ \mathbf A=\begin{bmatrix}\mathbf G_1\\ \mathbf G_2\end{bmatrix}[\mathbf G_1'\ \ \mathbf G_2'], $$
where $\mathbf G_1$ is a lower triangular matrix of the order $O(\mathbf G_1)=\mathrm{rk}\,\mathbf A\times\mathrm{rk}\,\mathbf A$ with
$$ \mathrm{rk}\,\mathbf G_1=\mathrm{rk}\,\mathbf A, $$
whereas $\mathbf G_2$ has the format $O(\mathbf G_2)=(n-\mathrm{rk}\,\mathbf A)\times\mathrm{rk}\,\mathbf A$. In this case we speak of a Choleski decomposition.

(ii) In case the matrix $\mathbf A$ is positive definite, the matrix block $\mathbf G_2$ is not needed anymore: $\mathbf G_1$ is uniquely determined. There holds
$$ \mathbf A^{-1}=(\mathbf G_1^{-1})'\mathbf G_1^{-1}. $$
Besides the rank of a quadratic matrix $\mathbf A$ of the order $O(\mathbf A)=n\times n$ as a first scalar measure of a matrix, its determinant
$$ |\mathbf A|=\sum_{\mathrm{perm}(j_1,\dots,j_n)}(-1)^{\sigma(j_1,\dots,j_n)}\prod_{i=1}^{n}a_{ij_i} $$
plays a similar role as a second scalar measure. Here the summation extends over all permutations $(j_1,\dots,j_n)$ of the set of integer numbers $(1,\dots,n)$; $\sigma(j_1,\dots,j_n)$ is the number of transpositions which transform $(1,\dots,n)$ into $(j_1,\dots,j_n)$.

Laws (determinant)
(i) $|\alpha\mathbf A|=\alpha^{n}\,|\mathbf A|$ for an arbitrary scalar $\alpha\in\mathbb R$
(ii) $|\mathbf A\cdot\mathbf B|=|\mathbf A|\cdot|\mathbf B|$
(iii) $|\mathbf A\otimes\mathbf B|=|\mathbf A|^{m}\cdot|\mathbf B|^{n}$ for an arbitrary m×m matrix $\mathbf B$
(iv) $|\mathbf A'|=|\mathbf A|$
(v) $|\tfrac12(\mathbf A+\mathbf A')|\le|\mathbf A|$ if $\mathbf A+\mathbf A'$ is positive definite
(vi) $|\mathbf A^{-1}|=|\mathbf A|^{-1}$ if $\mathbf A^{-1}$ exists
(vii) $|\mathbf A|=0 \Leftrightarrow \mathbf A$ is singular ($\mathbf A^{-1}$ does not exist)
(viii) $|\mathbf A|=0$ if $\mathbf A$ is idempotent, $\mathbf A\neq\mathbf I$
(ix) $|\mathbf A|=\prod_{i=1}^{n}a_{ii}$ if $\mathbf A$ is a diagonal or a triangular matrix
(x) $0<|\mathbf A|\le\prod_{i=1}^{n}a_{ii}=|\mathbf A*\mathbf I|$ if $\mathbf A$ is positive definite
(xi) $|\mathbf A|\cdot|\mathbf B|\le|\mathbf A|\cdot\prod_{i=1}^{n}b_{ii}\le|\mathbf A*\mathbf B|$ if $\mathbf A$ and $\mathbf B$ are positive definite
(xii)
$$ \mathbf A=\begin{bmatrix}\mathbf A_{11}&\mathbf A_{12}\\ \mathbf A_{21}&\mathbf A_{22}\end{bmatrix}\ \Rightarrow\ |\mathbf A|=\begin{cases}\det\mathbf A_{11}\cdot\det(\mathbf A_{22}-\mathbf A_{21}\mathbf A_{11}^{-1}\mathbf A_{12})&\text{if }\mathbf A_{11}\in\mathbb R^{m_1\times m_1},\ \mathrm{rk}\,\mathbf A_{11}=m_1\\[4pt]
\det\mathbf A_{22}\cdot\det(\mathbf A_{11}-\mathbf A_{12}\mathbf A_{22}^{-1}\mathbf A_{21})&\text{if }\mathbf A_{22}\in\mathbb R^{m_2\times m_2},\ \mathrm{rk}\,\mathbf A_{22}=m_2.\end{cases} $$
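Law (iii), $|\mathbf A\otimes\mathbf B|=|\mathbf A|^m\,|\mathbf B|^n$, can be checked numerically for an n×n matrix $\mathbf A$ and an m×m matrix $\mathbf B$; a minimal sketch (an addition, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

lhs = np.linalg.det(np.kron(A, B))
rhs = np.linalg.det(A) ** m * np.linalg.det(B) ** n
assert np.isclose(lhs, rhs)   # law (iii)
```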


A submatrix of a rectangular matrix $\mathbf A$ is the result of canceling certain rows and columns of the matrix $\mathbf A$. A minor is the determinant of a quadratic submatrix of the matrix $\mathbf A$. If the matrix $\mathbf A$ is a quadratic matrix, to any element $a_{ij}$ there exists a minor, namely the determinant of the submatrix of the matrix $\mathbf A$ which results from canceling the i-th row and the j-th column. By multiplying this minor with $(-1)^{i+j}$ we gain the cofactor $c_{ij}$, an element of the matrix $\mathbf C=[c_{ij}]$. The transpose matrix $\mathbf C'$ is called the adjoint matrix of the matrix $\mathbf A$, written $\mathrm{adj}\,\mathbf A$. Its order is the same as that of the matrix $\mathbf A$.

Laws (adjoint matrix)
(i) $|\mathbf A|=\sum_{j=1}^{n}a_{ij}c_{ij}$, $i=1,\dots,n$
(ii) $|\mathbf A|=\sum_{j=1}^{n}a_{jk}c_{jk}$, $k=1,\dots,n$
(iii) $\mathbf A\cdot(\mathrm{adj}\,\mathbf A)=(\mathrm{adj}\,\mathbf A)\cdot\mathbf A=|\mathbf A|\,\mathbf I$
(iv) $\mathrm{adj}(\mathbf A\cdot\mathbf B)=(\mathrm{adj}\,\mathbf B)\cdot(\mathrm{adj}\,\mathbf A)$
(v) $\mathrm{adj}(\mathbf A\otimes\mathbf B)=(\mathrm{adj}\,\mathbf A)\otimes(\mathrm{adj}\,\mathbf B)$
(vi) $\mathrm{adj}\,\mathbf A=|\mathbf A|\,\mathbf A^{-1}$ if $\mathbf A$ is nonsingular
(vii) $\mathbf A$ positive definite $\Rightarrow$ $\mathrm{adj}\,\mathbf A$ positive definite.
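The adjoint matrix can be computed directly from cofactors as defined above. The sketch below is an addition (assuming NumPy; `adjoint` is an illustrative helper, not a library routine) and verifies law (iii):

```python
import numpy as np

def adjoint(A: np.ndarray) -> np.ndarray:
    """Adjoint (adjugate) matrix: transpose of the cofactor matrix C."""
    n = A.shape[0]
    C = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T                                  # adj A = C'

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
adjA = adjoint(A)
assert np.allclose(A @ adjA, np.linalg.det(A) * np.eye(3))   # law (iii)
```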

As a third scalar measure of a quadratic matrix A of the order O( A)  n  n we


introduce the trace tr A as the sum of diagonal elements,
n
tr A   aii .
i 1

Laws (trace of a matrix)

(i)	tr(αA) = α · tr A for an arbitrary scalar α ∈ ℝ
(ii)	tr(A + B) = tr A + tr B for an arbitrary n×n matrix B
(iii)	tr(A ⊗ B) = (tr A) · (tr B) for an arbitrary m×m matrix B
(iv)	tr A = tr(B · C) for any factorization A = B · C
(v)	tr[A′ · (B · C)] = tr[(A′ · B) · C] for arbitrary matrices B and C of suitable order
(vi)	tr A′ = tr A
(vii)	tr A = rk A if A is idempotent
(viii)	0 < tr A < tr(A + I) if A is positive definite
(ix)	tr(A · B) ≤ (tr A) · (tr B) if A and B are positive semidefinite.
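Two of the trace laws lend themselves to a quick numerical check: the Kronecker law (iii) and the idempotency law (vii), the latter illustrated with an orthogonal projector. The random matrices are hypothetical test data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

# (iii) tr(A ⊗ B) = (tr A)(tr B)
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# (vii) tr A = rk A for an idempotent matrix, e.g. an orthogonal projector
X = rng.standard_normal((5, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projector onto the column space of X
assert np.allclose(P @ P, P)           # idempotent
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))
```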

In correspondence to the W-weighted vector (semi)norm
‖x‖_W = (x′·W·x)^(1/2),
the W-weighted matrix (semi)norm is
‖A‖_W = (tr A′·W·A)^(1/2)
for a given positive (semi)definite matrix W of proper order.
Laws (W-weighted matrix norm):
(i)	tr A′·W·A ≥ 0
(ii)	tr A′·W·A = 0 ⇔ W·A = 0 ⇔ A = 0 if W is positive definite.

A4 Vector-valued Matrix Forms

If A is a rectangular matrix of the order O(A) = n×m and aⱼ its j-th column, then vec A is the nm×1 vector
vec A := [a₁′, a₂′, …, a′ₘ₋₁, aₘ′]′.
In consequence, the operator "vec" transforms a matrix into a vector in such a way that the columns are stacked one after the other.

Definitions (vec, vech, veck):

(i)	vec A := [a₁′, a₂′, …, aₘ′]′, the nm×1 vector of stacked columns of A.
(ii)	Let A be a quadratic, symmetric matrix, A = A′, of order O(A) = n×n. Then vech A ("vec-half") is the [n(n+1)/2]×1 vector which results from stacking columnwise those matrix elements which lie on and below the diagonal:
	A = [aᵢⱼ] = [aⱼᵢ] = A′ ⇒ vech A := [a₁₁, …, aₙ₁, a₂₂, …, aₙ₂, …, aₙₙ]′.
(iii)	Let A be a quadratic, antisymmetric matrix, A = −A′, of order O(A) = n×n. Then veck A ("vec-skew") is the [n(n−1)/2]×1 vector which is generated by stacking columnwise those matrix elements which lie strictly below the diagonal:
	A = [aᵢⱼ] = [−aⱼᵢ] = −A′ ⇒ veck A := [a₂₁, …, aₙ₁, a₃₂, …, aₙ₂, …, aₙ,ₙ₋₁]′.

Examples

(i)	A = [a, b, c; d, e, f] ⇒ vec A = [a, d, b, e, c, f]′
(ii)	A = [a, b, c; b, d, e; c, e, f] = A′ ⇒ vech A = [a, b, c, d, e, f]′
(iii)	A = [0, −a, −b, −c; a, 0, −d, −e; b, d, 0, −f; c, e, f, 0] = −A′ ⇒ veck A = [a, b, c, d, e, f]′.
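The three stacking operators are short one-liners in NumPy; the implementations and the test matrices below are a sketch following the definitions above (column-major stacking via order='F'):

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one long column vector."""
    return A.reshape(-1, order='F')

def vech(A):
    """Columnwise stack of the elements on and below the diagonal (A symmetric)."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def veck(A):
    """Columnwise stack of the elements strictly below the diagonal (A antisymmetric)."""
    n = A.shape[0]
    return np.concatenate([A[j + 1:, j] for j in range(n - 1)])

A = np.array([[1, 3, 5], [2, 4, 6]])       # Example (i): columns [1,2], [3,4], [5,6]
assert np.array_equal(vec(A), np.array([1, 2, 3, 4, 5, 6]))

S = np.array([[1, 2, 3], [2, 4, 5], [3, 5, 6]])    # symmetric
assert np.array_equal(vech(S), np.array([1, 2, 3, 4, 5, 6]))

K = np.array([[0, -1, -2], [1, 0, -3], [2, 3, 0]]) # antisymmetric
assert np.array_equal(veck(K), np.array([1, 2, 3]))
```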
Useful identities relating scalar- and vector-valued measures of matrices will be reported finally.

Facts (vec and trace forms):

(i)	vec(A · B · C) = (C′ ⊗ A) · vec B
(ii)	vec(A · B) = (B′ ⊗ Iₙ) · vec A = (B′ ⊗ A) · vec Iₘ = (I_q ⊗ A) · vec B, A ∈ ℝ^(n×m), B ∈ ℝ^(m×q)


(iii)	A · B · c = (c′ ⊗ A) · vec B = (A ⊗ c′) · vec B′, c ∈ ℝ^q
(iv)	tr(A · B) = (vec A′)′ · vec B = (vec B′)′ · vec A = tr(B · A)
(v)	tr(A · B · C · D) = (vec D′)′ · (C′ ⊗ A) · vec B = (vec D)′ · (A ⊗ C′) · vec B′
(vi)	K_nm · vec A = vec A′, A ∈ ℝ^(n×m)
(vii)	K_qn · (A ⊗ B) = (B ⊗ A) · K_pm
(viii)	K_qn · (A ⊗ B) · K_mp = B ⊗ A
(ix)	K_qn · (A ⊗ c) = c ⊗ A
(x)	K_nq · (c ⊗ A) = A ⊗ c, A ∈ ℝ^(n×m), B ∈ ℝ^(q×p), c ∈ ℝ^q
(xi)	vec(A ⊗ B) = (Iₘ ⊗ K_pn ⊗ I_q) · (vec A ⊗ vec B)
(xii)	A = (a₁, …, aₘ), B := Diag b, O(B) = m×m, C = [c₁, …, cₘ] ⇒
	vec(A · B · C′) = vec[∑ⱼ₌₁ᵐ aⱼ·bⱼ·cⱼ′] = ∑ⱼ₌₁ᵐ (cⱼ ⊗ aⱼ)·bⱼ = [c₁ ⊗ a₁, …, cₘ ⊗ aₘ]·b = (C ⊙ A)·b,
	where ⊙ denotes the Khatri–Rao product of columnwise Kronecker products
(xiii)	A = [aᵢⱼ], C = [cᵢⱼ], B := Diag b, b = [b₁, …, bₘ]′ ∈ ℝᵐ ⇒
	tr(A′ · B · C · B) = (vec B)′ · vec(C · B · A′) = b′ · (A ∘ C) · b,
	where ∘ denotes the Hadamard product
(xiv)	B := Iₘ ⇒ tr(A′ · C) = rₘ′ · (A ∘ C) · rₘ
	(rₘ is the m×1 summation vector, rₘ := [1, …, 1]′ ∈ ℝᵐ)
(xv)	diag D := (Iₘ ∘ D) · rₘ = [Iₘ ∘ (A′ · B · C)] · rₘ when D = A′ · B · C is factorized, i.e. the Hadamard product with Iₘ followed by the summation vector extracts the diagonal of D.
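The central vec identity (i) and the commutation-matrix fact (vi) can be verified directly; the commutation matrix is built from its defining property K_nm vec A = vec A′, and the random matrices are hypothetical test data:

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order='F')

def commutation(n, m):
    """Commutation matrix K_nm with K_nm vec A = vec A' for an n x m matrix A."""
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1.0   # position of A[i,j] in vec A' and vec A
    return K

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# (i) vec(A B C) = (C' ⊗ A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# (vi) K_nm vec A = vec A'
K = commutation(2, 3)
assert np.allclose(K @ vec(A), vec(A.T))

# (iv) tr(A D) = (vec A')' vec D for a 3x2 matrix D
D = rng.standard_normal((3, 2))
assert np.isclose(np.trace(A @ D), vec(A.T) @ vec(D))
```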

Facts (Löwner partial ordering):

For any quadratic matrix A ∈ ℝ^(m×m) there holds the inequality
Iₘ ⊗ (A′·A) ≥ A′ ⊗ A = (A′ ⊗ Iₘ) · (Iₘ ⊗ A)
in the Löwner partial ordering, that is, the difference matrix
Iₘ ⊗ (A′·A) − A′ ⊗ A
is at least positive semidefinite.


A5 Eigenvalues and Eigenvectors

To any quadratic matrix A of the order O(A) = m×m there exists an eigenvalue λ, a scalar which makes the matrix A − λIₘ singular. As an equivalent statement, we say that the characteristic equation |λIₘ − A| = 0 has a root λ, possibly of multiplicity s, where s is the dimension of the related null space N(A − λI). A non-vanishing element x of this null space, for which Ax = λx, x ≠ 0 holds, is called a right eigenvector of A. Related vectors y for which y′A = λy′, y ≠ 0 holds, are called left eigenvectors of A; they are right eigenvectors of A′. Eigenvectors always belong to a certain eigenvalue and are usually normed in the sense of x′x = 1, y′y = 1 as long as they have real components. At the same time, eigenvectors which belong to different eigenvalues are always linearly independent: they obviously span a subspace of ℝᵐ.
In general, the eigenvalues of a matrix A are complex! There are important exceptions: the orthonormal matrices, also called rotation matrices, whose eigenvalues have modulus one (real eigenvalues +1 or −1), and the idempotent matrices, whose eigenvalues can only be 0 or 1. A matrix with a zero eigenvalue is called singular.
There is the special case of a symmetric matrix A = A′ of order O(A) = m×m. It can be shown that all roots of the characteristic polynomial are real numbers and accordingly m — not necessarily different — real eigenvalues exist. In addition, eigenvectors x and y corresponding to different eigenvalues λ and μ are orthogonal, that is
(λ − μ)·x′y = (x′A′)·y − x′·(A·y) = 0, λ − μ ≠ 0 ⇒ x′y = 0.
In case the eigenvalue λ appears with multiplicity s, the eigenspace N(A − λIₘ) is s-dimensional: we can choose s orthonormal eigenvectors which are orthonormal to all others. In total, we can organize m orthonormal eigenvectors which span the entire ℝᵐ. If we restrict ourselves to eigenvectors with eigenvalues λ ≠ 0, we receive the column space R(A). The rank of A coincides with the number of non-vanishing eigenvalues {λ₁, …, λᵣ}:
U := [U₁, U₂], O(U) = m×m, U′·U = U·U′ = Iₘ
U₁ := [u₁, …, uᵣ], O(U₁) = m×r, r = rk A
U₂ := [u_(r+1), …, uₘ], O(U₂) = m×(m−r), A·U₂ = 0.

With the definition of the r×r diagonal matrix Λ := Diag(λ₁, …, λᵣ) of non-vanishing eigenvalues we gain
A·U = A·[U₁, U₂] = [U₁·Λ, 0] = [U₁, U₂] · [Λ, 0; 0, 0].


Due to the orthonormality of the matrix U := [U₁, U₂] we achieve the results about eigenvalue–eigenvector analysis and eigenvalue–eigenvector synthesis.

Lemma (eigenvalue–eigenvector analysis: decomposition):
Let A = A′ be a symmetric matrix of the order O(A) = m×m. Then there exists an orthonormal matrix U in such a way that
U′·A·U = Diag(λ₁, …, λᵣ, 0, …, 0)
holds. (λ₁, …, λᵣ) denotes the set of non-vanishing eigenvalues of A with r = rk A, ordered decreasingly.

Lemma (eigenvalue–eigenvector synthesis: composition):
Let A = A′ be a symmetric matrix of the order O(A) = m×m. Then there exists a synthetic representation in terms of eigenvalues and eigenvectors of type
A = U · Diag(λ₁, …, λᵣ, 0, …, 0) · U′ = U₁·Λ·U₁′.

In the class of symmetric matrices the positive (semi)definite matrices play a special role. Their eigenvalues are positive (nonnegative), so the square roots of the non-vanishing eigenvalues exist:
Λ^(1/2) := Diag(√λ₁, …, √λᵣ).
The matrix A is positive semidefinite if and only if there exists a quadratic m×m matrix G such that A = G·G′ holds, for instance G := [U₁·Λ^(1/2), 0]. The quadratic matrix A is positive definite if and only if the m×m matrix G is nonsingular. Such a representation leads to the rank factorization A = G₁·G₁′ with G₁ := U₁·Λ^(1/2). In general, we have

Lemma (representation of the matrix Ū₁):
If A is a positive semidefinite matrix of the order O(A) = m×m with non-vanishing eigenvalues {λ₁, …, λᵣ}, then there exists an m×r matrix
Ū₁ := G₁·Λ⁻¹ = U₁·Λ^(−1/2)
with
Ū₁′·G₁ = Iᵣ, R(Ū₁) = R(U₁) = R(A),
such that
Ū₁′·A·Ū₁ = (Λ^(−1/2)·U₁′)·(U₁·Λ·U₁′)·(U₁·Λ^(−1/2)) = Iᵣ.

The synthetic relation of the matrix A is
A = G₁·G₁′ = U₁·Λ·U₁′.
The pseudoinverse has a peculiar representation if we introduce the matrices U₁, Ū₁ and Λ⁻¹.

Definition (pseudoinverse):
If we use the representation of the matrix A of type A = G₁·G₁′ = U₁·Λ·U₁′, then
A⁺ := Ū₁·Ū₁′ = U₁·Λ⁻¹·U₁′
is the representation of its pseudoinverse, namely

(i)	A·A⁺·A = (U₁ΛU₁′)·(U₁Λ⁻¹U₁′)·(U₁ΛU₁′) = U₁·Λ·U₁′ = A
(ii)	A⁺·A·A⁺ = (U₁Λ⁻¹U₁′)·(U₁ΛU₁′)·(U₁Λ⁻¹U₁′) = U₁·Λ⁻¹·U₁′ = A⁺
(iii)	A·A⁺ = (U₁ΛU₁′)·(U₁Λ⁻¹U₁′) = U₁·U₁′ = (A·A⁺)′
(iv)	A⁺·A = (U₁Λ⁻¹U₁′)·(U₁ΛU₁′) = U₁·U₁′ = (A⁺·A)′.

The pseudoinverse A⁺ exists and is unique, even if A is singular. For a nonsingular matrix A, the matrix A⁺ is identical with A⁻¹. Indeed, in the case of the pseudoinverse (or any other generalized inverse) the generalized inverse of a rectangular matrix exists. The singular value decomposition is an excellent tool which generalizes the classical eigenvalue–eigenvector decomposition of symmetric matrices.
Lemma (singular value decomposition):
(i) Let A be an n×m matrix of rank r := rk A ≤ min(n, m). Then the matrices A′A and AA′ are symmetric positive (semi)definite matrices whose non-vanishing eigenvalues {λ₁, …, λᵣ} are positive. Especially
r = rk(A′A) = rk(AA′)
holds. A′A contains 0 as a multiple eigenvalue of degree m − r, and AA′ has the multiple eigenvalue 0 of degree n − r.
(ii) With the support of orthonormal eigenvectors of A′A and AA′ we are able to introduce an m×m matrix V and an n×n matrix U such that U·U′ = U′·U = Iₙ, V·V′ = V′·V = Iₘ holds and
U′·AA′·U = Diag(σ₁², …, σᵣ², 0, …, 0),
V′·A′A·V = Diag(σ₁², …, σᵣ², 0, …, 0).

The diagonal matrices on the right side have the different formats n×n and m×m.
(iii) The original n×m matrix A can be decomposed according to
U′·A·V = [Σ, 0; 0, 0], O(U′AV) = n×m,
with the r×r diagonal matrix
Σ := Diag(σ₁, …, σᵣ)
of singular values, representing the positive roots of the non-vanishing eigenvalues of A′A and AA′.
(iv) A synthetic form of the n×m matrix A is
A = U · [Σ, 0; 0, 0] · V′.
We note here that all transformed matrices of type T⁻¹·A·T of a quadratic matrix have the same eigenvalues as A = (A·T)·T⁻¹, which is often used as an invariance property.
What is the relation between the eigenvalues and the trace, the determinant, the rank? The answer will be given now.
Lemma (relation between eigenvalues and other scalar measures):
Let A be a quadratic matrix of the order O(A) = m×m with eigenvalues λ₁ ≥ … ≥ λₘ in decreasing order. Then we have
|A| = ∏ⱼ₌₁ᵐ λⱼ, tr A = ∑ⱼ₌₁ᵐ λⱼ, and rk A = tr A
if A is idempotent. If A = A′ is a symmetric matrix with real eigenvalues, then we gain
λ₁ ≥ max{aⱼⱼ | j = 1, …, m},
λₘ ≤ min{aⱼⱼ | j = 1, …, m}.

At the end we compute the eigenvalues and eigenvectors which relate to the variational problem x′Ax = extr subject to the condition x′x = 1, namely
x′·A·x − λ·(x′x − 1) = extr over x and λ.

The eigenvalue λ is the Lagrange multiplier of the optimization problem.


A6 Generalized Inverses
Because Cayley inversion is only possible for quadratic nonsingular matrices, we introduce a slightly more general definition in order to invert arbitrary matrices A of the order O(A) = n×m by so-called generalized inverses or, for short, g-inverses.
An m×n matrix G is called a g-inverse of the matrix A if it fulfils the equation
A·G·A = A
in the sense of Cayley multiplication. Such g-inverses always exist; they are unique if and only if A is a nonsingular quadratic matrix. In this case
G = A⁻¹ if A is invertible;
in the other cases we use the notation
G = A⁻ if A⁻¹ does not exist.
For the rank of all g-inverses the inequality
r := rk A ≤ rk A⁻ ≤ min{n, m}
holds. In reverse, for any number d in this interval there exists a g-inverse A⁻ such that
d = rk A⁻ = dim R(A⁻)
holds. Especially, even for a singular quadratic matrix A of the order O(A) = n×n there exist g-inverses A⁻ of full rank rk A⁻ = n. In particular, those g-inverses Aᵣ⁻ are of interest which have the same rank as the matrix A, namely
rk Aᵣ⁻ = r = rk A.

Those reflexive g-inverses Aᵣ⁻ are equivalently characterized by the additional condition
Aᵣ⁻·A·Aᵣ⁻ = Aᵣ⁻
but are not necessarily symmetric for symmetric matrices A. In general,
A = A′ and A⁻ g-inverse of A
⇒ (A⁻)′ g-inverse of A
⇒ Aᵣₛ⁻ := A⁻·A·(A⁻)′ is a reflexive symmetric g-inverse of A.

For the construction of Aᵣₛ⁻ we only need an arbitrary g-inverse of A. On the other side, Aᵣₛ⁻ is not unique. There exist certain matrix functions which are independent of the choice of the g-inverse. For instance, the matrices
A·(A′A)⁻·A′ and A′·(AA′)⁻·A
are invariant, and g-inverses of A′A or AA′ can be used to generate special g-inverses of A. For instance,
Aₗ⁻ := (A′A)⁻·A′ and Aₘ⁻ := A′·(AA′)⁻
have the special reproducing properties
A·(A′A)⁻·A′·A = A·Aₗ⁻·A = A
and
A·A′·(AA′)⁻·A = A·Aₘ⁻·A = A,
which can be generalized, in case that W and S are positive semidefinite matrices, to
W·A·(A′WA)⁻·A′·W·A = W·A,
A·S·A′·(ASA′)⁻·A·S = A·S,
where the matrices
W·A·(A′WA)⁻·A′·W and S·A′·(ASA′)⁻·A·S
are independent of the choice of the g-inverses (A′WA)⁻ and (ASA′)⁻.
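As a sketch, the reproducing properties of these special g-inverses can be checked numerically. The rank-deficient test matrix is a hypothetical example, and the pseudoinverse of A′A merely serves as one admissible choice of the g-inverse (A′A)⁻:

```python
import numpy as np

# Hypothetical rank-deficient 4x3 matrix (rank 2).
B = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 1.]])
A = B @ np.array([[1., 0., 1.], [0., 1., 1.]])

# A_l^- := (A'A)^- A', here with pinv(A'A) as one particular g-inverse of A'A;
# reproducing property A A_l^- A = A.
A_l = np.linalg.pinv(A.T @ A) @ A.T
assert np.allclose(A @ A_l @ A, A)

# Likewise A_m^- := A'(AA')^- is a g-inverse of A.
A_m = A.T @ np.linalg.pinv(A @ A.T)
assert np.allclose(A @ A_m @ A, A)
```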

A beautiful interpretation of the various g-inverses is based on the fact that the matrices
(A·A⁻)·(A·A⁻) = (A·A⁻·A)·A⁻ = A·A⁻ and (A⁻·A)·(A⁻·A) = A⁻·(A·A⁻·A) = A⁻·A
are idempotent and can therefore be geometrically interpreted as projections. The image of A·A⁻, namely
R(A·A⁻) = R(A) = {Ax | x ∈ ℝᵐ} ⊂ ℝⁿ,
is complemented by the null space of the projection A⁻·A,
N(A⁻·A) = N(A) = {x | Ax = 0} ⊂ ℝᵐ.
By the choice of the g-inverse we are able to choose the projection direction of A·A⁻ and the image of the projection A⁻·A if we take advantage of the complementary spaces of the subspaces
R(A⁻·A) ⊕ N(A⁻·A) = ℝᵐ and R(A·A⁻) ⊕ N(A·A⁻) = ℝⁿ,
by using the symbol "⊕" as the sign of the "direct sum" of linear spaces which only have the zero element in common. Finally we note the corresponding dimensions
dim R(A⁻·A) = r = rk A = dim R(A·A⁻),
dim N(A⁻·A) = m − rk A = m − r,
dim N(A·A⁻) = n − rk A = n − r,


independent of the special rank of the g-inverses A⁻, which are determined by the subspaces N(A⁻·A) and R(A·A⁻), respectively.

[Diagram: the projection A·A⁻ splits ℝⁿ into R(A·A⁻) and N(A·A⁻); the projection A⁻·A splits ℝᵐ into R(A⁻·A) and N(A⁻·A).]

Example (geodetic networks):

In a geodetic network, the projections A⁻·A correspond to S-transformations in the sense of W. Baarda (1973).

Example (Aₗ⁻ and Aₘ⁻ g-inverses):

The projection A·Aₗ⁻ = A·(A′A)⁻·A′ guarantees that the subspaces R(A·Aₗ⁻) and N(A·Aₗ⁻) are orthogonal to each other. The same holds for the subspaces R(Aₘ⁻·A) and N(Aₘ⁻·A) of the projection Aₘ⁻·A = A′·(AA′)⁻·A.

In general, there exists more than one g-inverse which leads to identical projections A·A⁻ and A⁻·A. For instance, following A. Ben-Israel and T. N. E. Greville (1974, p. 59) we learn that the reflexive g-inverse which follows from
Aᵣ⁻ = (A⁻·A)·A⁻·(A·A⁻) = A⁻·A·A⁻
generates the class of all reflexive g-inverses. Therefore it is obvious that the reflexive g-inverses Aᵣ⁻ correspond to exactly one pair of projections A·A⁻ and A⁻·A, and conversely. In the special case of a symmetric matrix A, A = A′, and n = m we know, due to
R(A·A⁻) = R(A) = R(A′) ⊥ N(A) = N(A⁻·A),
that the column space R(A·A⁻) is orthogonal to the null space N(A⁻·A), illustrated by the sign "⊥". If these complementary subspaces N(A⁻·A) and R(A·A⁻) are orthogonal to each other, the postulate of a symmetric reflexive g-inverse leads to
Aᵣₛ⁻ := (A⁻·A)·A⁻·(A⁻·A)′ = A⁻·A·(A⁻)′,
if A⁻ is a suited g-inverse.


There is no guarantee that the complementary subspaces R(A⁻·A) and N(A⁻·A), or R(A·A⁻) and N(A·A⁻), are orthogonal. If such a result is to be reached, we should use
the uniquely defined pseudoinverse A⁺,
also called Moore–Penrose inverse,
for which
R(A⁺·A) ⊥ N(A⁺·A), R(A·A⁺) ⊥ N(A·A⁺)
or, equivalently,
A·A⁺ = (A·A⁺)′, A⁺·A = (A⁺·A)′
holds.

If we depart from an arbitrary g-inverse (A′·A·A′)⁻, the pseudoinverse A⁺ can be built by
A⁺ := A′·(A′·A·A′)⁻·A′ (Zlobec formula)
or
A⁺ := A′·(A·A′)⁻·A·(A′·A)⁻·A′ (Bjerhammar formula),
if both the g-inverses (A·A′)⁻ and (A′·A)⁻ exist. The Moore–Penrose inverse fulfils the Penrose equations:
(i)	A·A⁺·A = A (g-inverse)
(ii)	A⁺·A·A⁺ = A⁺ (reflexivity)
(iii)	A·A⁺ = (A·A⁺)′
(iv)	A⁺·A = (A⁺·A)′
— symmetry due to orthogonal projection.
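NumPy's np.linalg.pinv computes exactly this Moore–Penrose inverse, so the four Penrose equations can be verified directly; the rank-deficient test matrix below is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(4)
# Rank-deficient 4x3 matrix: third column is the sum of the first two.
A = rng.standard_normal((4, 2))
A = np.column_stack([A, A[:, 0] + A[:, 1]])

P = np.linalg.pinv(A)        # Moore-Penrose inverse

assert np.allclose(A @ P @ A, A)            # (i)   g-inverse
assert np.allclose(P @ A @ P, P)            # (ii)  reflexivity
assert np.allclose(A @ P, (A @ P).T)        # (iii) symmetry of A A+
assert np.allclose(P @ A, (P @ A).T)        # (iv)  symmetry of A+ A
```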

Lemma (Penrose equations):
Let a rectangular matrix A of the order O(A) = n×m be given. The Moore–Penrose inverse A⁺ is the unique rank-preserving generalized inverse, rk A = rk A⁺, which fulfils the axioms of the Penrose equations (i)–(iv).

For the special case of a symmetric matrix A the pseudoinverse A⁺ is symmetric as well, fulfilling
R(A⁺·A) = R(A·A⁺), N(A·A⁺) = N(A⁺·A),
and in addition
A⁺ = A·(A⁺)² = A·(A²)⁺ = (A²)⁺·A.


Various formulas for computing certain g-inverses exist, for instance by the method of rank factorization. Let A be an n×m matrix of rank r := rk A such that
A = G·F, O(G) = n×r, O(F) = r×m.
Due to the inequality r ≤ rk G⁻ ≤ min{r, n} = r, G possesses only reflexive g-inverses Gᵣ⁻, because of
Iᵣ = [(G′G)⁻¹·G′]·G = [(G′G)⁻¹·G′]·(G·Gᵣ⁻·G) = Gᵣ⁻·G,
represented by left inverses in the sense of G_L⁻·G = I. In a similar way, all g-inverses of F are reflexive and right inverses, for instance F_R⁻ := F′·(F·F′)⁻¹ with F·F_R⁻ = Iᵣ.
The whole class of reflexive g-inverses of A can be represented by
Aᵣ⁻ := F_R⁻·Gᵣ⁻ = F_R⁻·G_L⁻.
In this case we also find the pseudoinverse, namely
A⁺ := F′·(F·F′)⁻¹·(G′·G)⁻¹·G′,
because of
R(A⁺·A) = R(F′) = R(A′) ⊥ N(F) = N(A⁺·A) = N(A),
R(A·A⁺) = R(G) = R(A) ⊥ N(G′) = N(A·A⁺) = N(A′).
If we want to give up the orthogonality conditions, in the case of a quadratic matrix A = G·F we could take advantage of the projections
Aᵣ⁻·A = A·Aᵣ⁻
and postulate
R(Aᵣ⁻·A) = R(A·Aᵣ⁻) = R(G),
N(A·Aᵣ⁻) = N(Aᵣ⁻·A) = N(F).
In consequence, if F·G is a nonsingular matrix, we enjoy the representation
Aᵣ⁻ := G·(F·G)⁻²·F,
which reduces, in case that A is a symmetric matrix, to the pseudoinverse A⁺.
Dual methods of computing g-inverses A⁻ are based on bases of the null spaces, both for F and G or for A and A′. On the one side we need a matrix E_F with
F·E_F′ = 0, rk E_F = m − r,
and on the other side a matrix E_G with
G′·E_G′ = 0, rk E_G = n − r.
The enlarged matrix of the order (n + m − r)×(n + m − r) is automatically nonsingular and has the Cayley inverse

1
A EG    A EF 
E    
 F 0   EG  0

with the pseudoinverse A  on the upper left side. Details can be derived from A.
Ben – Israel and T. N. E. Greville (1974 p. 228).
If the null spaces are always normalized in the sense of
 EF | EF  I m  r ,  EG  | EG   I n  r
because of
E  EF  EF | EF  1  EF

F

and
EG   EG  | EG   1 EG   EG 
1
A EG   A EG  
E    .
 F 0   EF 0 

These formulas gain a special structure if the matrix A is symmetric of the order O(A) = m×m. In this case
E_G = E_F =: E, O(E) = (m−r)×m, rk E = m − r,
and
[A, E′; E, 0]⁻¹ = [A⁺, E′·⟨E | E⟩⁻¹; ⟨E | E⟩⁻¹·E, 0].
On the basis of the relation E·A⁺ = 0 there follows
Iₘ = A·A⁺ + E′·⟨E | E⟩⁻¹·E = (A + E′E)·[A⁺ + E′·(E·E′·E·E′)⁻¹·E]
and, with the projection (S-transformation)
A⁺·A = Iₘ − E′·⟨E | E⟩⁻¹·E = (A + E′E)⁻¹·A,
the pseudoinverse of A,
A⁺ = (A + E′E)⁻¹ − E′·(E·E′·E·E′)⁻¹·E,
with
R(A⁺·A) = R(A·A⁺) = R(A) = R(A′) = N(E).


In the case of a symmetric, reflexive g-inverse Aᵣₛ⁻ there holds the orthogonality or complementarity
R(Aᵣₛ⁻·A) = R(A·Aᵣₛ⁻),
N(A·Aᵣₛ⁻) complementary to R(A·Aᵣₛ⁻),
which is guaranteed by a matrix K, rk K = m − r, O(K) = (m−r)×m, such that K·E′ is a nonsingular matrix.
At the same time, we take advantage of the bordering of the matrix A by K and K′ to a nonsingular matrix of the order (2m−r)×(2m−r),
[A, K′; K, 0]⁻¹ = [Aᵣₛ⁻, K_R⁻; (K_R⁻)′, 0].

K_R⁻ := E′·(K·E′)⁻¹ is a right inverse of K. Obviously, we gain the symmetric reflexive g-inverse Aᵣₛ⁻ whose columns are orthogonal to K′:
R(Aᵣₛ⁻·A) ⊥ R(K′), K·Aᵣₛ⁻ = 0
⇒ Iₘ = A·Aᵣₛ⁻ + K′·(E·K′)⁻¹·E = (A + K′K)·[Aᵣₛ⁻ + E′·(E·K′·K·E′)⁻¹·E]
and, with the projection (S-transformation)
Aᵣₛ⁻·A = Iₘ − E′·(K·E′)⁻¹·K = (A + K′K)⁻¹·A,
the symmetric reflexive g-inverse
Aᵣₛ⁻ = (A + K′K)⁻¹ − E′·(E·K′·K·E′)⁻¹·E.

For the special case of a symmetric and positive semidefinite m×m matrix A the matrix sets U and V of the singular value decomposition reduce to one. Based on the matrix decomposition
A = [U₁, U₂] · [Λ, 0; 0, 0] · [U₁′; U₂′] = U₁·Λ·U₁′,
we find the different g-inverses listed in the following lemma.

Lemma (g-inverses of symmetric and positive semidefinite matrices):
(i)	g-inverse:
	A⁻ = [U₁, U₂] · [Λ⁻¹, L₁₂; L₂₁, L₂₂] · [U₁′; U₂′]
(ii)	reflexive g-inverse:
	Aᵣ⁻ = [U₁, U₂] · [Λ⁻¹, L₁₂; L₂₁, L₂₁·Λ·L₁₂] · [U₁′; U₂′]
(iii)	reflexive and symmetric g-inverse:
	Aᵣₛ⁻ = [U₁, U₂] · [Λ⁻¹, L₁₂; L₁₂′, L₁₂′·Λ·L₁₂] · [U₁′; U₂′]
(iv)	pseudoinverse:
	A⁺ = [U₁, U₂] · [Λ⁻¹, 0; 0, 0] · [U₁′; U₂′] = U₁·Λ⁻¹·U₁′,
with arbitrary blocks L₁₂, L₂₁, L₂₂ of suitable order.

We look at a representation of the Moore–Penrose inverse in terms of U₂, the orthonormal basis of the null space N(A⁺·A) = N(A). Setting E := U₂′, we find
[A, U₂; U₂′, 0]⁻¹ = [A⁺, U₂; U₂′, 0].

By means of the fundamental relation
A⁺·A = lim_(δ→0) (A + δIₘ)⁻¹·A = A·A⁺ = Iₘ − U₂·U₂′ = U₁·U₁′,
we generate the fundamental relation of the pseudoinverse
A⁺ = (A + U₂·U₂′)⁻¹ − U₂·U₂′.
The main target of our discussion of various g-inverses is the easy handling of representations of solutions of arbitrary linear equations and their characterizations.
We depart from the solution of a consistent system of linear equations,
A·x = c, O(A) = n×m, c ∈ R(A) ⇒ x = A⁻·c for any g-inverse A⁻.
For a fixed g-inverse A⁻, the general solution of such a linear system of equations can be represented by
x = A⁻·c + (Iₘ − A⁻·A)·z for all z ∈ ℝᵐ,
since the subspaces N(A) and R(Iₘ − A⁻·A) are identical. We test the consistency of our system by means of the identity
A·A⁻·c = c:
c is mapped by the projection A·A⁻ onto itself.
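A short sketch of the consistency test and the general solution for a rank-deficient system; the matrix, right-hand side, and the arbitrary vector z are hypothetical test data, and the pseudoinverse serves as one particular g-inverse:

```python
import numpy as np

# Consistent but rank-deficient system A x = c (rk A = 2).
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])
x_true = np.array([1., 1., 0.])
c = A @ x_true                        # guarantees c in R(A)

Ag = np.linalg.pinv(A)                # one particular g-inverse
# consistency test: A A^- c = c
assert np.allclose(A @ Ag @ c, c)

# general solution x = A^- c + (I - A^- A) z, any z
z = np.array([3., -1., 2.])
x = Ag @ c + (np.eye(3) - Ag @ A) @ z
assert np.allclose(A @ x, c)
```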
Similarly we solve the matrix equation A·X·B = C by a consistency test: the existence of a solution is granted by the identity
A·A⁻·C·B⁻·B = C for any g-inverses A⁻ and B⁻.

If this condition is fulfilled, we are able to generate the general solution by
X = A⁻·C·B⁻ + Z − A⁻·A·Z·B·B⁻,
where Z is an arbitrary matrix of suitable order. We can use arbitrary g-inverses A⁻ and B⁻, for instance the pseudoinverses A⁺ and B⁺, for which the particular solution with Z = 0 involves two-sided orthogonal projections.
How can we reduce the matrix equation A·X·B = C to a vector equation? The vec operator is the door opener:
A·X·B = C ⇔ (B′ ⊗ A)·vec X = vec C.
The general solution of our matrix equation reads
vec X = (B′ ⊗ A)⁻·vec C + [I − (B′ ⊗ A)⁻·(B′ ⊗ A)]·vec Z.
Here we can use the identity
(A ⊗ B)⁻ = A⁻ ⊗ B⁻,
generated by two g-inverses of the Kronecker–Zehfuss product.
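The matrix equation and its vectorized twin can be solved side by side; the matrices below are hypothetical test data built so that the system is consistent, and the pseudoinverses serve as particular g-inverses:

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order='F')

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
X_true = rng.standard_normal((2, 2))
C = A @ X_true @ B                   # consistent by construction

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)
# consistency test: A A^- C B^- B = C
assert np.allclose(A @ Ap @ C @ Bp @ B, C)

# particular solution X = A^- C B^- solves A X B = C
X = Ap @ C @ Bp
assert np.allclose(A @ X @ B, C)

# equivalent vector form: (B' ⊗ A) vec X = vec C
assert np.allclose(np.kron(B.T, A) @ vec(X), vec(C))
```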
At the end we solve the more general equation A·x = B·y of consistent type R(B) ⊂ R(A).

Lemma (consistent system of homogeneous equations A·x = B·y):
Given the homogeneous system of linear equations A·x = B·y for y constrained by B·y ∈ R(A). Then a solution x = L·y can be given under the condition
R(B) ⊂ R(A).
In this case the matrix L may be decomposed as
L = A⁻·B for a certain g-inverse A⁻.

Appendix B: Matrix Analysis
A short version of matrix analysis is presented. Arbitrary derivatives of scalar-valued, vector-valued and matrix-valued vector and matrix functions of functionally independent variables are defined. Extensions for differentiating symmetric and antisymmetric matrices are given. Special examples for functionally dependent matrix variables are reviewed.
B1 Derivatives of Scalar-valued and Vector-valued Vector Functions
Here we present the analysis of differentiating scalar-valued and vector-valued vector functions, enriched by examples.

Definition (derivative of a scalar-valued vector function):
Let a scalar-valued function f(x) of a vector x of the order O(x) = m×1 be given; then we call the 1×m row vector
Df(x) = [D₁f(x), …, Dₘf(x)] =: ∂f/∂x′
the first derivative of f(x) with respect to x.

Matrix differentiation is based on the following definition.

Definition (derivative of a matrix-valued matrix function):
Let an n×q matrix-valued function F(X) of an m×p matrix of functionally independent variables X be given. Then the nq×mp Jacobi matrix of first derivatives of F is defined by
J_F = DF(X) := ∂vec F(X) / ∂(vec X)′.

The definition of first derivatives of matrix functions can be motivated as follows. The matrices F = [fᵢⱼ] ∈ ℝ^(n×q) and X = [xₖₗ] ∈ ℝ^(m×p) are two-dimensional arrays. In contrast, the array of first derivatives
[∂fᵢⱼ/∂xₖₗ] = [Jᵢⱼₖₗ] ∈ ℝ^(n×q×m×p)
is four-dimensional and automatically outside the usual frame of matrix algebra of two-dimensional arrays. By means of the operations vec F and vec X we vectorize the matrices F and X. Accordingly, we take advantage of the derivative of vec F(X) with respect to the vector (vec X)′, arriving at the Jacobi matrix J_F, a two-dimensional array.


Examples
(i)	f(x) = x′·A·x = a₁₁x₁² + (a₁₂ + a₂₁)x₁x₂ + a₂₂x₂²
	Df(x) = [D₁f(x), D₂f(x)] = ∂f/∂x′
	= [2a₁₁x₁ + (a₁₂ + a₂₁)x₂ | (a₁₂ + a₂₁)x₁ + 2a₂₂x₂] = x′·(A + A′)
(ii)	f(x) = A·x = [a₁₁x₁ + a₁₂x₂; a₂₁x₁ + a₂₂x₂]
	J_F = Df(x) = ∂f/∂x′ = [a₁₁, a₁₂; a₂₁, a₂₂] = A
(iii)	F(X) = X² = [x₁₁² + x₁₂x₂₁, x₁₁x₁₂ + x₁₂x₂₂; x₂₁x₁₁ + x₂₂x₂₁, x₂₁x₁₂ + x₂₂²]
	vec F(X) = [x₁₁² + x₁₂x₂₁, x₂₁x₁₁ + x₂₂x₂₁, x₁₁x₁₂ + x₁₂x₂₂, x₂₁x₁₂ + x₂₂²]′
	(vec X)′ = [x₁₁, x₂₁, x₁₂, x₂₂]
	J_F = DF(X) = ∂vec F(X)/∂(vec X)′ =
	[2x₁₁, x₁₂, x₂₁, 0;
	 x₂₁, x₁₁ + x₂₂, 0, x₂₁;
	 x₁₂, 0, x₁₁ + x₂₂, x₁₂;
	 0, x₁₂, x₂₁, 2x₂₂],
	O(J_F) = 4×4.
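Example (iii) can be cross-checked by comparing the analytic Jacobian against a finite-difference approximation of ∂vec F/∂(vec X)′; the 2×2 test matrix and the step size h are assumptions of this sketch:

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order='F')

def jacobian_X2(X):
    """Analytic 4x4 Jacobian of F(X) = X^2 for a 2x2 matrix X (Example (iii))."""
    x11, x21, x12, x22 = vec(X)
    return np.array([[2 * x11, x12, x21, 0.0],
                     [x21, x11 + x22, 0.0, x21],
                     [x12, 0.0, x11 + x22, x12],
                     [0.0, x12, x21, 2 * x22]])

X = np.array([[1., 2.], [3., 4.]])
h = 1e-6
J_num = np.empty((4, 4))
for k in range(4):
    E = np.zeros(4); E[k] = h
    dX = E.reshape(2, 2, order='F')    # perturb the k-th entry of vec X
    J_num[:, k] = (vec((X + dX) @ (X + dX)) - vec(X @ X)) / h
assert np.allclose(J_num, jacobian_X2(X), atol=1e-4)
```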

B2 Derivatives of Trace Forms

Up to now we have assumed that the vector x or the matrix X consists of functionally independent variables. This excludes, for instance, a symmetric matrix X = [xᵢⱼ] = [xⱼᵢ] = X′ or an antisymmetric matrix X = [xᵢⱼ] = [−xⱼᵢ] = −X′. In the case of functionally dependent variables, for instance xᵢⱼ = xⱼᵢ or xᵢⱼ = −xⱼᵢ, we can take advantage of the chain rule in order to derive the differentiation procedure.

∂tr(A·X)/∂X =
	A′, if X consists of functionally independent elements;
	A + A′ − Diag[a₁₁, …, aₙₙ], if the n×n matrix X is symmetric;
	A′ − A, if the n×n matrix X is antisymmetric.

∂tr(A·X)/∂(vec X)′ =
	[vec A′]′, if X consists of functionally independent elements;
	[vec(A + A′ − Diag[a₁₁, …, aₙₙ])]′, if the n×n matrix X is symmetric;
	[vec(A′ − A)]′, if the n×n matrix X is antisymmetric.

For instance, let
A = [a₁₁, a₁₂; a₂₁, a₂₂], X = [x₁₁, x₁₂; x₂₁, x₂₂].
 a21  x21
Case #1: "the matrix X consists of functionally independent elements"
∂/∂X = [∂/∂x₁₁, ∂/∂x₁₂; ∂/∂x₂₁, ∂/∂x₂₂],
∂tr(A·X)/∂X = [a₁₁, a₂₁; a₁₂, a₂₂] = A′.

Case #2: "the n×n matrix X is symmetric: X = X′"
x₁₂ = x₂₁ ⇒ tr(A·X) = a₁₁x₁₁ + (a₁₂ + a₂₁)x₂₁ + a₂₂x₂₂,
∂/∂X = [∂/∂x₁₁, ∂/∂x₂₁; ∂/∂x₂₁, ∂/∂x₂₂],
∂tr(A·X)/∂X = [a₁₁, a₁₂ + a₂₁; a₁₂ + a₂₁, a₂₂] = A + A′ − Diag(a₁₁, …, aₙₙ).

Case #3: "the n×n matrix X is antisymmetric: X = −X′"
x₁₁ = x₂₂ = 0, x₁₂ = −x₂₁ ⇒ tr(A·X) = (a₁₂ − a₂₁)x₂₁,
∂/∂X = [0, −∂/∂x₂₁; ∂/∂x₂₁, 0],
∂tr(A·X)/∂X = [0, −(a₁₂ − a₂₁); a₁₂ − a₂₁, 0] = A′ − A.


Let us now assume that the matrix $X$ of variables $x_{ij}$ always consists of functionally independent elements. We note some useful identities of first derivatives.

Scalar-valued functions of vectors:
\[
\frac{\partial(a'x)}{\partial x'} = a' \tag{B1}
\]
\[
\frac{\partial(x'Ax)}{\partial x'} = x'(A + A'). \tag{B2}
\]

Scalar-valued functions of a matrix: trace:
\[
\frac{\partial \operatorname{tr}(AX)}{\partial X} = A'; \tag{B3}
\]
especially:
\[
\frac{\partial\, a'Xb}{\partial(\operatorname{vec}X)'} = \frac{\partial \operatorname{tr}(ba'X)}{\partial(\operatorname{vec}X)'} = b'\otimes a';
\]
\[
\frac{\partial \operatorname{tr}(X'AX)}{\partial X} = (A + A')X; \tag{B4}
\]
especially:
\[
\frac{\partial \operatorname{tr}(X'X)}{\partial(\operatorname{vec}X)'} = 2(\operatorname{vec}X)'.
\]
\[
\frac{\partial \operatorname{tr}(XAX)}{\partial X} = X'A' + A'X', \tag{B5}
\]
especially:
\[
\frac{\partial \operatorname{tr}X^2}{\partial(\operatorname{vec}X)'} = 2(\operatorname{vec}X')'.
\]
\[
\frac{\partial \operatorname{tr}(AX^{-1})}{\partial X} = -(X^{-1}AX^{-1})',\ \text{if } X \text{ is nonsingular}; \tag{B6}
\]
especially:
\[
\frac{\partial \operatorname{tr}(X^{-1})}{\partial(\operatorname{vec}X)'} = -[\operatorname{vec}((X')^{-2})]';
\]
\[
\frac{\partial\, a'X^{-1}b}{\partial(\operatorname{vec}X)'} = \frac{\partial \operatorname{tr}(ba'X^{-1})}{\partial(\operatorname{vec}X)'} = -b'(X^{-1})'\otimes a'X^{-1}.
\]
\[
\frac{\partial \operatorname{tr}X^{\alpha}}{\partial X} = \alpha(X')^{\alpha-1},\ \text{if } X \text{ is quadratic}; \tag{B7}
\]
especially:
\[
\frac{\partial \operatorname{tr}X}{\partial(\operatorname{vec}X)'} = (\operatorname{vec}I)'.
\]
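Identity (B6) can be cross-checked numerically. The sketch below uses arbitrary, well-conditioned test matrices (assumptions, not the book's data):

```python
import numpy as np

# Finite-difference check of (B6): d tr(A X^-1)/dX = -(X^-1 A X^-1)'.
rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
X = rng.normal(size=(n, n)) + n * np.eye(n)   # nonsingular test matrix

h = 1e-6
f = lambda M: np.trace(A @ np.linalg.inv(M))
num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = h
        num[i, j] = (f(X + E) - f(X)) / h

Xi = np.linalg.inv(X)
ana = -(Xi @ A @ Xi).T
assert np.allclose(num, ana, atol=1e-3)
print("(B6) verified")
```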

B3 Derivatives of Determinantal Forms

The derivatives of scalar-valued determinantal forms are listed now.



| AXB | A (adjAXB)B | AXB | A (BXA) 1 B,
X
if AXB is nonsingular ; (B8)
especially:
axb
 b  a, where adj(aXb)=1 .
 (vecX)

| AXBXC | C(adjAXBXC) AXB  A (adjAXBXC)CXB ; (B9)
X
especially:

| XBX | (adjXBX)XB  (adjXB X) XB ;
X
 | XSX |
 2(vecX)(S  adjXSX), if S is symmetric;
 (vecX)
 | XX |
 2(vecX)(I  adjXX) .
 (vecX)

| AXBXC | BXC(adjAXBXC) A  BXA (adjAXBXC)C ; (B10)
X
especially:

| XBX | BX(adjXBX)  BX(adjXBX) ;
X
 | XSX |
 2(vecX)(adjXSX  S), if S is symmetric;
 (vecX)
 | XX |
 2(vecX)(adjXX  I ) .
 (vecX)



\[
\frac{\partial|AXBXC|}{\partial X} = B'X'A'(\operatorname{adj}AXBXC)'C' + A'(\operatorname{adj}AXBXC)'C'X'B'; \tag{B11}
\]
\[
\frac{\partial|XBX|}{\partial X} = B'X'(\operatorname{adj}XBX)' + (\operatorname{adj}XBX)'X'B';
\]
especially:
\[
\frac{\partial|X^2|}{\partial(\operatorname{vec}X)'} = (\operatorname{vec}[X'(\operatorname{adj}X^2)' + (\operatorname{adj}X^2)'X'])' =
\]
\[
= |X|^2(\operatorname{vec}[X'(X')^{-2} + (X')^{-2}X'])' = 2|X|^2[\operatorname{vec}(X^{-1})']',\ \text{if } X \text{ is nonsingular}.
\]
\[
\frac{\partial|X|^{\alpha}}{\partial X} = \alpha|X|^{\alpha}(X^{-1})',\ \text{if } X \text{ is nonsingular}, \tag{B12}
\]
\[
\frac{\partial|X|}{\partial X} = |X|(X^{-1})',\ \text{if } X \text{ is nonsingular};
\]
especially:
\[
\frac{\partial|X|}{\partial(\operatorname{vec}X)'} = [\operatorname{vec}(\operatorname{adj}X)']'.
\]
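Identity (B12) lends itself to the same kind of finite-difference check; the test matrix below is an arbitrary nonsingular assumption:

```python
import numpy as np

# Finite-difference check of (B12): d|X|/dX = |X| (X^-1)' = (adj X)'.
rng = np.random.default_rng(2)
n = 3
X = rng.normal(size=(n, n)) + n * np.eye(n)   # nonsingular test matrix

h = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = h
        num[i, j] = (np.linalg.det(X + E) - np.linalg.det(X)) / h

ana = np.linalg.det(X) * np.linalg.inv(X).T   # = (adj X)'
assert np.allclose(num, ana, atol=1e-3)
print("(B12) verified")
```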
B4 Derivatives of a Vector/Matrix Function of a Vector/Matrix
If we differentiate a vector- or matrix-valued function of a vector or matrix, we will find the results of type (B13)–(B20).

Vector-valued functions of a vector or a matrix:
\[
\frac{\partial(Ax)}{\partial x'} = A \tag{B13}
\]
\[
\frac{\partial(AXa)}{\partial(\operatorname{vec}X)'} = \frac{\partial(a'\otimes A)\operatorname{vec}X}{\partial(\operatorname{vec}X)'} = a'\otimes A \tag{B14}
\]

Matrix-valued functions of a matrix:
\[
\frac{\partial(\operatorname{vec}X)}{\partial(\operatorname{vec}X)'} = I_{mp}\ \text{for all } X\in\mathbb{R}^{m\times p} \tag{B15}
\]
\[
\frac{\partial(\operatorname{vec}X')}{\partial(\operatorname{vec}X)'} = K_{mp}\ \text{for all } X\in\mathbb{R}^{m\times p}, \tag{B16}
\]
where $K_{mp}$ is a suitable commutation matrix;
\[
\frac{\partial(\operatorname{vec}XX')}{\partial(\operatorname{vec}X)'} = (I_{m^2}+K_{mm})(X\otimes I_m)\ \text{for all } X\in\mathbb{R}^{m\times p},
\]
where the matrix $\tfrac{1}{2}(I_{m^2}+K_{mm})$ is symmetric and idempotent;

 (vecXX)
 (I p2 +K p  p )(I p  X) for all X   m p
 (vecX)
 (vecX 1 )
 ( X 1 ) if X is non-singular
 (vecX)
 (vecX ) 
  (X) -j  X j 1 for all    , if X is a square matrix.
 (vecX) j 1

B5 Derivatives of the Kronecker – Zehfuss product


Let a matrix-valued function of the two matrix variables $X$ and $Y$ be given. In particular, we assume the function
\[
F(X, Y) = X\otimes Y\ \text{for all } X\in\mathbb{R}^{m\times p},\ Y\in\mathbb{R}^{n\times q},
\]
the Kronecker–Zehfuss product of the variables $X$ and $Y$, to be well defined. Then the identities of the first differential and the first derivative follow:

\[
dF(X, Y) = (dX)\otimes Y + X\otimes dY,
\]
\[
d\operatorname{vec}F(X, Y) = \operatorname{vec}(dX\otimes Y) + \operatorname{vec}(X\otimes dY),
\]
\[
\operatorname{vec}(dX\otimes Y) = (I_p\otimes K_{qm}\otimes I_n)(\operatorname{vec}dX\otimes\operatorname{vec}Y) =
\]
\[
= (I_p\otimes K_{qm}\otimes I_n)(I_{mp}\otimes\operatorname{vec}Y)\,d(\operatorname{vec}X)
= (I_p\otimes[(K_{qm}\otimes I_n)(I_m\otimes\operatorname{vec}Y)])\,d(\operatorname{vec}X),
\]
\[
\operatorname{vec}(X\otimes dY) = (I_p\otimes K_{qm}\otimes I_n)(\operatorname{vec}X\otimes\operatorname{vec}dY) =
\]
\[
= (I_p\otimes K_{qm}\otimes I_n)(\operatorname{vec}X\otimes I_{nq})\,d(\operatorname{vec}Y)
= ([(I_p\otimes K_{qm})(\operatorname{vec}X\otimes I_q)]\otimes I_n)\,d(\operatorname{vec}Y),
\]
\[
\frac{\partial\operatorname{vec}(X\otimes Y)}{\partial(\operatorname{vec}X)'} = I_p\otimes[(K_{qm}\otimes I_n)(I_m\otimes\operatorname{vec}Y)],
\]
\[
\frac{\partial\operatorname{vec}(X\otimes Y)}{\partial(\operatorname{vec}Y)'} = [(I_p\otimes K_{qm})(\operatorname{vec}X\otimes I_q)]\otimes I_n.
\]
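The identity underlying these derivatives, $\operatorname{vec}(X\otimes Y) = (I_p\otimes K_{qm}\otimes I_n)(\operatorname{vec}X\otimes\operatorname{vec}Y)$, can be checked numerically; the `commutation` helper is an assumed construction:

```python
import numpy as np

# Check vec(X kron Y) = (I_p kron K_qm kron I_n)(vecX kron vecY)
# for X (m x p) and Y (n x q); K_qm acts on vec of q x m matrices.
def commutation(q, m):
    K = np.zeros((q * m, q * m))
    for i in range(q):
        for j in range(m):
            K[i * m + j, j * q + i] = 1.0   # vec(M')[i*m+j] = vec(M)[j*q+i]
    return K

m, p, n, q = 2, 3, 2, 2
rng = np.random.default_rng(4)
X = rng.normal(size=(m, p))
Y = rng.normal(size=(n, q))
vec = lambda M: M.reshape(-1, order="F")

lhs = vec(np.kron(X, Y))
T = np.kron(np.eye(p), np.kron(commutation(q, m), np.eye(n)))
rhs = T @ np.kron(vec(X), vec(Y))
assert np.allclose(lhs, rhs)
print("vec(X kron Y) identity verified")
```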

B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions
Many matrix functions f ( X) or F(X) force us to pay attention to dependencies
within the variables. As examples we treat here first derivatives of symmetric or
antisymmetric matrix functions of X.


Definition (derivative of a matrix-valued symmetric matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times m$ symmetric matrix $X = X'$. The $nq\times m(m+1)/2$ Jacobi matrix of first derivatives of $F$ is defined by
\[
J_F^s = DF(X = X') := \frac{\partial\operatorname{vec}F(X)}{\partial(\operatorname{vech}X)'}.
\]

Definition (derivative of a matrix-valued antisymmetric matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times m$ antisymmetric matrix $X = -X'$. The $nq\times m(m-1)/2$ Jacobi matrix of first derivatives of $F$ is defined by
\[
J_F^a = DF(X = -X') := \frac{\partial\operatorname{vec}F(X)}{\partial(\operatorname{veck}X)'}.
\]

Examples
(i) Given is the scalar-valued matrix function $\operatorname{tr}(AX)$ of a symmetric variable matrix $X = X'$, for instance
\[
A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix},\quad
X = \begin{bmatrix} x_{11} & x_{21}\\ x_{21} & x_{22}\end{bmatrix},\quad
\operatorname{vech}X = \begin{bmatrix} x_{11}\\ x_{21}\\ x_{22}\end{bmatrix},
\]
\[
\operatorname{tr}(AX) = a_{11}x_{11} + (a_{12}+a_{21})x_{21} + a_{22}x_{22},
\]
\[
\frac{\partial}{\partial(\operatorname{vech}X)'} = \Big[\frac{\partial}{\partial x_{11}},\ \frac{\partial}{\partial x_{21}},\ \frac{\partial}{\partial x_{22}}\Big],
\]
\[
\frac{\partial\operatorname{tr}(AX)}{\partial(\operatorname{vech}X)'} = [a_{11},\ a_{12}+a_{21},\ a_{22}],
\]
\[
\frac{\partial\operatorname{tr}(AX)}{\partial(\operatorname{vech}X)'} = [\operatorname{vech}(A' + A - \operatorname{Diag}[a_{11},\dots,a_{nn}])]' = \Big[\operatorname{vech}\frac{\partial\operatorname{tr}(AX)}{\partial X}\Big]'.
\]

(ii) Given is the scalar-valued matrix function $\operatorname{tr}(AX)$ of an antisymmetric variable matrix $X = -X'$, for instance
\[
A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix},\quad
X = \begin{bmatrix} 0 & -x_{21}\\ x_{21} & 0\end{bmatrix},\quad
\operatorname{veck}X = x_{21},
\]
\[
\operatorname{tr}(AX) = (a_{12}-a_{21})x_{21},
\]
\[
\frac{\partial}{\partial(\operatorname{veck}X)'} = \Big[\frac{\partial}{\partial x_{21}}\Big],\qquad
\frac{\partial\operatorname{tr}(AX)}{\partial(\operatorname{veck}X)'} = a_{12}-a_{21},
\]
\[
\frac{\partial\operatorname{tr}(AX)}{\partial(\operatorname{veck}X)'} = [\operatorname{veck}(A' - A)]' = \Big[\operatorname{veck}\frac{\partial\operatorname{tr}(AX)}{\partial X}\Big]'.
\]
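Examples (i) and (ii) can be reproduced with finite differences; A and the variable values below are arbitrary test data, not values from the text:

```python
import numpy as np

# Finite-difference check of the vech- and veck-derivatives of tr(AX).
rng = np.random.default_rng(5)
A = rng.normal(size=(2, 2))
a11, a12, a21, a22 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
h = 1e-6

# symmetric X = [[x11, x21], [x21, x22]], vechX = (x11, x21, x22)'
x11, x21, x22 = rng.normal(size=3)
tr_sym = lambda v11, v21, v22: np.trace(A @ np.array([[v11, v21], [v21, v22]]))
grad_vech = np.array([
    (tr_sym(x11 + h, x21, x22) - tr_sym(x11, x21, x22)) / h,
    (tr_sym(x11, x21 + h, x22) - tr_sym(x11, x21, x22)) / h,
    (tr_sym(x11, x21, x22 + h) - tr_sym(x11, x21, x22)) / h,
])
assert np.allclose(grad_vech, [a11, a12 + a21, a22], atol=1e-4)

# antisymmetric X = [[0, -x21], [x21, 0]], veckX = (x21)
tr_antisym = lambda v21: np.trace(A @ np.array([[0.0, -v21], [v21, 0.0]]))
grad_veck = (tr_antisym(x21 + h) - tr_antisym(x21)) / h
assert np.allclose(grad_veck, a12 - a21, atol=1e-4)
print("vech/veck derivatives verified")
```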

B7 Higher order derivatives


Up to now we computed only first derivatives of scalar-valued, vector-valued and matrix-valued functions. Second derivatives are our target now; they will be needed for the classification of optimization problems of type minimum or maximum.
Definition (second derivatives of a scalar-valued vector function):
Let $f(x)$ be a scalar-valued function of the $m\times 1$ vector $x$. Then the $m\times m$ matrix
\[
DDf(x) = D(Df(x))' := \frac{\partial^2 f}{\partial x\,\partial x'}
\]
denotes the second derivatives of $f(x)$ with respect to $x$ and $x'$. Correspondingly
\[
D^2 f(x) := \frac{\partial}{\partial x'}\otimes\frac{\partial}{\partial x'}\, f(x) = (\operatorname{vec}DDf(x))'
\]
denotes the $1\times m^2$ vector of second derivatives.

and

Definition (second derivatives of a vector-valued vector function):
Let $f(x)$ be an $n\times 1$ vector-valued function of the $m\times 1$ vector $x$. Then the $n\times m^2$ matrix of second derivatives
\[
H_f = D^2 f(x) = D(Df(x)) := \Big(\frac{\partial}{\partial x'}\otimes\frac{\partial}{\partial x'}\Big) f(x) = \frac{\partial^2 f(x)}{\partial x'\otimes\partial x'}
\]
is the Hesse matrix of the function $f(x)$.

and

Definition (second derivatives of a matrix-valued matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times p$ matrix of functionally independent variables $X$. The $nq\times m^2p^2$ Hesse matrix of second derivatives of $F$ is defined by
\[
H_F = D^2 F(X) = D(DF(X)) := \Big(\frac{\partial}{\partial(\operatorname{vec}X)'}\otimes\frac{\partial}{\partial(\operatorname{vec}X)'}\Big)\operatorname{vec}F(X) = \frac{\partial^2\operatorname{vec}F(X)}{\partial(\operatorname{vec}X)'\otimes\partial(\operatorname{vec}X)'}.
\]

The definition of second derivatives of matrix functions can be motivated as follows. The matrices $F = [f_{ij}]\in\mathbb{R}^{n\times q}$ and $X = [x_{kl}]\in\mathbb{R}^{m\times p}$ are elements of two-dimensional arrays. In contrast, the array of second derivatives
\[
\Big[\frac{\partial^2 f_{ij}}{\partial x_{kl}\,\partial x_{pq}}\Big]\in\mathbb{R}^{n\times q\times m\times p\times m\times p}
\]
is six-dimensional and beyond the common matrix algebra of two-dimensional arrays. The following operations map a six-dimensional array of second derivatives to a two-dimensional array.
(i) $\operatorname{vec}F(X)$ is the vectorized form of the matrix-valued function;
(ii) $\operatorname{vec}X$ is the vectorized form of the variable matrix;
(iii) the Kronecker–Zehfuss product $\dfrac{\partial}{\partial(\operatorname{vec}X)'}\otimes\dfrac{\partial}{\partial(\operatorname{vec}X)'}$ vectorizes the matrix of second derivatives;
(iv) the formal product of the $1\times m^2p^2$ row vector of second derivatives with the $nq\times 1$ column vector $\operatorname{vec}F(X)$ leads to an $nq\times m^2p^2$ Hesse matrix of second derivatives.
Again we assume that the vector of variables $x$ and the matrix of variables $X$ consist of functionally independent elements. If this is not the case, we must, according to the chain rule, apply an alternative differential calculus similar to the first derivative, as in the case studies of symmetric and antisymmetric variable matrices.
Examples:
(i) $f(x) = x'Ax = a_{11}x_1^2 + (a_{12}+a_{21})x_1x_2 + a_{22}x_2^2$
\[
Df(x) = \frac{\partial f}{\partial x'} = [2a_{11}x_1 + (a_{12}+a_{21})x_2 \;|\; (a_{12}+a_{21})x_1 + 2a_{22}x_2]
\]
\[
DDf(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 2a_{11} & a_{12}+a_{21}\\ a_{12}+a_{21} & 2a_{22}\end{bmatrix} = A + A'
\]
(ii) $f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2\\ a_{21}x_1 + a_{22}x_2\end{bmatrix}$
\[
Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix} = A
\]
\[
DDf(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 0 & 0\\ 0 & 0\end{bmatrix}\ \text{(for each component)},\quad O(DDf(x)) = 2\times 2,
\]
\[
D^2 f(x) = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix},\quad O(D^2 f(x)) = 2\times 4.
\]


(iii)
\[
F(X) = X^2 = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} & x_{11}x_{12} + x_{12}x_{22}\\ x_{21}x_{11} + x_{22}x_{21} & x_{21}x_{12} + x_{22}^2\end{bmatrix}
\]
\[
\operatorname{vec}F(X) = \begin{bmatrix} x_{11}^2 + x_{12}x_{21}\\ x_{21}x_{11} + x_{22}x_{21}\\ x_{11}x_{12} + x_{12}x_{22}\\ x_{21}x_{12} + x_{22}^2\end{bmatrix},\quad O(F) = O(X) = 2\times 2
\]
\[
\frac{\partial}{\partial(\operatorname{vec}X)'} = \Big[\frac{\partial}{\partial x_{11}},\ \frac{\partial}{\partial x_{21}},\ \frac{\partial}{\partial x_{12}},\ \frac{\partial}{\partial x_{22}}\Big]
\]
\[
J_F = \frac{\partial\operatorname{vec}F(X)}{\partial(\operatorname{vec}X)'} = \begin{bmatrix}
2x_{11} & x_{12} & x_{21} & 0\\
x_{21} & x_{11}+x_{22} & 0 & x_{21}\\
x_{12} & 0 & x_{11}+x_{22} & x_{12}\\
0 & x_{12} & x_{21} & 2x_{22}\end{bmatrix},\quad O(J_F) = 4\times 4
\]
\[
H_F = \Big(\frac{\partial}{\partial(\operatorname{vec}X)'}\otimes\Big[\frac{\partial}{\partial x_{11}},\ \frac{\partial}{\partial x_{21}},\ \frac{\partial}{\partial x_{12}},\ \frac{\partial}{\partial x_{22}}\Big]\Big)\operatorname{vec}F(X) =
\]
\[
= \begin{bmatrix}
2&0&0&0&0&0&1&0&0&1&0&0&0&0&0&0\\
0&1&0&0&1&0&0&1&0&0&0&0&0&1&0&0\\
0&0&1&0&0&0&0&0&1&0&0&1&0&0&1&0\\
0&0&0&0&0&0&1&0&0&1&0&0&0&0&0&2
\end{bmatrix},
\]
\[
O(H_F) = 4\times 16.
\]
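Example (iii) can be cross-checked by finite differences; the block below recomputes $J_F$ and $H_F$ numerically at an arbitrary test point, with the $4\times 16$ layout of $H_F$ following the Kronecker ordering above:

```python
import numpy as np

# Numerical check of J_F (4x4) and H_F (4x16) for F(X) = X^2, X 2x2.
vec = lambda M: M.reshape(-1, order="F")   # ordering x11, x21, x12, x22
rng = np.random.default_rng(6)
X = rng.normal(size=(2, 2))
h = 1e-5

def jacobian(M):
    J = np.zeros((4, 4))
    for k in range(4):
        E = np.zeros(4); E[k] = h
        Mk = M + E.reshape(2, 2, order="F")
        J[:, k] = (vec(Mk @ Mk) - vec(M @ M)) / h
    return J

x11, x21, x12, x22 = vec(X)
J_ana = np.array([
    [2 * x11, x12,       x21,       0.0],
    [x21,     x11 + x22, 0.0,       x21],
    [x12,     0.0,       x11 + x22, x12],
    [0.0,     x12,       x21,       2 * x22],
])
assert np.allclose(jacobian(X), J_ana, atol=1e-3)

# H_F: differentiate J_F once more; block k holds d J_F / d (vecX)_k
H_num = np.zeros((4, 16))
for k in range(4):
    E = np.zeros(4); E[k] = h
    dJ = (jacobian(X + E.reshape(2, 2, order="F")) - jacobian(X)) / h
    H_num[:, 4 * k:4 * (k + 1)] = dJ

H_ana = np.array([
    [2,0,0,0, 0,0,1,0, 0,1,0,0, 0,0,0,0],
    [0,1,0,0, 1,0,0,1, 0,0,0,0, 0,1,0,0],
    [0,0,1,0, 0,0,0,0, 1,0,0,1, 0,0,1,0],
    [0,0,0,0, 0,0,1,0, 0,1,0,0, 0,0,0,2],
], dtype=float)
assert np.allclose(H_num, H_ana, atol=1e-2)
print("J_F and H_F of F(X) = X^2 verified")
```

Since $F(X) = X^2$ is quadratic in the entries of $X$, the nested forward differences reproduce the constant Hesse matrix essentially exactly.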
At the end, we want to define the derivative of order $l$ of a matrix-valued matrix function, whose structure is derived from the postulate of a suitable array.

Definition ($l$-th derivative of a matrix-valued matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times p$ matrix of functionally independent variables $X$. The $nq\times m^l p^l$ matrix of $l$-th derivatives is defined by
\[
D^l F(X) := \underbrace{\frac{\partial}{\partial(\operatorname{vec}X)'}\otimes\cdots\otimes\frac{\partial}{\partial(\operatorname{vec}X)'}}_{l\text{-times}}\operatorname{vec}F(X)
= \frac{\partial^l\operatorname{vec}F(X)}{\underbrace{\partial(\operatorname{vec}X)'\otimes\cdots\otimes\partial(\operatorname{vec}X)'}_{l\text{-times}}}\ \text{for all } l\in\mathbb{N}.
\]

Appendix C: Lagrange Multipliers

How can we find extrema with side conditions?

We generate solutions of such extremal problems first on the basis of algebraic manipulations, namely by the lemma of implicit functions, and secondly by a geometric toolbox, by means of interpreting a risk function and side conditions as level surfaces (specific normal images, Lagrange multipliers).

C1 A first way to solve the problem


A first way to find extrema with side conditions is based on a risk function
\[
f(x_1,\dots,x_m) = \operatorname{extr} \tag{C1}
\]
with unknowns $(x_1,\dots,x_m)\in\mathbb{R}^m$, which are restricted by side conditions of type
\[
F_i(x_1,\dots,x_m) = 0\quad (i = 1,\dots,r), \tag{C2}
\]
\[
\operatorname{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r < m. \tag{C3}
\]
The side conditions $F_i(x_j)$ $(i = 1,\dots,r;\ j = 1,\dots,m)$ are reduced by the lemma of the implicit function: solve for
\[
x_{m-r+1} = G_1(x_1,\dots,x_{m-r}),\quad
x_{m-r+2} = G_2(x_1,\dots,x_{m-r}),\quad\dots, \tag{C4}
\]
\[
x_{m-1} = G_{r-1}(x_1,\dots,x_{m-r}),\quad
x_m = G_r(x_1,\dots,x_{m-r}),
\]
and replace the result within the risk function
\[
f(x_1, x_2,\dots,x_{m-r}, G_1(x_1,\dots,x_{m-r}),\dots,G_r(x_1,\dots,x_{m-r})) = \operatorname{extr}. \tag{C5}
\]
The "free" unknowns $(x_1, x_2,\dots,x_{m-r-1}, x_{m-r})\in\mathbb{R}^{m-r}$ can be found by taking the result of the implicit function theorem as follows.

Lemma C1 ("implicit function theorem"):
Let $\Omega$ be an open set of $\mathbb{R}^m = \mathbb{R}^{m-r}\times\mathbb{R}^r$ and $F:\Omega\to\mathbb{R}^r$ with vectors $x_1\in\mathbb{R}^{m-r}$ and $x_2\in\mathbb{R}^r$. The map
\[
(x_1, x_2)\mapsto F(x_1, x_2) = \begin{bmatrix}
F_1(x_1,\dots,x_{m-r};\ x_{m-r+1},\dots,x_m)\\
F_2(x_1,\dots,x_{m-r};\ x_{m-r+1},\dots,x_m)\\
\vdots\\
F_{r-1}(x_1,\dots,x_{m-r};\ x_{m-r+1},\dots,x_m)\\
F_r(x_1,\dots,x_{m-r};\ x_{m-r+1},\dots,x_m)
\end{bmatrix} \tag{C6}
\]
is a continuously differentiable function with $F(x_1, x_2) = 0$. In case of a Jacobi determinant $j$ not zero, or a Jacobi matrix $J$ of rank $r$,
\[
j := \det J \neq 0\ \text{or}\ \operatorname{rk}J = r,\qquad J := \frac{\partial(F_1,\dots,F_r)}{\partial(x_{m-r+1},\dots,x_m)}, \tag{C7}
\]
there exist neighborhoods $U := U(x_1)\subset\mathbb{R}^{m-r}$ and $V := V(x_2)\subset\mathbb{R}^r$ such that the equation $F(x_1, x_2) = 0$ for any $x_1\in U$ has exactly one solution in $V$,
\[
x_2 = G(x_1)\quad\text{or}\quad
\begin{bmatrix} x_{m-r+1}\\ x_{m-r+2}\\ \vdots\\ x_{m-1}\\ x_m\end{bmatrix} =
\begin{bmatrix} G_1(x_1,\dots,x_{m-r})\\ G_2(x_1,\dots,x_{m-r})\\ \vdots\\ G_{r-1}(x_1,\dots,x_{m-r})\\ G_r(x_1,\dots,x_{m-r})\end{bmatrix}. \tag{C8}
\]
The function $G: U\to V$ is continuously differentiable.

A sample reference is any literature treating analysis, e.g. C. Blotter.
Lemma C1 is based on the Implicit Function Theorem, whose result we insert within the risk function (C1) in order to gain (C5) in the free variables $(x_1,\dots,x_{m-r})\in\mathbb{R}^{m-r}$. Our Example C1 explains the solution technique for finding extrema with side conditions within our first approach. Lemma C1 illustrates that there exists a local inverse of the side conditions towards the $r$ unknowns $(x_{m-r+1}, x_{m-r+2},\dots,x_{m-1}, x_m)\in\mathbb{R}^r$, which in the case of nonlinear side conditions is not necessarily unique.

:Example C1:
Search for the global extremum of the function
\[
f(x_1, x_2, x_3) = f(x, y, z) = x + y + z
\]
subject to the side conditions
\[
F_1(x_1, x_2, x_3) = Z(x, y, z) := x^2 + 2y^2 - 1 = 0\quad\text{(elliptic cylinder)},
\]
\[
F_2(x_1, x_2, x_3) = E(x, y, z) := 3x + 4z = 0\quad\text{(plane)},
\]
\[
J\Big(\frac{\partial F_i}{\partial x_j}\Big) = \begin{bmatrix} 2x & 4y & 0\\ 3 & 0 & 4\end{bmatrix},\qquad
\operatorname{rk}J\,(x\neq 0\ \text{or}\ y\neq 0) = r = 2,
\]
\[
F_1(x_1, x_2, x_3) = Z(x, y, z) = 0\;\Rightarrow\;
{}_1y = -\tfrac{\sqrt{2}}{2}\sqrt{1-x^2},\quad
{}_2y = +\tfrac{\sqrt{2}}{2}\sqrt{1-x^2},
\]
\[
F_2(x_1, x_2, x_3) = E(x, y, z) = 0\;\Rightarrow\; z = -\tfrac{3}{4}x,
\]
\[
{}_1f(x_1, x_2, x_3) = {}_1f(x, y, z) = f\big(x,\ -\tfrac{\sqrt{2}}{2}\sqrt{1-x^2},\ -\tfrac{3}{4}x\big) = \tfrac{x}{4} - \tfrac{\sqrt{2}}{2}\sqrt{1-x^2},
\]
\[
{}_2f(x_1, x_2, x_3) = {}_2f(x, y, z) = f\big(x,\ +\tfrac{\sqrt{2}}{2}\sqrt{1-x^2},\ -\tfrac{3}{4}x\big) = \tfrac{x}{4} + \tfrac{\sqrt{2}}{2}\sqrt{1-x^2},
\]
\[
{}_1f'(x) = 0 \;\Leftrightarrow\; \tfrac{1}{4} + \tfrac{\sqrt{2}}{2}\frac{x}{\sqrt{1-x^2}} = 0 \;\Rightarrow\; {}_1x = -\tfrac{1}{3},
\]
\[
{}_2f'(x) = 0 \;\Leftrightarrow\; \tfrac{1}{4} - \tfrac{\sqrt{2}}{2}\frac{x}{\sqrt{1-x^2}} = 0 \;\Rightarrow\; {}_2x = +\tfrac{1}{3},
\]
\[
{}_1f(-\tfrac{1}{3}) = -\tfrac{3}{4}\ \text{(minimum)},\qquad
{}_2f(+\tfrac{1}{3}) = +\tfrac{3}{4}\ \text{(maximum)}.
\]
At the position $x = -1/3$, $y = -2/3$, $z = 1/4$ we find a global minimum, but at the position $x = +1/3$, $y = +2/3$, $z = -1/4$ a global maximum.
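The reduced functions of Example C1 can be scanned numerically; the pure-Python sketch below confirms the stationary values and checks both side conditions at the minimum point:

```python
import math

# Grid scan of the reduced risk functions of Example C1:
#   f_1(x) = x/4 - (sqrt(2)/2) sqrt(1 - x^2)   (lower branch of the cylinder)
#   f_2(x) = x/4 + (sqrt(2)/2) sqrt(1 - x^2)   (upper branch)
f1 = lambda x: x / 4 - math.sqrt(2) / 2 * math.sqrt(1 - x * x)
f2 = lambda x: x / 4 + math.sqrt(2) / 2 * math.sqrt(1 - x * x)

xs = [k / 100000 for k in range(-100000, 100001)]
x_min = min(xs, key=f1)
x_max = max(xs, key=f2)
assert abs(x_min - (-1 / 3)) < 1e-3 and abs(f1(x_min) - (-3 / 4)) < 1e-6
assert abs(x_max - 1 / 3) < 1e-3 and abs(f2(x_max) - 3 / 4) < 1e-6

# the minimum point satisfies both side conditions
x, y, z = -1 / 3, -2 / 3, 1 / 4
assert abs(x**2 + 2 * y**2 - 1) < 1e-12 and abs(3 * x + 4 * z) < 1e-12
assert abs((x + y + z) - (-3 / 4)) < 1e-12
print("Example C1 verified")
```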

An alternative path to find extrema with side conditions is based on the geometric interpretation of risk function and side conditions. First, we form the conditions
\[
F_1(x_1,\dots,x_m) = 0,\ \dots,\ F_r(x_1,\dots,x_m) = 0,\qquad
\operatorname{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r,
\]
by continuously differentiable real functions on an open set $\Omega\subset\mathbb{R}^m$. The $r$ equations $F_i(x_1,\dots,x_m) = 0$ for all $i = 1,\dots,r$, together with the rank condition $\operatorname{rk}(\partial F_i/\partial x_j) = r$, define geometrically an $(m-r)$-dimensional surface $\mathcal{F}_F\subset\Omega$ which can be seen as a level surface. See as an example our Example C1, which describes as side conditions

F1 ( x1 , x2 , x3 )  Z ( x, y, z )  x 2  2 y 2  1  0
F2 ( x1 , x2 , x3 )  E ( x, y , z )  3x  4 z  0

representing an elliptical cylinder and a plane. In this case is the (m-r) dimen-
sional surface  F the intersection manifold of the elliptic cylinder and of the
plane as the m-r =1 dimensional manifold in  3 , namely as “spatial curve”.
Secondly, the risk function f ( x1 ,..., xm )  extr generates an (m-1) dimensional
surface  f which is a special level surface. The level parameter of the (m-1)
dimensional surface  f should be external. In our Example C1 one risk function
can be interpreted as the plane
f ( x1 , x2 , x3 )  f ( x, y, z )  x  y  z .
We summarize our result within Lemma C2.

Lemma C2 (extrema with side conditions):
The side conditions $F_i(x_1,\dots,x_m) = 0$ for all $i\in\{1,\dots,r\}$ are built on continuously differentiable functions on an open set $\Omega\subset\mathbb{R}^m$ which are subject to the rank condition $\operatorname{rk}(\partial F_i/\partial x_j) = r$, generating an $(m-r)$-dimensional level surface $\mathcal{F}_F$. The function $f(x_1,\dots,x_m)$ produces for certain constants an $(m-1)$-dimensional level surface $\mathcal{F}_f$. $f(x_1,\dots,x_m)$ is, geometrically, at a point $p\in\mathcal{F}_F$ conditionally extremal (stationary) if and only if the $(m-1)$-dimensional level surface $\mathcal{F}_f$ is in contact with the $(m-r)$-dimensional level surface in $p$. That is, there exist numbers $\lambda_1,\dots,\lambda_r$, the Lagrange multipliers, with
\[
\operatorname{grad}f(p) = \sum_{i=1}^{r}\lambda_i\operatorname{grad}F_i(p).
\]
The unnormalized surface normal vector $\operatorname{grad}f(p)$ of the $(m-1)$-dimensional level surface $\mathcal{F}_f$ lies in the normal space $\mathbb{N}_p\mathcal{F}_F$ of the level surface $\mathcal{F}_F$, spanned by the unnormalized surface normal vectors $\operatorname{grad}F_i(p)$ in the point $p$. To this equation belongs the variational problem
\[
\mathcal{L}(x_1,\dots,x_m;\lambda_1,\dots,\lambda_r) = f(x_1,\dots,x_m) + \sum_{i=1}^{r}\lambda_i F_i(x_1,\dots,x_m) = \operatorname{extr}.
\]

:proof:
First, the side conditions $F_i(x_j) = 0$, $\operatorname{rk}(\partial F_i/\partial x_j) = r$ for all $i = 1,\dots,r$; $j = 1,\dots,m$, generate an $(m-r)$-dimensional level surface $\mathcal{F}_F$ whose normal vectors
\[
n_i(p) := \operatorname{grad}F_i(p)\in\mathbb{N}_p\mathcal{F}_F\quad (i = 1,\dots,r)
\]
span the $r$-dimensional normal space $\mathbb{N}_p\mathcal{F}_F$ of the level surface $\mathcal{F}_F\subset\Omega$. The $r$-dimensional normal space $\mathbb{N}_p\mathcal{F}_F$ of the $(m-r)$-dimensional level surface $\mathcal{F}_F$ is the orthogonal complement of the tangent space $\mathbb{T}_p\mathcal{F}_F$ of $\mathcal{F}_F$ in the point $p$, spanned by the $m-r$ tangent vectors
\[
t_k(p) := \frac{\partial x}{\partial x_k}\Big|_{x_p}\in\mathbb{T}_p\mathcal{F}_F\quad (k = 1,\dots,m-r).
\]

:Example C2:
Let the $m-r = 2$ dimensional level surface $\mathcal{F}_F$ of the sphere $\mathbb{S}_r^2\subset\mathbb{R}^3$ of radius $r$ ("level parameter $r^2$") be given by the side condition
\[
F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.
\]

:Normal space:
\[
n(p) = \operatorname{grad}F(p) = e_1\frac{\partial F}{\partial x_1} + e_2\frac{\partial F}{\partial x_2} + e_3\frac{\partial F}{\partial x_3} = [e_1, e_2, e_3]\begin{bmatrix} 2x_1\\ 2x_2\\ 2x_3\end{bmatrix}_p.
\]
The orthogonal vectors $[e_1, e_2, e_3]$ span $\mathbb{R}^3$. The normal space is generated locally by the normal vector $n(p) = \operatorname{grad}F(p)$.


:Tangent space:
The implicit representation is the characteristic element of the level surface. In order to gain an explicit representation, we take advantage of the Implicit Function Theorem according to the following equations:
\[
F(x_1, x_2, x_3) = 0,\quad \operatorname{rk}\Big(\frac{\partial F}{\partial x_j}\Big) = r = 1 \;\Rightarrow\; x_3 = G(x_1, x_2),
\]
\[
x_1^2 + x_2^2 + x_3^2 - r^2 = 0\ \text{and}\ \Big(\frac{\partial F}{\partial x_j}\Big) = [2x_1,\ 2x_2,\ 2x_3],\quad \operatorname{rk}\Big(\frac{\partial F}{\partial x_j}\Big) = 1
\]
\[
\Rightarrow\; x_3 = G(x_1, x_2) = +\sqrt{r^2 - (x_1^2 + x_2^2)}.
\]
The negative root leads into another domain of the sphere; here holds the domain $0\leq x_1 < r$, $0\leq x_2 < r$, $r^2 - (x_1^2 + x_2^2) > 0$.

The spherical position vector $x(p)$ allows the representation
\[
x(p) = e_1 x_1 + e_2 x_2 + e_3\sqrt{r^2 - (x_1^2 + x_2^2)},
\]
which is the basis to produce
\[
t_1(p) = \frac{\partial x}{\partial x_1}(p) = e_1 - e_3\frac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [e_1, e_2, e_3]\begin{bmatrix} 1\\ 0\\ -\dfrac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}}\end{bmatrix},
\]
\[
t_2(p) = \frac{\partial x}{\partial x_2}(p) = e_2 - e_3\frac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [e_1, e_2, e_3]\begin{bmatrix} 0\\ 1\\ -\dfrac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}}\end{bmatrix},
\]
which span the two-dimensional tangent space $\mathbb{T}_p\mathcal{F}_F$ at the point $p$.


:The general case:
In the general case of an $(m-r)$-dimensional level surface $\mathcal{F}_F$, implicitly produced by $r$ side conditions of type
\[
F_1(x_1,\dots,x_m) = 0,\ F_2(x_1,\dots,x_m) = 0,\ \dots,\ F_r(x_1,\dots,x_m) = 0,\qquad
\operatorname{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r,
\]
the explicit surface representation, produced by the Implicit Function Theorem, reads
\[
x(p) = e_1 x_1 + e_2 x_2 + \dots + e_{m-r}x_{m-r} + e_{m-r+1}G_1(x_1,\dots,x_{m-r}) + \dots + e_m G_r(x_1,\dots,x_{m-r}).
\]
The orthogonal vectors $[e_1,\dots,e_m]$ span $\mathbb{R}^m$.


Secondly, the at least once continuously differentiable risk function $f(x_1,\dots,x_m)$ describes for special constants an $(m-1)$-dimensional level surface $\mathcal{F}_f$ whose normal vector
\[
n_f := \operatorname{grad}f(p)\in\mathbb{N}_p\mathcal{F}_f
\]
spans the one-dimensional normal space $\mathbb{N}_p\mathcal{F}_f$ of the level surface $\mathcal{F}_f\subset\Omega$ in the point $p$. The level parameter of the level surface is chosen in the extremal case such that the level surface $\mathcal{F}_f$ touches the other level surface $\mathcal{F}_F$ in the point $p$. That means that the normal vector $n_f(p)$ in the point $p$ is an element of the normal space $\mathbb{N}_p\mathcal{F}_F$. Or we may say the normal vector $\operatorname{grad}f(p)$ is a linear combination of the normal vectors $\operatorname{grad}F_i(p)$ in the point $p$,
\[
\operatorname{grad}f(p) = \sum_{i=1}^{r}\lambda_i\operatorname{grad}F_i(p),
\]
where the Lagrange multipliers $\lambda_i$ are the coordinates of the vector $\operatorname{grad}f(p)$ in the basis $\operatorname{grad}F_i(p)$.


:Example C3:
Let us assume that a point $X\in\mathbb{R}^3$ is given. Unknown is the point on the $m-r = 2$ dimensional level surface $\mathcal{F}_F$ of type sphere $\mathbb{S}_r^2\subset\mathbb{R}^3$ which is at extremal distance from the point $X\in\mathbb{R}^3$, either minimal or maximal.
The distance function $\|X - x\|^2$ for $X\in\mathbb{R}^3$ and $x\in\mathbb{S}_r^2$ describes the risk function
\[
f(x_1, x_2, x_3) = (X_1 - x_1)^2 + (X_2 - x_2)^2 + (X_3 - x_3)^2 = R^2 = \underset{x_1, x_2, x_3}{\operatorname{extr}},
\]
which represents an $m-1 = 2$ dimensional level surface $\mathcal{F}_f$ of type sphere $\mathbb{S}_R^2\subset\mathbb{R}^3$ centered at $(X_1, X_2, X_3)$ with level parameter $R^2$. The conditional extremal problem is solved if the sphere $\mathbb{S}_R^2$ touches the other sphere $\mathbb{S}_r^2$. This result is expressed in the language of the normal vectors:
\[
n_f(p) := \operatorname{grad}f(p) = e_1\frac{\partial f}{\partial x_1} + e_2\frac{\partial f}{\partial x_2} + e_3\frac{\partial f}{\partial x_3} =
[e_1, e_2, e_3]\begin{bmatrix} -2(X_1 - x_1)\\ -2(X_2 - x_2)\\ -2(X_3 - x_3)\end{bmatrix}_p\in\mathbb{N}_p\mathcal{F}_f,
\]
\[
n_F(p) := \operatorname{grad}F(p) = [e_1, e_2, e_3]\begin{bmatrix} 2x_1\\ 2x_2\\ 2x_3\end{bmatrix}
\]
is an element of the normal space $\mathbb{N}_p\mathcal{F}_F$.
The normal equation
\[
\operatorname{grad}f(p) = \lambda\operatorname{grad}F(p)
\]
leads directly to three equations
\[
-(X_i - x_i) = \lambda x_i \;\Rightarrow\; x_i(1 + \lambda) = X_i\quad (i = 1, 2, 3),
\]
which are completed by the fourth equation
\[
F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.
\]
Later on we solve the 4 equations.

Third, we interpret the differential equations
\[
\operatorname{grad}f(p) = \sum_{i=1}^{r}\lambda_i\operatorname{grad}F_i(p)
\]
by the variational problem, by direct differentiation, namely
\[
\mathcal{L}(x_1,\dots,x_m;\lambda_1,\dots,\lambda_r) = f(x_1,\dots,x_m) + \sum_{i=1}^{r}\lambda_i F_i(x_1,\dots,x_m) = \underset{x_1,\dots,x_m;\,\lambda_1,\dots,\lambda_r}{\operatorname{extr}}
\]
\[
\frac{\partial\mathcal{L}}{\partial x_j} = \frac{\partial f}{\partial x_j} + \sum_{i=1}^{r}\lambda_i\frac{\partial F_i}{\partial x_j} = 0\quad (j = 1,\dots,m),
\]
\[
\frac{\partial\mathcal{L}}{\partial\lambda_i} = F_i(x_j) = 0\quad (i = 1,\dots,r).
\]

:Example C4:
We continue our third example by solving the alternative system of equations.
\[
\mathcal{L}(x_1, x_2, x_3;\lambda) = (X_1 - x_1)^2 + (X_2 - x_2)^2 + (X_3 - x_3)^2 + \lambda(x_1^2 + x_2^2 + x_3^2 - r^2) = \underset{x_1, x_2, x_3;\,\lambda}{\operatorname{extr}}
\]
\[
\frac{\partial\mathcal{L}}{\partial x_j} = -2(X_j - x_j) + 2\lambda x_j = 0,\qquad
\frac{\partial\mathcal{L}}{\partial\lambda} = x_1^2 + x_2^2 + x_3^2 - r^2 = 0
\]
\[
\Rightarrow\; x_1 = \frac{X_1}{1+\lambda},\quad x_2 = \frac{X_2}{1+\lambda},\quad x_3 = \frac{X_3}{1+\lambda},
\]
\[
x_1^2 + x_2^2 + x_3^2 - r^2 = 0 \;\Rightarrow\;
\frac{X_1^2 + X_2^2 + X_3^2}{(1+\lambda)^2} - r^2 = 0 \;\Rightarrow\; (1+\lambda)^2 r^2 - (X_1^2 + X_2^2 + X_3^2) = 0
\]
\[
\Rightarrow\; (1+\lambda)^2 = \frac{X_1^2 + X_2^2 + X_3^2}{r^2} \;\Rightarrow\;
1 + \lambda_{1,2} = \pm\frac{1}{r}\sqrt{X_1^2 + X_2^2 + X_3^2},
\]
\[
\lambda_{1,2} = -1 \pm\frac{1}{r}\sqrt{X_1^2 + X_2^2 + X_3^2} = \frac{-r \pm\sqrt{X_1^2 + X_2^2 + X_3^2}}{r},
\]
\[
(x_1)_{1,2} = \pm\frac{rX_1}{\sqrt{X_1^2 + X_2^2 + X_3^2}},\quad
(x_2)_{1,2} = \pm\frac{rX_2}{\sqrt{X_1^2 + X_2^2 + X_3^2}},\quad
(x_3)_{1,2} = \pm\frac{rX_3}{\sqrt{X_1^2 + X_2^2 + X_3^2}}.
\]
The matrix of second derivatives H decides upon whether at the point $(x_1, x_2, x_3, \lambda)_{1,2}$ we enjoy a maximum or minimum.


2
H( )  ( jk (1   ))  (1   )I 3
x j xk

H (1    0)  0 (minimum)   H (1    0)  0 (maximum)
( x1 , x2 , x3 ) is the point of minimum   ( x , x , x ) is the point of maximum .
 1 2 3

Our example illustrates how we can find the global optimum under side condi-
tions by means of the technique of Lagrange multipliers.
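Examples C3/C4 admit a direct numerical check of the closed-form solution $x_i = \pm rX_i/\sqrt{X_1^2+X_2^2+X_3^2}$. The point $X$ and radius $r$ below are arbitrary test values (assumptions for illustration):

```python
import math

# Nearest and farthest points of the sphere x1^2+x2^2+x3^2 = r^2
# from an exterior point (X1, X2, X3), per Example C4.
X1, X2, X3, r = 2.0, -1.0, 2.0, 1.5
norm = math.sqrt(X1**2 + X2**2 + X3**2)        # ||X|| = 3
near = tuple(r * Xi / norm for Xi in (X1, X2, X3))
far = tuple(-r * Xi / norm for Xi in (X1, X2, X3))

dist = lambda p: math.sqrt((X1 - p[0])**2 + (X2 - p[1])**2 + (X3 - p[2])**2)
# both candidates lie on the sphere
for p in (near, far):
    assert abs(p[0]**2 + p[1]**2 + p[2]**2 - r**2) < 1e-12

# multiplier check: x_i (1 + lambda) = X_i with 1 + lambda = ||X|| / r
lam = -1 + norm / r
for Xi, xi in zip((X1, X2, X3), near):
    assert abs(xi * (1 + lam) - Xi) < 1e-12

# grid check: no sphere point is closer than `near` or farther than `far`
best, worst = dist(near), dist(far)
N = 200
for i in range(N + 1):
    th = math.pi * i / N
    for j in range(2 * N):
        ph = math.pi * j / N
        p = (r * math.sin(th) * math.cos(ph),
             r * math.sin(th) * math.sin(ph),
             r * math.cos(th))
        assert best - 1e-9 <= dist(p) <= worst + 1e-9
print("nearest/farthest sphere points verified")
```

As the example predicts, the extremal distances are $\|X\| - r$ (minimum) and $\|X\| + r$ (maximum).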

:Example C5:
Search for the global extremum of the function $f(x_1, x_2, x_3)$ subject to two side conditions $F_1(x_1, x_2, x_3)$ and $F_2(x_1, x_2, x_3)$, namely
\[
f(x_1, x_2, x_3) = f(x, y, z) = x + y + z\quad\text{(plane)},
\]
\[
F_1(x_1, x_2, x_3) = Z(x, y, z) := x^2 + 2y^2 - 1 = 0\quad\text{(elliptic cylinder)},
\]
\[
F_2(x_1, x_2, x_3) = E(x, y, z) := 3x + 4z = 0\quad\text{(plane)},
\]
\[
J\Big(\frac{\partial F_i}{\partial x_j}\Big) = \begin{bmatrix} 2x & 4y & 0\\ 3 & 0 & 4\end{bmatrix},\qquad
\operatorname{rk}J\,(x\neq 0\ \text{or}\ y\neq 0) = r = 2.
\]

:Variational Problem:
\[
\mathcal{L}(x_1, x_2, x_3;\lambda_1,\lambda_2) = \mathcal{L}(x, y, z;\lambda,\mu) = x + y + z + \lambda(x^2 + 2y^2 - 1) + \mu(3x + 4z) = \underset{x_1, x_2, x_3;\,\lambda,\mu}{\operatorname{extr}}
\]
\[
\frac{\partial\mathcal{L}}{\partial x} = 1 + 2\lambda x + 3\mu = 0,
\]
\[
\frac{\partial\mathcal{L}}{\partial y} = 1 + 4\lambda y = 0 \;\Rightarrow\; \lambda = -\frac{1}{4y},
\]
\[
\frac{\partial\mathcal{L}}{\partial z} = 1 + 4\mu = 0 \;\Rightarrow\; \mu = -\frac{1}{4},
\]
\[
\frac{\partial\mathcal{L}}{\partial\lambda} = x^2 + 2y^2 - 1 = 0,
\]
\[
\frac{\partial\mathcal{L}}{\partial\mu} = 3x + 4z = 0.
\]
We multiply the first equation $\partial\mathcal{L}/\partial x$ by $4y$, the second equation $\partial\mathcal{L}/\partial y$ by $(-2x)$, the third equation $\partial\mathcal{L}/\partial z$ by $(-3y)$, and add:
\[
4y + 8\lambda xy + 12\mu y - 2x - 8\lambda xy - 3y - 12\mu y = y - 2x = 0.
\]


Replacing $y = 2x$ in the cylinder equation (first side condition) $Z(x, y, z) = x^2 + 2y^2 - 1 = 0$ gives $x^2 + 8x^2 = 1$, that is $x_{1,2} = \mp 1/3$. From the plane (second side condition) $E(x, y, z) = 3x + 4z = 0$ we gain $z_{1,2} = \pm 1/4$. As a result we find $x_{1,2}$, $z_{1,2}$ and finally $y_{1,2} = \mp 2/3$.
The matrix of second derivatives H decides upon whether at the point $\lambda_{1,2} = \pm 3/8$ we find a maximum or minimum.
\[
H\Big(\frac{\partial^2\mathcal{L}}{\partial x_j\partial x_k}\Big) = \begin{bmatrix} 2\lambda & 0 & 0\\ 0 & 4\lambda & 0\\ 0 & 0 & 0\end{bmatrix}
\]
\[
H\Big(\lambda_1 = \tfrac{3}{8}\Big) = \begin{bmatrix} \tfrac{3}{4} & 0 & 0\\ 0 & \tfrac{3}{2} & 0\\ 0 & 0 & 0\end{bmatrix}\geq 0\ \text{(minimum)}
\quad\text{versus}\quad
H\Big(\lambda_2 = -\tfrac{3}{8}\Big) = \begin{bmatrix} -\tfrac{3}{4} & 0 & 0\\ 0 & -\tfrac{3}{2} & 0\\ 0 & 0 & 0\end{bmatrix}\leq 0\ \text{(maximum)}
\]
\[
(x, y, z;\lambda,\mu)_1 = \big(-\tfrac{1}{3},\ -\tfrac{2}{3},\ \tfrac{1}{4};\ \tfrac{3}{8},\ -\tfrac{1}{4}\big)\ \text{is the restricted minimal solution point};
\]
\[
(x, y, z;\lambda,\mu)_2 = \big(+\tfrac{1}{3},\ +\tfrac{2}{3},\ -\tfrac{1}{4};\ -\tfrac{3}{8},\ -\tfrac{1}{4}\big)\ \text{is the restricted maximal solution point}.
\]
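The two stationary points of Example C5 can be verified against the stationarity equations of $\mathcal{L}$ directly (a pure-Python sketch):

```python
# Check both points of Example C5 against the stationarity equations of
# L = x + y + z + lam (x^2 + 2y^2 - 1) + mu (3x + 4z).
points = [
    (-1/3, -2/3,  1/4,  3/8, -1/4),   # restricted minimum
    ( 1/3,  2/3, -1/4, -3/8, -1/4),   # restricted maximum
]
for x, y, z, lam, mu in points:
    assert abs(1 + 2 * lam * x + 3 * mu) < 1e-12   # dL/dx = 0
    assert abs(1 + 4 * lam * y) < 1e-12            # dL/dy = 0
    assert abs(1 + 4 * mu) < 1e-12                 # dL/dz = 0
    assert abs(x**2 + 2 * y**2 - 1) < 1e-12        # dL/dlam = 0
    assert abs(3 * x + 4 * z) < 1e-12              # dL/dmu = 0

f = lambda x, y, z: x + y + z
assert abs(f(*points[0][:3]) - (-3/4)) < 1e-12
assert abs(f(*points[1][:3]) - 3/4) < 1e-12
print("Example C5 stationary points verified")
```

The objective values $\mp 3/4$ agree with the reduction-to-free-variables solution of Example C1.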

The geometric interpretation of the Hesse matrix follows from E. Grafarend and P. Lohle (1991).

