
Positive Definiteness of a Matrix

Pseudo-inverse of a Matrix

Null Space of a Matrix



M3S3/S4 STATISTICAL THEORY II


POSITIVE DEFINITE MATRICES

Definition: Positive Definite Matrix


A square, p × p symmetric matrix A is positive definite if, for all x ∈ R^p with x ≠ 0,

x^T A x > 0.

Properties: Suppose that

A = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}

is a positive definite matrix.

1. The r × r (1 ≤ r ≤ p) submatrix A_r ,

A_r = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rr} \end{pmatrix}

is also positive definite.

2. The p eigenvalues of A, λ_1 , . . . , λ_p , are positive. Conversely, if all the eigenvalues of a symmetric matrix B
are positive, then B is positive definite.

3. There exists a unique decomposition of A

A = LLT (1)

where L is a lower triangular matrix

L = [l_{ij}] = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pp} \end{pmatrix} .

Equation (1) gives the Cholesky Decomposition of A.

4. There exists a unique decomposition of A

A = SS (2)

where S is itself symmetric and positive definite and may be denoted A^{1/2}; S is the matrix square root of A.

5. There exists a unique decomposition of A

A = V DV T (3)

where

D = diag(λ_1 , . . . , λ_p) = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_p \end{pmatrix}

is the diagonal matrix composed of the eigenvalues of A, and V is an orthogonal matrix,

V^T V = 1.

Equation (3) gives the spectral decomposition of A, which for a positive definite matrix coincides with its Singular Value Decomposition.

6. As A = V D V^T ,

|A| = |V D V^T| = |V||D||V^T| = |V|^2 |D| = |D| > 0

since |V|^2 = 1 (V is orthogonal, so |V| = ±1) and

|D| = \prod_{i=1}^{p} λ_i > 0

by 2. and 5.

7. By 6., as |A| > 0, A is non-singular, that is, the inverse of A, A−1 exists such that

AA−1 = A−1 A = 1.

In fact
A−1 = (V DV T )−1 = V D−1 V T
as
V −1 = V T .

8. A−1 is positive definite.

9. For x ∈ R^p , x ≠ 0,

min_{1≤i≤p} λ_i ≤ \frac{x^T A x}{x^T x} ≤ max_{1≤i≤p} λ_i

10. If A and B are positive definite, then

(i) |A + B| ≥ |A| + |B|.

(ii) If A − B is positive definite, then |A| > |B|.

(iii) If A − B is positive definite, then B^{−1} − A^{−1} is positive definite.
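
These properties suggest simple numerical checks. The following sketch (Python with NumPy; the example matrix is an arbitrary choice for illustration, not one from the notes) verifies positive definiteness via the Cholesky factorization (property 3) and the eigenvalues (property 2), and confirms properties 5, 6 and 7 numerically.

import numpy as np

# An arbitrary symmetric positive definite example: A = M^T M + I.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)

# Property 3: the Cholesky factorization A = L L^T exists for a positive definite A.
L = np.linalg.cholesky(A)            # raises LinAlgError if A is not positive definite
print(np.allclose(L @ L.T, A))       # True

# Property 2: all eigenvalues of A are positive.
lam, V = np.linalg.eigh(A)           # eigenvalues and orthonormal eigenvectors
print(np.all(lam > 0))               # True

# Property 5: A = V D V^T with V orthogonal; property 6: |A| = product of eigenvalues > 0.
print(np.allclose(V @ np.diag(lam) @ V.T, A))        # True
print(np.isclose(np.linalg.det(A), np.prod(lam)))    # True

# Property 7: A^{-1} = V D^{-1} V^T.
print(np.allclose(np.linalg.inv(A), V @ np.diag(1.0 / lam) @ V.T))  # True
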
TEST FOR POSITIVE AND NEGATIVE DEFINITENESS

We want a computationally simple test for a symmetric matrix to induce a positive definite quadratic
form. We first treat the case of 2 × 2 matrices where the result is simple. Then, we present the conditions
for n × n symmetric matrices to be positive definite. Finally, we state the corresponding condition for the
symmetric matrix to be negative definite or neither. Before starting all these cases, we recall the relationship
between the eigenvalues and the determinant and trace of a matrix.
For a matrix A, the determinant and trace are the product and sum of the eigenvalues:
det(A) = λ1 · · · λn , and
tr(A) = λ1 + · · · + λn ,
where λ_1 , . . . , λ_n are the n eigenvalues of A. (Here we list an eigenvalue twice if it has multiplicity two, etc.)

1. TWO BY TWO MATRICES

Let A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} be a general 2 × 2 symmetric matrix. We will see in general that the quadratic form
for A is positive definite if and only if all the eigenvalues are positive. Since det(A) = λ_1 λ_2 , it is necessary
that the determinant of A be positive. On the other hand if the determinant is positive, then either (i) both
eigenvalues are positive, or (ii) both eigenvalues are negative. Since tr(A) = λ1 + λ2 , if det(A) > 0 and
tr(A) > 0 then both eigenvalues must be positive. We want to give this in a slightly different form that is
more like what we get in the n × n case. If det(A) = ac − b2 > 0, then ac > b2 ≥ 0, and a and c must
have the same sign. Thus det(A) > 0 and tr (A) > 0 is equivalent to the condition that det(A) > 0 and
a > 0. Therefore, a necessary and sufficient condition for the quadratic form of a symmetric 2 × 2 matrix
to be positive definite is for det(A) > 0 and a > 0.
We want to see the connection between the condition on A to be positive definite and completion of the
squares.

Q(x, y) = (x, y) A \begin{pmatrix} x \\ y \end{pmatrix}
        = a x^2 + 2b x y + c y^2
        = a \left(x + \frac{b}{a} y\right)^2 + \left(\frac{ac − b^2}{a}\right) y^2 .
This expresses the quadratic form as a sum of two squares by means of “completion of the squares”. If
a > 0 and det(A) > 0, then both these coefficients are positive and the form is positive definite. It can also
be checked that a and (ac − b^2)/a are the pivots when A is row reduced. We can summarize these two results
in the following theorem.
Theorem 1. Let A be a 2 × 2 symmetric matrix and Q(x) = x^T A x the related quadratic form. The
following conditions are equivalent:
(i) Q(x) is positive definite.
(ii) Both eigenvalues of A are positive.
(iii) Both the quadratic forms a x^2 and (x, y) A (x, y)^T are positive definite.
(iv) Both det(A) > 0 and a > 0.
(v) Both the pivots obtained without row exchanges or scalar multiplications of rows are positive.
(vi) By completion of the squares, Q(x) can be represented as a sum of two squares, with both positive
coefficients.

2. POSITIVE DEFINITE QUADRATIC FORMS


In the general n × n symmetric case, we will see two conditions similar to these for the 2 × 2 case.
A condition for Q to be positive definite can be given in terms of several determinants of the “principal”
submatrices. Second, Q is positive definite if the pivots are all positive, and this can be understood in terms
of completion of the squares.
Let A be an n × n symmetric matrix. We need to consider submatrices of A. Let A_k be the k × k submatrix
formed by deleting the last n − k rows and last n − k columns of A,

A_k = [a_{i,j}]_{1 ≤ i ≤ k, 1 ≤ j ≤ k} .


The following theorem gives conditions for the quadratic form to be positive definite in terms of the determinants
of A_k .
Theorem 2. Let A be an n × n symmetric matrix and Q(x) = xT Ax the related quadratic form. The
following conditions are equivalent:
(i) Q(x) is positive definite.
(ii) All the eigenvalues of A are positive.
(iii) For each 1 ≤ k ≤ n, the quadratic form associated to Ak is positive definite.
(iv) The determinants satisfy det(A_k) > 0 for 1 ≤ k ≤ n.
(v) All the pivots obtained without row exchanges or scalar multiplications of rows are positive.
(vi) By completion of the squares, Q(x) can be represented as a sum of squares, with all positive coeffi-
cients,
Q(x_1 , . . . , x_n ) = (x_1 , . . . , x_n ) U^T D U (x_1 , . . . , x_n )^T
                      = p_1 (x_1 + u_{1,2} x_2 + · · · + u_{1,n} x_n )^2
                      + p_2 (x_2 + u_{2,3} x_3 + · · · + u_{2,n} x_n )^2
                      + · · · + p_n x_n^2 .

Proof. We assume A is symmetric so we can find an orthonormal basis of eigenvectors v1 , . . . vn with


eigenvalues λ1 , . . . λn . Let P be the orthogonal matrix formed by putting the v j as the columns. Then
PT AP = D is the diagonal matrix with entries λ1 , . . . λn . Setting
x = y1 v1 + · · · + yn vn = Py,
the quadratic form turns into a sum of squares:
Q(x) = x^T A x
     = y^T P^T A P y
     = y^T D y
     = \sum_{j=1}^{n} λ_j y_j^2 .

From this representation, it is clear that Q is positive definite if and only if all the eigenvalues are positive,
i.e., conditions (i) and (ii) are equivalent.
Assume Q is positive definite. Then for any 1 ≤ k ≤ n,
0 < Q(x1 , . . . , xk , 0, . . . , 0)
= (x1 , . . . , xk , 0, . . . , 0)A(x1 , . . . , xk , 0, . . . , 0)T
= (x1 , . . . , xk )Ak (x1 , . . . , xk )T
for all (x_1 , . . . , x_k ) ≠ 0. This shows that (i) implies (iii).

Assume (iii). Then all the eigenvalues of A_k must be positive since (i) and (ii) are equivalent for A_k .
Notice that the eigenvalues of Ak are not necessarily eigenvalues of A. Therefore the determinant of Ak is
positive since it is the product of its eigenvalues. This is true for all k, so this shows that (iii) implies (iv).
Assume (iv). When A is row reduced, it also row reduces all the Ak since we do not perform any row
exchanges. Therefore the pivots of the Ak are pivots of A. Also, the determinant of Ak is the product of the
first k pivots, det(Ak ) = p1 . . . pk . Therefore
p_k = (p_1 · · · p_k)/(p_1 · · · p_{k−1}) = det(A_k)/det(A_{k−1}) > 0,
for all k. This proves (v).
Now assume (v). Row reduction can be realized by matrix multiplication on the left by a lower triangular
matrix. Therefore, we can write A = LDU where D is the diagonal matrix made up of the pivots, L is
lower triangular with ones on the diagonal, and U is upper triangular with ones on the diagonal. Since A is
symmetric, LDU = A = A^T = U^T D L^T . It can then be shown that U^T = L. Therefore,
Q(x_1 , . . . , x_n ) = (x_1 , . . . , x_n ) U^T D U (x_1 , . . . , x_n )^T
                      = p_1 (x_1 + u_{1,2} x_2 + · · · + u_{1,n} x_n )^2
                      + p_2 (x_2 + u_{2,3} x_3 + · · · + u_{2,n} x_n )^2
                      + · · · + p_n x_n^2 .
Thus, we can “complete the squares”, expressing Q as the sum of squares with the pivots as the coefficients.
If the pivots are all positive, then all the coefficients p_i are positive. Thus (v) implies (vi). Note that z = Ux
is a non-orthonormal change of basis that makes the quadratic form diagonal.
If Q(x) can be written as a sum of squares of the above form with positive coefficients, then the quadratic
form must be positive definite. Thus, (vi) implies (i). □
Example 3. Let

A = \begin{pmatrix} 2 & −1 & 0 \\ −1 & 2 & −1 \\ 0 & −1 & 2 \end{pmatrix} .

The eigenvalues are 2 and 2 ± √2, which are all positive, which shows that the quadratic form induced by A
is positive definite. (Notice that these eigenvalues are not especially easy to calculate.)
We can row reduce to represent A as the product of lower triangular, diagonal, and upper triangular
matrices:

A = \begin{pmatrix} 1 & 0 & 0 \\ −\frac{1}{2} & 1 & 0 \\ 0 & −\frac{2}{3} & 1 \end{pmatrix}
    \begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac{3}{2} & 0 \\ 0 & 0 & \frac{4}{3} \end{pmatrix}
    \begin{pmatrix} 1 & −\frac{1}{2} & 0 \\ 0 & 1 & −\frac{2}{3} \\ 0 & 0 & 1 \end{pmatrix} .

Since the pivots on the diagonal are all positive, the quadratic form induced by A is positive definite, and

x^T A x = 2\left(x_1 − \tfrac{1}{2} x_2\right)^2 + \tfrac{3}{2}\left(x_2 − \tfrac{2}{3} x_3\right)^2 + \tfrac{4}{3} x_3^2 .
The principal submatrices and their determinants are

A_1 = (2),   det(A_1) = 2 > 0,
A_2 = \begin{pmatrix} 2 & −1 \\ −1 & 2 \end{pmatrix},   det(A_2) = 3 > 0,
A_3 = A,   det(A_3) = det(A) = 2 · (3/2) · (4/3) = 4 > 0.

Since these are all positive, the quadratic form induced by A is positive definite.
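
As a quick numerical check of Example 3 (a Python/NumPy sketch that simply recomputes the quantities stated above), one can confirm the eigenvalues, leading principal minors, and pivots:

import numpy as np

A = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])

# Eigenvalues: 2 - sqrt(2), 2, 2 + sqrt(2), all positive.
print(np.linalg.eigvalsh(A))

# Leading principal minors det(A_1), det(A_2), det(A_3): 2, 3, 4.
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
print(minors)

# Pivots p_k = det(A_k)/det(A_{k-1}): 2, 3/2, 4/3.
pivots = [minors[0], minors[1] / minors[0], minors[2] / minors[1]]
print(pivots)
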

3. NEGATIVE DEFINITE QUADRATIC FORMS


The conditions for the quadratic form to be negative definite are similar: all the eigenvalues must be
negative.
Theorem 4. Let A be an n × n symmetric matrix and Q(x) = xT Ax the related quadratic form. The
following conditions are equivalent:
(i) Q(x) is negative definite.
(ii) All the eigenvalues of A are negative.
(iii) The quadratic forms associated to all the Ak are negative definite.
(iv) The determinants satisfy (−1)^k det(A_k) > 0 for 1 ≤ k ≤ n, i.e., det(A_1) < 0, det(A_2) > 0, . . . ,
(−1)^n det(A_n) = (−1)^n det(A) > 0.
(v) All the pivots obtained without row exchanges or scalar multiplications of rows are negative.
(vi) By completion of the squares, Q(x) can be represented as a sum of squares, with all negative coeffi-
cients,
Q(x_1 , . . . , x_n ) = (x_1 , . . . , x_n ) U^T D U (x_1 , . . . , x_n )^T
                      = p_1 (x_1 + u_{1,2} x_2 + · · · + u_{1,n} x_n )^2
                      + p_2 (x_2 + u_{2,3} x_3 + · · · + u_{2,n} x_n )^2
                      + · · · + p_n x_n^2 .

For condition (iv), the idea is that the product of k negative numbers has the same sign as (−1)^k.
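
As an illustration of these criteria, the following Python/NumPy sketch (the function name is our own, not from the text) classifies a symmetric matrix as positive definite, negative definite, or neither using the leading principal minors; it can be applied, for instance, to the matrices in the problems below.

import numpy as np

def classify_definiteness(A, tol=1e-12):
    """Classify a symmetric matrix using its leading principal minors (Theorems 2 and 4)."""
    n = A.shape[0]
    minors = np.array([np.linalg.det(A[:k, :k]) for k in range(1, n + 1)])
    if np.all(minors > tol):
        return "positive definite"                      # det(A_k) > 0 for all k
    if np.all(minors * (-1.0) ** np.arange(1, n + 1) > tol):
        return "negative definite"                      # (-1)^k det(A_k) > 0 for all k
    return "neither (possibly semidefinite or indefinite)"

# The matrix from Example 3 above, and its negative.
A = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
print(classify_definiteness(A))    # positive definite
print(classify_definiteness(-A))   # negative definite
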

4. PROBLEMS
1. Decide whether the following matrices are positive definite, negative definite, or neither:

(a) \begin{pmatrix} 2 & −1 & −1 \\ −1 & 2 & −1 \\ −1 & −1 & 2 \end{pmatrix}   (b) \begin{pmatrix} 2 & −1 & −1 \\ −1 & 2 & 1 \\ −1 & 1 & 2 \end{pmatrix}

(c) \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 3 & 4 & 9 \end{pmatrix}   (d) \begin{pmatrix} 1 & 2 & 0 & 0 \\ 2 & 6 & −2 & 0 \\ 0 & −2 & 5 & −2 \\ 0 & 0 & −2 & 3 \end{pmatrix}

The Moore-Penrose Pseudoinverse (Math 33A: Laub)

In these notes we give a brief introduction to the Moore-Penrose pseudoinverse, a gen-


eralization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any
matrix and is unique. Moreover, as is shown in what follows, it brings great notational
and conceptual clarity to the study of solutions to arbitrary systems of linear equations and
linear least squares problems.

1 Definition and Characterizations


We consider the case of A ∈ IR^{m×n}_r , i.e., an m × n real matrix of rank r. Every A ∈ IR^{m×n}_r has a
pseudoinverse and, moreover, the pseudoinverse, denoted A^+ ∈ IR^{n×m}_r , is unique. A purely algebraic
characterization of A^+ is given in the next theorem proved by Penrose in 1956.

Theorem: Let A ∈ IR^{m×n}_r . Then G = A^+ if and only if

(P1) AGA = A

(P2) GAG = G

(P3) (AG)T = AG

(P4) (GA)T = GA

Furthermore, A+ always exists and is unique.

Note that the above theorem is not constructive. But it does provide a checkable criterion, i.e., given a
matrix G that purports to be the pseudoinverse of A, one need simply verify the four Penrose conditions
(P1)–(P4) above. This verification is often relatively straightforward.

Example: Consider A = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. Verify directly that A^+ = [ 1/5 , 2/5 ]. Note that other left
inverses (for example, A^{−L} = [ 3 , −1 ]) satisfy properties (P1), (P2), and (P4) but not (P3).
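
A direct numerical check of the four Penrose conditions is straightforward; the following sketch (Python/NumPy, with a helper function of our own naming) verifies them for the example above.

import numpy as np

def penrose_conditions(A, G, tol=1e-12):
    """Report which of the four Penrose conditions a candidate G satisfies."""
    return {
        "P1: AGA = A":     np.allclose(A @ G @ A, A, atol=tol),
        "P2: GAG = G":     np.allclose(G @ A @ G, G, atol=tol),
        "P3: (AG)^T = AG": np.allclose((A @ G).T, A @ G, atol=tol),
        "P4: (GA)^T = GA": np.allclose((G @ A).T, G @ A, atol=tol),
    }

A = np.array([[1.], [2.]])
print(penrose_conditions(A, np.array([[0.2, 0.4]])))   # all four hold, so G = A^+
print(penrose_conditions(A, np.array([[3., -1.]])))    # a mere left inverse: (P3) fails
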

Still another characterization of A+ is given in the following theorem whose proof can
be found on p. 19 in Albert, A., Regression and the Moore-Penrose Pseudoinverse, Aca-
demic Press, New York, 1972. We refer to this as the “limit definition of the pseudoinverse.”

Theorem: Let A ∈ IR^{m×n}_r . Then

A^+ = lim_{δ→0} (A^T A + δ^2 I)^{−1} A^T        (1)
    = lim_{δ→0} A^T (A A^T + δ^2 I)^{−1}        (2)

2 Examples
Each of the following can be derived or verified by using the above theorems or characteri-
zations.
Example 1: A^+ = A^T (A A^T)^{−1} if A is onto, i.e., has linearly independent rows (A is right
invertible).

Example 2: A^+ = (A^T A)^{−1} A^T if A is 1-1, i.e., has linearly independent columns (A is left
invertible).

Example 3: For any scalar α,

α^+ = \begin{cases} α^{−1} & \text{if } α ≠ 0 \\ 0 & \text{if } α = 0 \end{cases}

Example 4: For any vector v ∈ IR^n ,

v^+ = (v^T v)^+ v^T = \begin{cases} \dfrac{v^T}{v^T v} & \text{if } v ≠ 0 \\ 0 & \text{if } v = 0 \end{cases}

Example 5: \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

This example was computed via the limit definition of the pseudoinverse.

Example 6: \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}^+ = \begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix}

This example was computed via the limit definition of the pseudoinverse.
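
The limit definition can also be checked numerically by taking δ small; the sketch below (Python/NumPy, with an arbitrary choice of δ) reproduces Example 6 and agrees with numpy.linalg.pinv.

import numpy as np

A = np.array([[1., 1.],
              [1., 1.]])

delta = 1e-4
# Limit definition: A^+ = lim_{delta -> 0} (A^T A + delta^2 I)^{-1} A^T.
approx = np.linalg.solve(A.T @ A + delta**2 * np.eye(2), A.T)
print(approx)                 # approximately [[0.25, 0.25], [0.25, 0.25]]
print(np.linalg.pinv(A))      # [[0.25, 0.25], [0.25, 0.25]]
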

3 Some Properties
Theorem: Let A ∈ IRm×n and suppose U ∈ IRm×m , V ∈ IRn×n are orthogonal (M is
orthogonal if M T = M −1 ). Then

(U AV )+ = V T A+ U T .

Proof: Simply verify that the expression above does indeed satisfy each of the four Penrose
conditions.

Theorem: Let S ∈ IRn×n be symmetric with U T SU = D, where U is orthogonal and D


is diagonal. Then S + = U D+ U T where D+ is again a diagonal matrix whose diagonal
elements are determined according to Example 3.

Theorem: For all A ∈ IR^{m×n} ,

1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+

2. (A^T)^+ = (A^+)^T

Both of the above two results can be proved using the limit definition of the pseudoinverse.
The proof of the first result is not particularly easy nor does it have the virtue of being
especially illuminating. The interested reader can consult the proof in Albert, p. 27. The
proof of the second result is as follows:
(A^T)^+ = lim_{δ→0} (A A^T + δ^2 I)^{−1} A
        = lim_{δ→0} [A^T (A A^T + δ^2 I)^{−1}]^T
        = [ lim_{δ→0} A^T (A A^T + δ^2 I)^{−1} ]^T
        = (A^+)^T

Note now that by combining the last two theorems we can, in theory at least, compute the Moore-Penrose
pseudoinverse of any matrix (since A A^T and A^T A are symmetric). Alternatively, we could compute the
pseudoinverse by first computing the SVD of A as A = U Σ V^T and then, by the first theorem of this section,
A^+ = V Σ^+ U^T where

Σ^+ = \begin{pmatrix} S^{−1} & 0 \\ 0 & 0 \end{pmatrix} .

This is the way it's done in Matlab; the command is called pinv.
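
A sketch of this SVD-based computation in Python/NumPy (the truncation tolerance is an arbitrary choice, and the function name is ours) is:

import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse via A = U Sigma V^T and A^+ = V Sigma^+ U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Sigma^+ inverts the nonzero singular values and leaves the zeros as zeros.
    s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_plus) @ U.T

A = np.array([[1., 1.],
              [1., 1.]])
print(pinv_via_svd(A))        # [[0.25, 0.25], [0.25, 0.25]]
print(np.linalg.pinv(A))      # same result from NumPy's built-in routine
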

Additional useful properties of pseudoinverses:

1. (A^+)^+ = A

2. (A^T A)^+ = A^+ (A^T)^+ ,  (A A^T)^+ = (A^T)^+ A^+

3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A)

4. N (A^+) = N (A A^+) = N ((A A^T)^+) = N (A A^T) = N (A^T)

5. If A is normal, then A^k A^+ = A^+ A^k for all k > 0, and (A^k)^+ = (A^+)^k for all k > 0.

Note: Recall that A ∈ IRn×n is normal if AAT = AT A. Thus if A is symmetric, skew-


symmetric, or orthogonal, for example, it is normal. However, a matrix can be none of the
preceding but still be normal, such as

A = \begin{pmatrix} 1 & −1 \\ 1 & 1 \end{pmatrix} .

4 Applications to the Solution of Arbitrary Linear Systems


The first theorem is fundamental to using pseudoinverses effectively for studying the solution
of arbitrary systems of linear equations.

Theorem: Suppose A ∈ IRm×n , b ∈ IRm . Then R(b) ⊆ R(A) if and only if AA+ b = b.

Proof: Suppose R(b) ⊆ R(A). Take arbitrary γ ∈ IR so that γb ∈ R(b) ⊆ R(A). Then
there exists a vector v ∈ IRn such that Av = γb. Thus we have
γb = Av = AA+ Av = AA+ γb
where one of the Penrose properties is used above. Since γ ∈ IR was arbitrary, we have
shown that b = AA+ b. To prove the converse, assume now that AA+ b = b. Then it is clear
that b ∈ R(b) and hence
b = AA+ b ∈ R(A) .

We close with some of the principal results concerning existence and uniqueness of solutions
to the general matrix linear system Ax = b, i.e., the solution of m equations in n unknowns.

Theorem: (Existence) The linear system


Ax = b ; A ∈ IRm×n , b ∈ IRm (3)
has a solution if and only if R(b) ⊆ R(A); equivalently, there is a solution to these m
equations in n unknowns if and only if AA+ b = b.

Proof: The subspace inclusion criterion follows essentially from the definition of the range
of a matrix. The matrix criterion is from the previous theorem.

Theorem: (Solution) Let A ∈ IR^{m×n} , b ∈ IR^m and suppose that AA+ b = b. Then any
vector of the form
x = A+ b + (I − A+ A)y where y ∈ IRn is arbitrary (4)
is a solution of
Ax = b. (5)
Furthermore, all solutions of (5) are of this form.

Proof: To verify that (4) is a solution, pre-multiply by A:


Ax = AA+ b + A(I − A+ A)y
= b + (A − AA+ A)y by hypothesis
= b since AA+ A = A by the first Penrose condition.
That all solutions are of this form can be seen as follows. Let z be an arbitrary solution of
(5), i.e., Az = b. Then we can write
z ≡ A+ Az + (I − A+ A)z
= A+ b + (I − A+ A)z

and this is clearly of the form (4).
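
The general solution formula (4) is easy to experiment with numerically; the sketch below (Python/NumPy, with an arbitrarily constructed consistent system) checks the existence criterion AA^+ b = b and that adding (I − A^+ A)y to A^+ b still solves Ax = b.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))          # more unknowns than equations
b = A @ rng.standard_normal(5)           # constructed so that b is in the range of A

A_plus = np.linalg.pinv(A)
print(np.allclose(A @ A_plus @ b, b))    # existence criterion: A A^+ b = b

# Any x of the form A^+ b + (I - A^+ A) y solves A x = b.
y = rng.standard_normal(5)
x = A_plus @ b + (np.eye(5) - A_plus @ A) @ y
print(np.allclose(A @ x, b))             # True
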

Remark: When A is square and nonsingular, A+ = A−1 and so (I − A+ A) = 0. Thus, there


is no “arbitrary” component, leaving only the unique solution x = A−1 b.

Theorem: (Uniqueness) A solution of the linear equation

Ax = b ; A ∈ IR^{m×n} , b ∈ IR^m (6)

is unique if and only if A+ A = I; equivalently, there is a unique solution if and only if


N (A) = 0.

Proof: The first equivalence is immediate from the form of the general solution in (4). The
second follows by noting that the n × n matrix A+ A = I only if r = n where r = rank(A)
(recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N (A) = 0.

EXERCISES:

1. Use the limit definition of the pseudoinverse to compute the pseudoinverse of \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.

2. If x, y ∈ IR^n , show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T .

3. For A ∈ IR^{m×n} , prove that R(A) = R(A A^T) using only definitions and elementary
properties of the Moore-Penrose pseudoinverse.

4. For A ∈ IR^{m×n} , prove that R(A^+) = R(A^T).

The Nullspace of a Matrix
The solution sets of homogeneous linear systems provide an important source of
vector spaces. Let A be an m by n matrix, and consider the homogeneous system

A x= 0
Since A is m by n, the set of all vectors x which satisfy this equation forms a subset
of Rn. (This subset is nonempty, since it clearly contains the zero
vector: x = 0 always satisfies A x= 0.) This subset actually forms a subspace
of Rn, called the nullspace of the matrix A and denoted N(A). To prove that N(A) is
a subspace of R n , closure under both addition and scalar multiplication must be
established. If x1 and x 2 are in N(A), then, by definition, A x 1 = 0 and A x 2 = 0.
Adding these equations yields

A (x_1 + x_2) = A x_1 + A x_2 = 0 + 0 = 0,
which verifies closure under addition. Next, if x is in N(A), then A x = 0, so if k is
any scalar,

A (k x) = k (A x) = k 0 = 0,
verifying closure under scalar multiplication. Thus, the solution set of a


homogeneous linear system forms a vector space. Note carefully that if the system
is not homogeneous, then the set of solutions is not a vector space since the set
will not contain the zero vector.
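
Numerically, a basis for the nullspace can be computed from the singular value decomposition; the sketch below (Python with SciPy, using an illustrative matrix of our own choosing) computes an orthonormal basis of N(A) and verifies that each basis vector satisfies A x = 0.

import numpy as np
from scipy.linalg import null_space

# An illustrative 2 x 4 matrix (our own example, chosen for demonstration).
A = np.array([[1., 2., -1., 0.],
              [2., 4.,  0., 2.]])

N = null_space(A)                 # columns form an orthonormal basis of N(A)
print(N.shape)                    # (4, 2): the nullspace is a 2-dimensional subspace of R^4
print(np.allclose(A @ N, 0.0))    # each basis vector satisfies A x = 0
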

Example 1: The set of solutions of the homogeneous system

forms a subspace of Rn for some n. State the value of n and explicitly determine
this subspace.
Since the coefficient matrix is 2 by 4, x must be a 4‐vector. Thus, n = 4: The
nullspace of this matrix is a subspace of R4. To determine this subspace, the
equation is solved by first row‐reducing the given matrix:
Therefore, the system is equivalent to

that is,

If you let x 3 and x 4 be free variables, the second equation directly above implies

Substituting this result into the other equation determines x 1:

Therefore, the set of solutions of the given homogeneous system can be written
as

which is a subspace of R 4. This is the nullspace of the matrix

Example 2: Find the nullspace of the matrix

By definition, the nullspace of A consists of all vectors x such that A x = 0. Perform


the following elementary row operations on A,
to conclude that A x = 0 is equivalent to the simpler system

The second row implies that x 2 = 0, and back‐substituting this into the first row
implies that x 1 = 0 also. Since the only solution of A x = 0 is x = 0, the nullspace
of A consists of the zero vector alone. This subspace, { 0}, is called the trivial
subspace (of R 2).

Example 3: Find the nullspace of the matrix

To solve B x = 0, begin by row‐reducing B:

The system B x = 0 is therefore equivalent to the simpler system

Since the bottom row of this coefficient matrix contains only zeros, x 2 can be
taken as a free variable. The first row then gives so any vector
of the form

satisfies B x = 0. The collection of all such vectors is the nullspace of B, a subspace


of R 2:
