
Probability and Statistics

Bai Huang

I. Matrix Algebra

1. Vector Spaces
1.1 Real Vectors
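$\mathbb{R}$: The set of (finite) real numbers
$\mathbb{R}^m$: The m-dimensional Euclidean space
(Cartesian product of m copies of $\mathbb{R}$: $\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$)
A Cartesian product $A_1 \times A_2 \times \cdots \times A_m$:
All possible ordered m-tuples whose i-th element is from $A_i$, $i = 1, \ldots, m$
A (real) vector: A particular element in $\mathbb{R}^m$, denoted by $x = (x_1, x_2, \ldots, x_m)'$
$x_i$'s: The elements or components of the vector x
m: The order of the vector x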
The arithmetic operations for two vectors:

Addition:
$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_m + y_m)'$
a. x and y must be of the same order.
b. [commutativity] x + y = y + x
c. [associativity] (x + y) + z = x + (y + z)
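Scalar multiplication: $cx := (cx_1, cx_2, \ldots, cx_m)'$, where c is a constant
Two vectors x and y are collinear if either x = 0 or y = 0 or y = cx for some scalar c.
Inner product:
$\langle x, y \rangle := \sum_{i=1}^{m} x_i y_i = x'y = y'x$
a. Properties:
i) $\langle x, y \rangle = \langle y, x \rangle$
ii) $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$
iii) $\langle cx, y \rangle = c \langle x, y \rangle$
iv) $\langle x, x \rangle \ge 0$, with $\langle x, x \rangle = 0$ iff x = 0
b. Norm: $\|x\| := \langle x, x \rangle^{1/2}$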
It is the geometric idea of length of the vector x.
c. A vector x is said to be normalized if ||x|| = 1
Any nonzero vector x can be normalized to $x^{\circ}$ by $x^{\circ} = x / \|x\|$, since $x^{\circ}$ then has unit length. After normalization we focus only on the direction of x.


d. Two vectors x and y are orthogonal if $\langle x, y \rangle = 0$. We write $x \perp y$.
e. If, in addition, $\|x\| = \|y\| = 1$, the two vectors are said to be orthonormal.
Example: In $\mathbb{R}^m$, the unit vectors (or elementary vectors)
$e_1 = (1, 0, \ldots, 0)'$, $e_2 = (0, 1, \ldots, 0)'$, ..., $e_m = (0, 0, \ldots, 1)'$
are orthonormal.
f. Cauchy-Schwarz inequality:
$\langle x, y \rangle^2 \le \|x\|^2 \|y\|^2$, with equality iff x and y are collinear.
Exercise: Prove the C-S inequality using the properties of the inner product.
g. Triangle inequality:
$\|x + y\| \le \|x\| + \|y\|$, with equality iff one of the two vectors is a nonnegative multiple of the other.
Exercise: Prove triangle inequality. [Hint: using C-S inequality]
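The angle between two nonzero vectors x and y:
[Figure: the triangle formed by the vectors x, y, and x - y, with $\theta$ the angle between x and y]
By the cosine rule, $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\theta$.
After simplification, this becomes $\langle x, y \rangle = \|x\|\|y\|\cos\theta$; thus the angle between x and y is determined by
$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\|y\|}$, $(0 \le \theta \le \pi)$
a. If x and y are orthogonal, $\langle x, y \rangle = 0$. Thus $\cos\theta = 0$ and $\theta = \pi/2$.
b. The projection of y onto x is $\|y\|\cos\theta \cdot \frac{x}{\|x\|} = \frac{\langle x, y \rangle}{\|x\|^2}\, x$.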
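The inner product, norm, angle, and projection of this section can be sketched in a few lines of Python. This is only a minimal illustration with plain lists as vectors; the function names (`inner`, `norm`, `angle`, `project`) are chosen here for convenience and are not part of the notes.

```python
import math

def inner(x, y):
    """Inner product <x, y> = sum_i x_i * y_i for two real vectors of the same order."""
    if len(x) != len(y):
        raise ValueError("x and y must be of the same order")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm ||x|| = <x, x>^(1/2)."""
    return math.sqrt(inner(x, x))

def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors, from cos(theta) = <x, y> / (||x|| ||y||)."""
    c = inner(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding

def project(y, x):
    """Projection of y onto the nonzero vector x: (<x, y> / <x, x>) x."""
    coef = inner(x, y) / inner(x, x)
    return [coef * a for a in x]

x, y = [3.0, 4.0], [4.0, 3.0]
print(inner(x, y), norm(x))
# Cauchy-Schwarz: <x, y>^2 <= ||x||^2 ||y||^2
print(inner(x, y) ** 2 <= inner(x, x) * inner(y, y))
print(angle([1.0, 0.0], [0.0, 1.0]))    # orthogonal vectors
print(project([2.0, 3.0], [1.0, 0.0]))  # projection onto the first unit vector
```

Clamping the cosine before calling `acos` is a defensive choice: rounding can push the ratio slightly outside [-1, 1] for nearly collinear vectors.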

1.2 Complex Vectors


A complex number, say u, is denoted by u = a + ib, where a and b are real numbers and i is the imaginary unit defined by $i^2 = -1$. We write Re(u) = a and Im(u) = b.
If u = a + ib and v = c + id are two complex numbers, they are said to be equal iff a = c and b = d.
Addition: u + v = (a + c) + i(b + d)
Product: uv = (ac - bd) + i(ad + bc)
The complex conjugate of u = a + ib is defined as $u^* = a - ib$.
$(u^*)^* = u$
$(u + v)^* = u^* + v^*$
$(uv)^* = u^* v^*$
$uv^* \ne u^* v$ unless $uv^*$ is a real number
The modulus of u = a + ib is defined by $|u| = (uu^*)^{1/2} = \sqrt{a^2 + b^2}$.
Division: $\frac{u}{v} = \frac{uv^*}{vv^*} = \frac{uv^*}{|v|^2}$, for $v \ne 0$
Inner product (for complex vectors u and v of order m): $\langle u, v \rangle = \sum_{i=1}^{m} u_i v_i^*$
Norm: $\|u\| = \langle u, u \rangle^{1/2}$
$\mathbb{C}$: The set of all complex numbers
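The complex arithmetic above maps directly onto Python's built-in `complex` type. The sketch below spells out the modulus, division, and inner-product formulas explicitly rather than relying on `abs` or `/`, so that each line mirrors a definition in the text; the helper names are illustrative only.

```python
import math

u = complex(3, 4)    # u = 3 + 4i
v = complex(1, -2)   # v = 1 - 2i

def modulus(z):
    """|z| = (z z*)^(1/2)."""
    return math.sqrt((z * z.conjugate()).real)

def divide(u, v):
    """u / v = u v* / |v|^2, for v != 0."""
    return (u * v.conjugate()) / (v * v.conjugate()).real

def inner(us, vs):
    """<u, v> = sum_i u_i v_i* for complex vectors of the same order."""
    return sum((a * b.conjugate() for a, b in zip(us, vs)), complex(0))

def norm(us):
    """||u|| = <u, u>^(1/2); <u, u> is real and nonnegative."""
    return math.sqrt(inner(us, us).real)

print(u.conjugate(), modulus(u))
print(divide(u, v))
print(inner([complex(1, 1), complex(2, 0)], [complex(0, 1), complex(1, 0)]))
```

Note that the conjugate sits on the second argument of the inner product, matching the definition $\langle u, v \rangle = \sum_i u_i v_i^*$.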


1.3 Vector Spaces
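A vector space $\mathcal{V}$ is a nonempty set of elements (called vectors) together with two operations and a set of axioms.
Two operations:
a. Addition: For any $x, y \in \mathcal{V}$, $x + y \in \mathcal{V}$
b. Scalar multiplication: For any $x \in \mathcal{V}$ and any real (or complex) scalar $\alpha$, $\alpha x \in \mathcal{V}$
Axioms:
a. Addition
i) x + y = y + x
ii) x + (y + z) = (x + y) + z
iii) $\exists$ a vector in $\mathcal{V}$ (denoted by 0) such that x + 0 = x for all $x \in \mathcal{V}$
iv) $\forall x \in \mathcal{V}$, $\exists$ a vector in $\mathcal{V}$ (denoted by -x) such that x + (-x) = 0
b. Scalar multiplication
i) $\alpha(\beta x) = (\alpha\beta)x$
ii) 1x = x
c. Distributive laws
i) $\alpha(x + y) = \alpha x + \alpha y$
ii) $(\alpha + \beta)x = \alpha x + \beta x$
It is the scalar rather than the vector that determines whether the space is real or complex.
Three commonly used vector spaces: a complex vector space; an inner product space (add an inner product); a Hilbert space (add completeness).
A nonempty subset $\mathcal{A}$ of a vector space $\mathcal{V}$ is called a subspace of $\mathcal{V}$ if, for all $x, y \in \mathcal{A}$, we have $x + y \in \mathcal{A}$ and $\alpha x \in \mathcal{A}$ for any scalar $\alpha$.
The intersection of two subspaces $\mathcal{A}$ and $\mathcal{B}$ in a vector space $\mathcal{V}$, denoted by $\mathcal{A} \cap \mathcal{B}$, consists of all vectors that belong to both $\mathcal{A}$ and $\mathcal{B}$.
The union of two subspaces $\mathcal{A}$ and $\mathcal{B}$ in a vector space $\mathcal{V}$, denoted by $\mathcal{A} \cup \mathcal{B}$, consists of all vectors that belong to at least one of $\mathcal{A}$ and $\mathcal{B}$.
The sum of two subspaces $\mathcal{A}$ and $\mathcal{B}$ in a vector space $\mathcal{V}$, denoted by $\mathcal{A} + \mathcal{B}$, consists of all vectors of the form a + b where $a \in \mathcal{A}$ and $b \in \mathcal{B}$.
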
A linear combination of the vectors $x_1, x_2, \ldots, x_n$ in a vector space is a sum of the form $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$.
A finite set of vectors $x_1, x_2, \ldots, x_n$ ($n \ge 1$) is said to be linearly dependent if there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0$; otherwise it is linearly independent.
Exercise: For which values of $\lambda$ are the vectors $(\lambda, 1, 0)'$, $(1, \lambda, 1)'$, and $(0, 1, \lambda)'$ linearly dependent?

An arbitrary set of elements A of $\mathcal{V}$ (containing possibly an infinite number of vectors) is linearly independent if every nonempty finite subset of A is linearly independent; otherwise it is linearly dependent.
Let A be a nonempty set of vectors from a vector space $\mathcal{V}$. The set consisting of all linear combinations of vectors in A is called the subspace spanned (or generated) by A.
Any set of n linearly independent real vectors $x_1, x_2, \ldots, x_n$ of order n spans $\mathbb{R}^n$.
If a vector space $\mathcal{V}$ contains a finite set of n linearly independent vectors $x_1, x_2, \ldots, x_n$, but any set of n + 1 vectors is linearly dependent, then the dimension of $\mathcal{V}$ is n. We write $\dim(\mathcal{V}) = n$.
In this case, $\mathcal{V}$ is said to be finite dimensional. If no such n exists, $\mathcal{V}$ is infinite dimensional. In particular, for $\mathcal{V} = \{0\}$, we say $\dim(\mathcal{V}) = 0$.
If $\mathcal{V}$ is a vector space, finite or infinite dimensional, and A is a linearly independent set of vectors from $\mathcal{V}$, then A is a basis of $\mathcal{V}$ if $\mathcal{V}$ is spanned by A.
Example: a. What is the dimension of $\mathbb{C}^m$ when the field of scalars is $\mathbb{C}$?
b. What is the dimension of $\mathbb{C}^m$ when the field of scalars is $\mathbb{R}$?
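A complex vector space $\mathcal{V}$ is an inner product space if $\forall x, y \in \mathcal{V}$, $\exists$ a complex-valued function $\langle x, y \rangle$, which is called the inner product of x and y, such that
i) $\langle x, y \rangle = \langle y, x \rangle^*$
ii) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
iii) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$
iv) $\langle x, x \rangle \ge 0$, with equality iff x = 0
A real vector space $\mathcal{V}$ is an inner product space if $\forall x, y \in \mathcal{V}$, $\exists$ a real number $\langle x, y \rangle$ satisfying conditions i) to iv).
Consider a real-valued function $\nu(x)$ defined on an inner product space. It is a norm if $\nu(x)$ satisfies
i) $\nu(\alpha x) = |\alpha|\,\nu(x)$
ii) $\nu(x) \ge 0$ with equality iff x = 0
iii) $\nu(x + y) \le \nu(x) + \nu(y)$
Exercise: Show that $\|x\| := \langle x, x \rangle^{1/2}$ is a norm.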

The concept of an inner product not only induces the idea of length (the norm) $\|x\| := \langle x, x \rangle^{1/2}$, but also of the distance between two vectors x and y, $d(x, y) := \|x - y\|$, which satisfies
i) d(x, x) = 0
ii) d(x, y) > 0 if $x \ne y$
iii) d(x, y) = d(y, x)
iv) $d(x, y) \le d(x, z) + d(z, y)$
In any inner product space, the Cauchy-Schwarz inequality and the triangle inequality hold. Besides, an equality, called the parallelogram theorem, is stated as
$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$
Example: Prove the parallelogram equality in terms of algebra and geometry.
For two vectors x and y in an inner product space, we say that x and y are orthogonal if $\langle x, y \rangle = 0$; we write $x \perp y$.
[Pythagorean Theorem]
If $x \perp y$ in an inner product space, $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
Exercise: Does the converse hold?
A is called an orthogonal set if each pair of vectors in A is orthogonal. If, in
addition, each vector in A has unit length, then A is called an orthonormal set.
Example: Prove that any orthonormal set is linearly independent.
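Two subspaces $\mathcal{A}$ and $\mathcal{B}$ of an inner product space $\mathcal{V}$ are said to be orthogonal if every vector in $\mathcal{A}$ is orthogonal to every vector in $\mathcal{B}$.
If $\mathcal{A}$ is a subspace of an inner product space $\mathcal{V}$, then the space of all vectors orthogonal to $\mathcal{A}$ is called the orthogonal complement of $\mathcal{A}$, denoted by $\mathcal{A}^{\perp}$.
Example: Prove that $\mathcal{A}^{\perp}$ is a subspace of $\mathcal{V}$.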
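A sequence $\{x_n\}_{n=1}^{\infty}$ in an inner product space is said to converge in the norm to x if $\|x_n - x\| \to 0$ as $n \to \infty$.
[Continuity of inner product]
If $\{x_n\}$, $\{y_n\}$ are two sequences in an inner product space such that $\|x_n - x\| \to 0$, $\|y_n - y\| \to 0$, then
i) $\|x_n\| \to \|x\|$, $\|y_n\| \to \|y\|$
ii) $\langle x_n, y_n \rangle \to \langle x, y \rangle$
A sequence $\{x_n\}_{n=1}^{\infty}$ in an inner product space is a Cauchy sequence if $\|x_n - x_m\| \to 0$ as $n, m \to \infty$
($\forall \varepsilon > 0$, $\exists N(\varepsilon) > 0$ s.t. $\|x_n - x_m\| < \varepsilon$, $\forall n, m > N(\varepsilon)$)
An inner product space $\mathcal{H}$ is a Hilbert space if it is complete, namely, every Cauchy sequence in $\mathcal{H}$ converges in the norm to an element $x \in \mathcal{H}$.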

2. Matrices
2.1 Matrix Terminology
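A matrix is a rectangular array of numbers, denoted
$A = [a_{ij}]_{i=1,\ldots,m;\; j=1,\ldots,n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$
where the $a_{ij}$'s are called the entries or elements of A.
The dimensions of a matrix are the numbers of its rows and columns. So the dimension of matrix A is $m \times n$ (m by n); or A is an $m \times n$ matrix.
Sometimes the dimension is also called the order.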
A matrix with all entries zero is called a zero matrix or null matrix.
When m = n, A becomes a square matrix.
Several types of square matrices are listed here:
A symmetric matrix is one in which $a_{ij} = a_{ji}$, $\forall i, j$.
A diagonal matrix is a square matrix whose only nonzero elements are on the main diagonal.
An identity matrix is a diagonal matrix with all entries equal to 1 on the main diagonal. It is denoted by I. For example, $I_3$ is a $3 \times 3$ identity matrix.
A triangular matrix is a square matrix that has only zeros either above or below the main diagonal. If the zeros are above the diagonal, the matrix is said to be lower triangular. Otherwise, it is upper triangular.
An orthogonal matrix P is a square matrix that satisfies $PP' = P'P = I$.
Here are some concepts for square matrices: inverse, determinant, and trace.
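A square matrix A is invertible if $\exists B$, s.t. $A_{n \times n} B_{n \times n} = B_{n \times n} A_{n \times n} = I_n$.
We write $B = A^{-1}$. Also, B is invertible and $A = B^{-1}$.
a. An inverse, if it exists, is unique.
Exercise: Prove this.
b. A is invertible $\iff$ The column vectors of A are linearly independent.
c. Example: i) A zero matrix is non-invertible
ii) The inverse of an identity matrix is itself
iii) The inverse of a diagonal matrix:
$\begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{pmatrix}^{-1} = \begin{pmatrix} a_1^{-1} & & 0 \\ & \ddots & \\ 0 & & a_n^{-1} \end{pmatrix}$, if all $a_i \ne 0$
iv) The inverse of a $2 \times 2$ matrix:
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, if $ad - bc \ne 0$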
d. Calculation of the inverse:


i) Definition Method
ii) Formula Method
iii) Elementary Transformation Method
e. Properties:
i) $(A^{-1})^{-1} = A$
ii) $(A^{-1})' = (A')^{-1}$
iii) $(AB)^{-1} = B^{-1}A^{-1}$
iv) $(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$

Exercise: Try to figure out the special cases for iv) when
(1) $B = C'$?
(2) $D = I$, $B = b$, $C = c'$?
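As one possible reading of the "Elementary Transformation Method" listed above, the sketch below row-reduces the augmented matrix [A : I] to [I : A^{-1}] using exact fractions. The function name `inverse` and the pivoting rule (first nonzero entry in the column) are choices made here for illustration, not something prescribed by the notes.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A : I]."""
    n = len(a)
    # Build the augmented matrix [A : I] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                   # right half is A^{-1}

print(inverse([[1, 2], [3, 4]]))
```

For the 2 x 2 input the result agrees with the formula method: with ad - bc = -2, the inverse is (1 / -2) times [[4, -2], [-3, 1]].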

The determinant of a square matrix is a function of the elements of the matrix A, denoted by det(A) or |A|.
a. Definition (expansion by cofactors):
Using any row, say i, we obtain
$|A| = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} |A_{ij}|$, where $A_{ij}$ is the matrix obtained from A by deleting row i and column j. The determinant of $A_{ij}$ is called a minor of A, or the (i,j) minor of A.
When the correct sign, $(-1)^{i+j}$, is added, it becomes a cofactor, or the (i,j) cofactor. This operation can be done using any column as well.

It is easiest to expand along the row or column that has the most zeros. It is unlikely, though, that you will ever calculate a determinant larger than $3 \times 3$ without a computer.
b. The determinant provides important information when the matrix is that
of the coefficients of a system of linear equations. The system has a
unique solution if and only if the determinant is nonzero.
c. When the determinant corresponds to a linear transformation of a vector
space, the transformation has an inverse operation if and only if the
determinant is nonzero.
d. A is invertible $\iff |A| \ne 0$

e. A geometric interpretation can be given to the value of the determinant of


a square matrix with real entries:
The absolute value of the determinant gives the scale factor by which
area or volume is multiplied under the associated linear transformation,
while its sign indicates whether the transformation preserves orientation.
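Example: i) $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$
ii) $\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} = aei - afh - bdi + bfg + cdh - ceg$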

f. Properties:
i) Switching two rows or columns changes the determinant sign.

ii) Any determinant with two identical rows or columns has value 0.
iii) A determinant with a row or column of zeros has value 0.
iv) Adding a scalar multiple of one row (or column) to another does not
change the determinant.
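v) $|cA_{n \times n}| = c^n |A|$
vi) $|A'| = |A|$
vii) $|A^{-1}| = |A|^{-1}$
viii) If $A_1, A_2, \ldots, A_K$ are all $n \times n$ matrices, $|A_1 A_2 \cdots A_K| = |A_1||A_2| \cdots |A_K|$
ix) For a triangular matrix, $|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \prod_{i=1}^{n} a_{ii}$
x) For a block-triangular matrix with square diagonal blocks, $|A| = \begin{vmatrix} A_{11} & A_{12} \\ O & A_{22} \end{vmatrix} = \begin{vmatrix} A_{11} & O \\ A_{21} & A_{22} \end{vmatrix} = |A_{11}||A_{22}|$
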

Exercise: Using the concepts of inverse and determinant, prove the following properties for an orthogonal matrix P:
i) $P' = P^{-1}$
ii) $|P| = \pm 1$
iii) PQ is orthogonal when Q is orthogonal.
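The expansion by cofactors used to define the determinant earlier in this section translates into a short recursive function. The sketch below always expands along the first row and skips zero entries; it is meant to illustrate the definition, not to be efficient (the cost grows factorially with the order). The names `minor` and `det` are illustrative.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("determinant is defined for square matrices only")
    if n == 1:
        return a[0][0]
    # With 0-based indices the sign of the (0, j) cofactor is (-1)**j.
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j))
               for j in range(n) if a[0][j] != 0)

print(det([[1, 2], [3, 4]]))
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))        # product of the diagonal
print(det([[1, 2], [3, 4]]) == det([[1, 3], [2, 4]]))  # |A'| = |A|
```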


The trace of a square matrix is defined to be the sum of the entries on the main diagonal: $\mathrm{tr}(A_{n \times n}) := \sum_{i=1}^{n} a_{ii}$.
Properties:
i) The trace is invariant under cyclic permutations, for example
$\mathrm{tr}(ABCD) = \mathrm{tr}(BCDA) = \mathrm{tr}(CDAB) = \mathrm{tr}(DABC)$
ii) $\mathrm{tr}(A') = \mathrm{tr}(A)$
iii) For two matrices of the same dimensions,
$\mathrm{tr}(A'B) = \mathrm{tr}(AB') = \mathrm{tr}(BA') = \mathrm{tr}(B'A) = \sum_{i,j} a_{ij} b_{ij}$
iv) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
v) $\mathrm{tr}(cA) = c\,\mathrm{tr}(A)$
vi)* $E(\mathrm{tr}(X)) = \mathrm{tr}(E(X))$
An $m \times n$ matrix can be viewed as a set of n column vectors in $\mathbb{R}^m$, or as a set of m rows in $\mathbb{R}^n$. Thus, associated with a matrix A are two vector spaces: the column space and the row space.
The column space of A, denoted by col A, consists of all linear combinations of the columns of A,
$\mathrm{col}\,A := \{x \in \mathbb{R}^m : x = Ay \text{ for some } y \in \mathbb{R}^n\}$.
The row space of A, denoted by col A', consists of all linear combinations of the rows of A, or the columns of A',
$\mathrm{col}\,A' := \{y \in \mathbb{R}^n : y = A'x \text{ for some } x \in \mathbb{R}^m\}$.
The column rank of A is the maximum number of linearly independent columns it contains, namely, the dimension of its column space, $\dim(\mathrm{col}\,A)$.
The row rank of A is the maximum number of linearly independent rows it contains, namely, the dimension of its row space, $\dim(\mathrm{col}\,A')$.
[The Rank Theorem] The nontrivial fact that $\dim(\mathrm{col}\,A) = \dim(\mathrm{col}\,A')$ implies that the column rank of A is equal to its row rank. It follows that $\mathrm{rk}(A) = \dim(\mathrm{col}\,A)$.
A square $n \times n$ matrix A is said to be nonsingular if rk(A) = n; otherwise, the matrix is singular. In fact,
A is invertible $\iff$ A is nonsingular $\iff$ rk(A) = n

Example: For a square n n matrix A , show that | A | 0

| A| 0

rk( ) < .

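[The Rank Factorization Theorem] Every $m \times n$ matrix A of rank r can be written as $A = BC'$, where $B_{m \times r}$ and $C_{n \times r}$ both have rank r.
Example: Prove this theorem.
Simple properties of rank:
Let A be an $m \times n$ matrix,
i) $0 \le \mathrm{rk}(A) = \mathrm{rk}(A') \le \min(m, n)$
A is said to be of full rank if $\mathrm{rk}(A) = \min(m, n)$.
ii) $\mathrm{rk}(A) = 0 \iff A = O$
iii) $\mathrm{rk}(I_n) = n$
iv) $\mathrm{rk}(\lambda A) = \mathrm{rk}(A)$ if $\lambda \ne 0$
Rank inequalities (sum):
i) $\mathrm{rk}(A + B) \le \mathrm{rk}(A) + \mathrm{rk}(B)$
ii) $\mathrm{rk}(A - B) \ge |\mathrm{rk}(A) - \mathrm{rk}(B)|$
Example: Prove i) and ii).
Rank inequalities (product):
i) $\mathrm{col}(AB) \subseteq \mathrm{col}\,A$
ii) $\mathrm{rk}(AB) \le \min(\mathrm{rk}(A), \mathrm{rk}(B))$
Example 1: Prove i) and ii).
Example 2: Let A be an $m \times n$ matrix. If m < n, show that no $m \times n$ matrix B exists such that $B'A = I_n$.
Example 3: Let A be an $m \times n$ matrix, and let $B_{m \times m}$ and $C_{n \times n}$ be nonsingular. Show that rk(BAC) = rk(A).
A and AA' span the same space, hence $\mathrm{rk}(A) = \mathrm{rk}(AA') = \mathrm{rk}(A'A)$.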
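The rank can be computed by row reduction: it equals the number of pivots found. The sketch below uses exact fractions and then checks, on one small made-up matrix, the statement rk(A) = rk(AA') = rk(A'A). The helper names (`rank`, `transpose`, `matmul`) and the example matrix are assumptions made for illustration.

```python
from fractions import Fraction

def rank(a):
    """Rank of a matrix = number of pivots after forward elimination."""
    rows = [[Fraction(v) for v in row] for row in a]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rk, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, n_rows):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
        if rk == n_rows:
            break
    return rk

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]     # second row is twice the first
At = transpose(A)
print(rank(A), rank(matmul(A, At)), rank(matmul(At, A)))
```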

2.2 Algebraic manipulation of matrices


The equality of two matrices A and B: $A = B \iff a_{ij} = b_{ij}$, $\forall i, j$
The transpose of a matrix A: $B = A'$ (or $A^T$) $\iff b_{ij} = a_{ji}$, $\forall i, j$
For any matrix A: $(A')' = A$
A is a symmetric matrix $\iff A = A'$
A is normal if $AA' = A'A$
Addition: $A + B = [a_{ij} + b_{ij}]$ {A and B must be of the same dimensions.}
$A + 0 = A$
$(A + B) + C = A + (B + C)$
$(A + B)' = A' + B'$
Scalar multiplication: $cA = [ca_{ij}]$
Matrix product:
For an $m \times r$ matrix $A = [a_{ik}]$ and an $r \times n$ matrix $B = [b_{kj}]$, the product matrix $C = AB = [c_{ij}] = [a_i' b_j]$, where $a_i'$ is the i-th row of A and $b_j$ is the j-th column of B.
In general, $AB \ne BA$.
A and B commute if AB = BA.
$A0 = 0$ and $0A = 0$ {The three zero matrices may not be of the same dimensions}
$AI_r = I_m A = A$
$(AB)C = A(BC)$
$(A + B)C = AC + BC$
$(AB)' = B'A'$
Matrix representations of summation:
$\sum_i x_i = (1, 1, \ldots, 1)(x_1, x_2, \ldots, x_n)'$
$\sum_i x_i^2 = x'x$
$\sum_i x_i y_i = x'y = y'x$
For two real matrices A and B of the same dimension we define the inner product as $\langle A, B \rangle := \sum_{i,j} a_{ij} b_{ij} = \mathrm{tr}(A'B)$, which induces the norm:
$\|A\| := \langle A, A \rangle^{1/2} = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\mathrm{tr}(A'A)}$


i, j

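A calculation that may help to condense the notation or simplify the coding is the Kronecker product, denoted by $\otimes$.
For general matrices A of dimension $m \times n$ and B of dimension $p \times q$,
$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$
is of dimension $(mp) \times (nq)$.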

0
I

Example:

(A

B)

(A

B)

For square matrices $A_{m \times m}$ and $B_{n \times n}$,
a. $|A \otimes B| = |A|^n |B|^m$
b. $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$
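The block structure of the Kronecker product is easy to see in code. The sketch below builds $A \otimes B$ from nested lists and checks the trace property on a small example; the function name `kron` and the test matrices are illustrative choices.

```python
def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
K = kron(A, B)
for row in K:
    print(row)
# Dimension is (mp) x (nq); here 4 x 4.
print(len(K), len(K[0]))
# tr(A kron B) = tr(A) tr(B)
print(trace(K) == trace(A) * trace(B))
```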

A complex matrix U can be written as U = A + iB, where A and B are real matrices and i is the imaginary unit.
The conjugate transpose of U, denoted by $U^*$, is defined as $U^* := A' - iB'$.
If U is real, then $U^* = U'$.
A square matrix U is said to be Hermitian if $U^* = U$.
A square matrix U is said to be unitary if $U^*U = I$.
2.3 System of Equations
Consider the set of n linear equations Ax = b, where x contains the unknowns, A is a known matrix of coefficients, and b is a specified vector of values. We are interested in:
(1) whether a solution exists;
(2) if so, how to obtain it;
(3) if it does exist, then whether it is unique.
We only consider a square equation system here (i.e. those with an equal number of
equations and unknowns).
A homogeneous equation system is of the form Ax = 0.
Every homogeneous system has at least one solution, known as the zero
solution (or trivial solution).
If the system has a nonsingular matrix A, then zero is the only solution.
If the system has a singular matrix, then there is a solution set with an infinite
number of solutions. This solution set is closed under addition and scalar
multiplication.
A nonhomogeneous equation system is of the form Ax = b, where b is a
nonzero vector.
A nonhomogeneous equation system has a unique non-trivial solution $x = A^{-1}b$ if and only if A is nonsingular.
If A is singular, then the system has either no solution or an infinite number
of solutions.

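[Gaussian Elimination Algorithm]:
$[A : b] \to [I : x]$
Example: Solve the equation
$\begin{pmatrix} 2 & 1 & -3 \\ 5 & 2 & -6 \\ 3 & -1 & -4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}$.
SOLN:
$\left(\begin{array}{ccc|c} 2 & 1 & -3 & 1 \\ 5 & 2 & -6 & 5 \\ 3 & -1 & -4 & 7 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 1 \end{array}\right) \implies \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}$.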

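[Cramer's Rule]:
Cramer's Rule is an explicit formula for the solution of a system of linear equations, with each variable given by a quotient of two determinants.
Example: Solve the equation
$\begin{pmatrix} 2 & 1 & -3 \\ 5 & 2 & -6 \\ 3 & -1 & -4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}$.
SOLN:
$x_1 = \frac{\begin{vmatrix} 1 & 1 & -3 \\ 5 & 2 & -6 \\ 7 & -1 & -4 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & -3 \\ 5 & 2 & -6 \\ 3 & -1 & -4 \end{vmatrix}} = \frac{21}{7} = 3$,
$x_2 = \frac{\begin{vmatrix} 2 & 1 & -3 \\ 5 & 5 & -6 \\ 3 & 7 & -4 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & -3 \\ 5 & 2 & -6 \\ 3 & -1 & -4 \end{vmatrix}} = \frac{-14}{7} = -2$,
$x_3 = \frac{\begin{vmatrix} 2 & 1 & 1 \\ 5 & 2 & 5 \\ 3 & -1 & 7 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & -3 \\ 5 & 2 & -6 \\ 3 & -1 & -4 \end{vmatrix}} = \frac{7}{7} = 1$.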
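Both solution methods can be sketched side by side. The code below assumes the example system is 2x1 + x2 - 3x3 = 1, 5x1 + 2x2 - 6x3 = 5, 3x1 - x2 - 4x3 = 7; the signs of the coefficients are an assumption chosen to be consistent with the quotients 21/7, 14/7, and 7/7 in the worked solution. Function names are illustrative.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix, written out term by term."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i - a*f*h - b*d*i + b*f*g + c*d*h - c*e*g

def solve_gauss_jordan(a, b):
    """Reduce [A : b] to [I : x] with exact fractions and return x."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def solve_cramer(a, b):
    """x_j = |A_j| / |A|, where A_j is A with column j replaced by b."""
    d = det3(a)
    if d == 0:
        raise ValueError("A is singular")
    sol = []
    for j in range(3):
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        sol.append(Fraction(det3(a_j), d))
    return sol

A = [[2, 1, -3], [5, 2, -6], [3, -1, -4]]   # assumed signs, see lead-in
b = [1, 5, 7]
print(solve_gauss_jordan(A, b))
print(solve_cramer(A, b))
```

Cramer's Rule is convenient for hand calculation on small systems, while elimination scales far better with the number of equations.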

A Least Squares Problem


Given a vector y and a matrix X, we are interested in expressing y as a linear
combination of the columns of X. There are two possibilities.
If y lies in the column space of X, then we shall be able to find a vector b
such that y = Xb.
Suppose that y is not in the column space of X. Then there is no b such that y
= Xb holds. We can, however, write y = Xb + e, where e is the difference
between y and Xb, or residual.
We try to solve for $b = \arg\min_b e'e = \arg\min_b (y - Xb)'(y - Xb)$.
Using matrix calculus, b is found to be the solution to the nonhomogeneous system $X'y = X'Xb$. It follows that $b = (X'X)^{-1}X'y$, provided X has full column rank so that X'X is nonsingular.
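The normal equations X'Xb = X'y can be solved directly for a small data set. The sketch below uses exact fractions and a made-up example (a constant and one regressor, three observations); the data and helper names are assumptions for illustration only. It also checks that the residual e = y - Xb is orthogonal to the columns of X.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a square system by Gauss-Jordan elimination on [A : b]."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X'X is singular: X is not of full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

X = [[1, 0], [1, 1], [1, 2]]   # hypothetical data: constant and one regressor
y = [1, 2, 4]

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(r * v for r, v in zip(row, y)) for row in Xt]
b_hat = solve(XtX, Xty)                                   # b = (X'X)^{-1} X'y

e = [yi - sum(xij * bj for xij, bj in zip(row, b_hat)) for row, yi in zip(X, y)]
Xte = [sum(r * v for r, v in zip(row, e)) for row in Xt]  # should be the zero vector
print(b_hat, e, Xte)
```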

2.4 Partitioned Matrices

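A partitioned matrix is a matrix of the form $Z = \begin{pmatrix} A_{m \times p} & B_{m \times q} \\ C_{n \times p} & D_{n \times q} \end{pmatrix}$.
None of the matrices needs to be square, but A and B must have the same number of rows, A and C must have the same number of columns, and so on.
We will mainly focus on partitioned matrices with two row blocks and two column blocks. It can be extended to the case of m row blocks and n column blocks, such as
$Z = \begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{pmatrix}$
As a special case, we say that a square matrix is block-diagonal if it takes the form
$Z = \begin{pmatrix} Z_{11} & & & \\ & Z_{22} & & \\ & & \ddots & \\ & & & Z_{rr} \end{pmatrix}$
where all diagonal blocks are square, not necessarily of the same order.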
A General Principle
The main tool in obtaining the inverse, determinant, and rank of a partitioned matrix
is to write the matrix as a product of simpler matrices, that is, matrices of which one
(or two) of the four blocks is the null matrix.
Some Basic Results
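Partitioned sum: Let $Z_1 = \begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix}$ and $Z_2 = \begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix}$, then
$Z_1 + Z_2 = \begin{pmatrix} A_1 + A_2 & B_1 + B_2 \\ C_1 + C_2 & D_1 + D_2 \end{pmatrix}$
Partitioned product: $Z_1$ and $Z_2$ are defined above, then
$Z_1 Z_2 = \begin{pmatrix} A_1A_2 + B_1C_2 & A_1B_2 + B_1D_2 \\ C_1A_2 + D_1C_2 & C_1B_2 + D_1D_2 \end{pmatrix}$.
Partitioned transpose:
$\begin{pmatrix} A & B \\ C & D \end{pmatrix}' = \begin{pmatrix} A' & C' \\ B' & D' \end{pmatrix}$
Trace of partitioned matrix (A and D square):
$\mathrm{tr}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \mathrm{tr}(A) + \mathrm{tr}(D)$
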

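Elementary row-block operations
i) $\begin{pmatrix} O & I_n \\ I_m & O \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} C & D \\ A & B \end{pmatrix}$
ii) $\begin{pmatrix} E & O \\ O & I_n \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} EA & EB \\ C & D \end{pmatrix}$
iii) $\begin{pmatrix} I_m & E \\ O & I_n \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A + EC & B + ED \\ C & D \end{pmatrix}$
Elementary column-block operations
i) $\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} O & I_p \\ I_q & O \end{pmatrix} = \begin{pmatrix} B & A \\ D & C \end{pmatrix}$
ii) $\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} F & O \\ O & I_q \end{pmatrix} = \begin{pmatrix} AF & B \\ CF & D \end{pmatrix}$
iii) $\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I_p & F \\ O & I_q \end{pmatrix} = \begin{pmatrix} A & B + AF \\ C & D + CF \end{pmatrix}$
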

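If A is nonsingular, we have
$\begin{pmatrix} I_m & O \\ -CA^{-1} & I_n \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I_p & -A^{-1}B \\ O & I_q \end{pmatrix} = \begin{pmatrix} A & O \\ O & D - CA^{-1}B \end{pmatrix}$.
The matrix $D - CA^{-1}B$ is called the Schur complement of A.
Similarly, if D is nonsingular, we have
$\begin{pmatrix} I_m & -BD^{-1} \\ O & I_n \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I_p & O \\ -D^{-1}C & I_q \end{pmatrix} = \begin{pmatrix} A - BD^{-1}C & O \\ O & D \end{pmatrix}$.
The matrix $A - BD^{-1}C$ is called the Schur complement of D.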


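Inverses
If A and D are nonsingular, $\begin{pmatrix} A & O \\ O & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & O \\ O & D^{-1} \end{pmatrix}$.
If B and C are nonsingular, $\begin{pmatrix} O & B \\ C & O \end{pmatrix}^{-1} = \begin{pmatrix} O & C^{-1} \\ B^{-1} & O \end{pmatrix}$.
Example: When will the matrix $\begin{pmatrix} O & B \\ C & O \end{pmatrix}$ be orthogonal?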

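If A and its Schur complement $E := D - CA^{-1}B$ are nonsingular,
$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BE^{-1}CA^{-1} & -A^{-1}BE^{-1} \\ -E^{-1}CA^{-1} & E^{-1} \end{pmatrix}$
If D and its Schur complement $F := A - BD^{-1}C$ are nonsingular,
$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} F^{-1} & -F^{-1}BD^{-1} \\ -D^{-1}CF^{-1} & D^{-1} + D^{-1}CF^{-1}BD^{-1} \end{pmatrix}$
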

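Determinants
Let $Z = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$,
If A is nonsingular, $|Z| = |A||D - CA^{-1}B|$.
If D is nonsingular, $|Z| = |D||A - BD^{-1}C|$.
Let A and D be square matrices, of order m and n, respectively.
a. For any $m \times m$ matrix E,
$\begin{vmatrix} EA & EB \\ C & D \end{vmatrix} = |E| \begin{vmatrix} A & B \\ C & D \end{vmatrix}$
b. For any $n \times m$ matrix E,
$\begin{vmatrix} A & B \\ C + EA & D + EB \end{vmatrix} = \begin{vmatrix} A & B \\ C & D \end{vmatrix}$
Example: Will $\begin{vmatrix} C & D \\ A & B \end{vmatrix} = -\begin{vmatrix} A & B \\ C & D \end{vmatrix}$ always hold?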
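The determinant formula |Z| = |A||D - CA^{-1}B| can be checked numerically on a small partitioned matrix. The blocks below are made up for illustration (A is 2 x 2, D is 1 x 1), and the helper names are assumptions; the Schur complement of A is then a scalar.

```python
from fractions import Fraction

def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(n))

# Hypothetical blocks of Z = [[A, B], [C, D]].
A = [[2, 1], [1, 1]]
B = [[1], [2]]
C = [[3, 1]]
D = [[4]]
Z = [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

# A^{-1} from the 2 x 2 inverse formula.
dA = det(A)
A_inv = [[Fraction(A[1][1], dA), Fraction(-A[0][1], dA)],
         [Fraction(-A[1][0], dA), Fraction(A[0][0], dA)]]

# Schur complement of A: D - C A^{-1} B (a 1 x 1 matrix here, kept as a scalar).
A_inv_B = [sum(A_inv[i][k] * B[k][0] for k in range(2)) for i in range(2)]
schur = D[0][0] - sum(C[0][k] * A_inv_B[k] for k in range(2))

print(det(Z), dA * schur)
```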

2.5 Characteristic Roots and Vectors


A useful set of results for analyzing a square matrix A, real or complex, arises from the solutions to the set of equations $Ax = \lambda x$.
The pairs of solutions are the characteristic roots (or eigenvalues) $\lambda$ and their associated characteristic vectors (or eigenvectors) x.
It is easy to see that the solution set for x is closed under scalar multiplication. To remove the indeterminacy (apart from sign), x is normalized so that $x^*x = 1$ ($x'x = 1$ when x is real).
The solution then consists of $\lambda$ and the n-1 unknown elements in x.
$Ax = \lambda x \implies (A - \lambda I)x = 0$, which is a homogeneous equation system.
It has a nonzero solution if and only if the matrix $(A - \lambda I)$ is singular, i.e. $|A - \lambda I| = 0$. This polynomial equation in $\lambda$ is the characteristic equation of A.
Note: For a matrix A of order n, we use $\lambda_1, \lambda_2, \ldots, \lambda_n$ to denote its eigenvalues
a. $\lambda_1, \lambda_2, \ldots, \lambda_n$ can be real or complex. But for a real symmetric matrix A, its eigenvalues are always real numbers.
b. If $\lambda$ appears $n_\lambda > 1$ times then it is called a multiple eigenvalue and the number $n_\lambda$ is the (algebraic) multiplicity of $\lambda$; if $\lambda$ appears only once it is called a simple eigenvalue.
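For a 2 x 2 matrix the characteristic equation $|A - \lambda I| = 0$ is the quadratic $\lambda^2 - \mathrm{tr}(A)\lambda + |A| = 0$, so the eigenvalues follow from the quadratic formula. The sketch below uses `cmath` so that complex roots are handled as well; the function name and the two test matrices are illustrative assumptions.

```python
import cmath

def eigenvalues_2x2(a):
    """Roots of lambda^2 - tr(A) lambda + |A| = 0 for a 2 x 2 matrix A."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)   # complex square root of the discriminant
    return (tr + root) / 2, (tr - root) / 2

sym = [[2, 1], [1, 2]]      # symmetric: real eigenvalues
rot = [[0, -1], [1, 0]]     # rotation by 90 degrees: complex eigenvalues

l1, l2 = eigenvalues_2x2(sym)
print(l1, l2)
print(eigenvalues_2x2(rot))
# Sum of eigenvalues equals the trace, product equals the determinant.
print(l1 + l2, l1 * l2)
```

The symmetric matrix yields real roots, while the rotation matrix, which is not symmetric, yields a conjugate pair.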
Example: i) What are the eigenvalues of a diagonal matrix?
ii) What are the eigenvalues of a triangular matrix?
Although the eigenvalues are an excellent way to characterize a matrix, they do
not characterize a matrix completely.
Two different matrices may have the same eigenvalues.
Can one eigenvector be associated with two distinct eigenvalues?
Can two distinct vectors be associated with the same eigenvalue?
Example: Prove or think of an example.
Any linear combination of eigenvectors associated with the same eigenvalue is an
eigenvector for that eigenvalue.
Exercise: Prove this statement.
The geometric multiplicity of an eigenvalue is the dimension of the space
spanned by the associated eigenvectors. This dimension cannot exceed the
algebraic multiplicity.
Example: i) Find the eigenvalues and eigenvectors for the following two matrices:
3 0
0
1 3 0
a.

0 1 0
2 1 5

b.

6
1

ii) For the matrix in a., what are the (algebraic) multiplicity and the
geometric multiplicity of each eigenvalue? What about the matrix in b.?
iii) Are the eigenvectors for each matrix linearly independent? Are they
orthogonal?
Eigenvectors associated with distinct eigenvalues are linearly independent, but
not necessarily orthogonal.
If $\lambda$ is an eigenvalue of A, $t\lambda$ is an eigenvalue of tA, with the same eigenvector(s).
If x is an eigenvector of A, tx ($t \ne 0$) is also an eigenvector of A, associated with the same eigenvalue.
Do A and A' have the same eigenvalues? Do they have the same eigenvectors?
Either provide a proof or a counterexample.
A is nonsingular if and only if all its eigenvalues are nonzero.
Example: Prove this theorem.

[Approximate inverse of singular matrix] If A is singular, then there exists a scalar $\varepsilon \ne 0$ such that $A + \varepsilon I$ is nonsingular.
Example: Prove this theorem.
Let $\lambda$ be a simple eigenvalue of a square matrix A, so that $Ax = \lambda x$ for some eigenvector x. If A and B commute, x is an eigenvector of B too.
Example: Prove this theorem.
Similarity
Two matrices of the same dimension and the same rank are said to be equivalent.
If A and B are equivalent matrices, then there exist nonsingular matrices E and F such that B = EAF.
When, in addition, A and B are square and there exists a nonsingular matrix T such that $T^{-1}AT = B$, then they are said to be similar.
Example: Prove the claim that similar matrices have the same set of eigenvalues. Do they have the same set of eigenvectors as well?
Properties:
For an $n \times n$ matrix A with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$,
$|A| = \prod_{i=1}^{n} \lambda_i$
$\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$
$A^k$ has eigenvalues $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$


Properties for a (real) symmetric $n \times n$ matrix A:
The eigenvalues are all real
Eigenvectors associated with distinct eigenvalues are orthogonal. (Distinctness is sufficient, not necessary.)
The eigenvectors span $\mathbb{R}^n$
The rank is equal to the number of nonzero eigenvalues
Example:
a. Find a square matrix that does not possess this property.
b. Prove the theorem that the rank of any matrix A equals the number of nonzero eigenvalues of A'A
The matrix can be diagonalized
A matrix A can be diagonalized if there exists a matrix T such that $T^{-1}AT = \Lambda$, where $\Lambda$ is a diagonal matrix containing the eigenvalues of A
The factorization theorems try to diagonalize the matrices. If a matrix cannot be
diagonalized, we ask how close to a diagonal representation we can get.
Because of the central role of factorization theorems, let us list them below:
If A is an $m \times n$ matrix of rank r, then
$A = BC'$ with $B_{m \times r}$ and $C_{n \times r}$ both of rank r.
$EAF = \mathrm{diag}(I_r, O)$ with E and F nonsingular.
[QR Decomposition]
$A = QR$ (when r = n) with $Q^*Q = I_n$ and R an upper triangular matrix with positive diagonal elements; if A is real, then Q and R are real as well.
[Singular Value Decomposition]
$A = U \Sigma V^*$ with $U_{m \times m}$ and $V_{n \times n}$ unitary, $\Sigma_{m \times n}$ rectangular diagonal with nonnegative real numbers on the diagonal.
The diagonal entries of $\Sigma$ are known as the singular values of A. The m columns of U and the n columns of V are called the left-singular vectors and right-singular vectors of A.
If A is a square matrix of order n, then
[Schur Decomposition]
$P^*AP = M$ with P unitary and M upper triangular.
[Spectral Theorem]
$P^*AP = \Lambda$ (diagonal) with P unitary, if and only if A is normal.
[Spectral Decomposition]
$P'AP = \Lambda$ (diagonal) with P orthogonal, if A is symmetric.
$P'AP = \Lambda$ (diagonal) and $P'BP = M$ (diagonal) with A and B symmetric and P orthogonal, if and only if A and B commute.
If A is a square matrix of order n, then also
[Jordan Decomposition] $T^{-1}AT = J$ (Jordan matrix) with T nonsingular.
$T^{-1}AT = \Lambda$ (diagonal) with T nonsingular, if A has distinct eigenvalues.
$T^{-1}AT = \mathrm{diag}(A)$ with T unit upper triangular, if A is upper triangular with distinct diagonal elements.
$T^{-1}AT = \Lambda$ (diagonal) and $T^{-1}BT = M$ (diagonal) with T nonsingular, if A has only simple eigenvalues and commutes with B.


Example:
a. [Singular Value Decomposition]
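Consider the $4 \times 5$ matrix
$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \end{pmatrix}$
A singular value decomposition of this matrix is given by $U \Sigma V^*$.
It can be verified that $UU^* = I_4$ and $VV^* = I_5$.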

In fact, this particular singular value decomposition is not unique. Choosing V


such that

is also a valid singular value decomposition.


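b. [Spectral Decomposition]
Consider the symmetric matrix $A = \begin{pmatrix} 1 & 2\sqrt{2} \\ 2\sqrt{2} & 3 \end{pmatrix}$
$|A - \lambda I| = (1 - \lambda)(3 - \lambda) - 8 = (\lambda + 1)(\lambda - 5)$
For $\lambda = -1$, $Ax = -x \implies x_1 = -\sqrt{2}\,x_2$; with $\|x\| = 1$, $x = \left(\sqrt{\tfrac{2}{3}}, -\sqrt{\tfrac{1}{3}}\right)'$
For $\lambda = 5$, $Ax = 5x \implies x_2 = \sqrt{2}\,x_1$; with $\|x\| = 1$, $x = \left(\sqrt{\tfrac{1}{3}}, \sqrt{\tfrac{2}{3}}\right)'$
$P'AP = \begin{pmatrix} \sqrt{\tfrac{2}{3}} & -\sqrt{\tfrac{1}{3}} \\ \sqrt{\tfrac{1}{3}} & \sqrt{\tfrac{2}{3}} \end{pmatrix} \begin{pmatrix} 1 & 2\sqrt{2} \\ 2\sqrt{2} & 3 \end{pmatrix} \begin{pmatrix} \sqrt{\tfrac{2}{3}} & \sqrt{\tfrac{1}{3}} \\ -\sqrt{\tfrac{1}{3}} & \sqrt{\tfrac{2}{3}} \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 5 \end{pmatrix}$
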

Quadratic forms and definite matrices
Many optimization problems involve double sums of the form $q = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j a_{ij}$.
This quadratic form can be written as $q = x'Ax$, where A is a symmetric matrix. In general, q may be positive, negative, or zero; it depends on A and x.
There are some matrices, however, for which the sign of q will be determined regardless of x. For a given matrix A,
If $x'Ax > (<)\ 0$ for all nonzero x, then A is positive (negative) definite.
If $x'Ax \ge (\le)\ 0$ for all nonzero x, then A is nonnegative definite or positive semidefinite (nonpositive definite).


Let A be a symmetric matrix,
a. If all eigenvalues of A are positive (negative), then A is positive definite
(negative definite).
b. If some of the roots are zero, then A is nonnegative (nonpositive) definite
if the remainder are positive (negative).
c. If A has both negative and positive roots, then A is indefinite.
Example: Use the Spectral Decomposition to explain.
Note: The "if" part is in fact "if and only if".
If A is positive definite, then $|A| > 0$. {What if A is negative definite?}
If A is positive definite, so is $A^{-1}$.
The identity matrix is positive definite.
If $A_{n \times K}$ (n > K) is of full rank, then A'A is positive definite and AA' is nonnegative definite.
If A is positive definite and B is a nonsingular matrix, then B'AB is positive definite.
Define $d := x'Ax - x'Bx = x'(A - B)x$.
If d is always positive for any nonzero vector x, then A is said to be greater than B. We write A > B. Equivalently, A - B is positive definite.
a. Suppose $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues of A and $\mu_1 \ge \cdots \ge \mu_n$ are those of B, both symmetric. If A - B is nonnegative definite, then $\lambda_i \ge \mu_i$, $i = 1, \ldots, n$. The converse does not hold in general: A = diag(1, 2) and B = diag(2, 1) have the same ordered eigenvalues, yet A - B is indefinite.
b. If A > B, with A and B positive definite, then $B^{-1} > A^{-1}$.
Real Powers of a Positive Definite Matrix
For a positive definite matrix A with spectral decomposition $A = P \Lambda P'$, $A^r = P \Lambda^r P'$, for any real number r.
{What if nonnegative definite?}
If A is nonnegative definite, then the powers can only be defined for r > 0.


Cholesky Decomposition
Let A be a Hermitian, positive-definite matrix, then A can be decomposed as $A = LL^*$, where L is a lower triangular matrix with strictly positive diagonal entries, and $L^*$ denotes the conjugate transpose of L.
The Cholesky decomposition is of great importance in terms of numerical
computation, especially for solving the system of linear equations.
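A minimal sketch of the Cholesky factorization for the real symmetric case (so that L* = L') is given below. It fills L row by row and raises an error when a nonpositive pivot shows that the input is not positive definite. The function name and the test matrix are illustrative; the code does not check symmetry and reads only the lower triangle.

```python
import math

def cholesky(a):
    """Return lower triangular L with positive diagonal such that A = L L'."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L = cholesky(A)
for row in L:
    print(row)

# Reconstruct A = L L' as a check.
LLt = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
print(LLt)
```

Once L is available, a system Ax = b can be solved by one forward substitution with L and one back substitution with L', which is why the factorization is popular in numerical work.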
An idempotent matrix, A, is one that is equal to its square, that is, $A^2 = A$.
All of the idempotent matrices we shall encounter are symmetric, though idempotent matrices can be either symmetric or not.
Properties: Let A be a symmetric idempotent matrix,
a. I - A is also a symmetric idempotent matrix.
b. All the eigenvalues of A are 0 or 1.
c. rk(A) = tr(A).
d. A is nonsingular $\iff$ A = I.
Example: Prove a. to d.
A useful idempotent matrix
Recall the solution for the least squares problem $b = (X'X)^{-1}X'y$, where $X_{n \times K}$ (n > K) is of full rank and y is a vector of order n.
We define the projection matrix H as $H := X(X'X)^{-1}X'$.
a. Show that H is a symmetric idempotent matrix.
b. What is rk(H)?
c. Show that I - H is a symmetric idempotent matrix.
d. What is rk(I - H)?

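Some results concerning eigenvalues
Let A be a symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The associated eigenvectors are denoted by $x_1, x_2, \ldots, x_n$. Let x be a point on the unit sphere, that is, $x'x = 1$.
a. $\max_x x'Ax = \lambda_1$, obtained when $x = x_1$
b. $\min_x x'Ax = \lambda_n$, obtained when $x = x_n$
c. $\max_{x \perp x_1, \ldots, x_{k-1}} x'Ax = \lambda_k$, obtained when $x = x_k$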

2.6 Calculus and Matrix Algebra


Scalar function f(x) of a scalar x
A variable y is a function of another variable x, say, y = f(x), if each value of x is associated with a single value of y. In this relationship, y and x are respectively labeled the dependent variable and the independent variable.
Assuming that the function f(x) is continuous and differentiable, we obtain the following derivatives: $f'(x) = \frac{dy}{dx}$, $f''(x) = \frac{d^2y}{dx^2}$, and so on.
The addition rule, product rule, quotient rule, etc.
The chain rule:
If $h(x) = (g \circ f)(x) = g(f(x))$, then $h'(x) = g'(f(x))\, f'(x)$.


Example: If h( x) sin(2 x 2 ecos x ) , what is h ( x ) ? What is h ( x ) ?


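Scalar function f(x) of a vector x
We can regard a function $y = f(x_1, x_2, \ldots, x_n)$ as a scalar-valued function of a vector, that is, y = f(x).
The vector of partial derivatives, or gradient vector, or simply gradient, is
$\frac{\partial y}{\partial x} = \frac{\partial f(x)}{\partial x} = \begin{pmatrix} \partial y / \partial x_1 \\ \partial y / \partial x_2 \\ \vdots \\ \partial y / \partial x_n \end{pmatrix}$.
Also we have
$\frac{\partial f(x)}{\partial x'} = \begin{pmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{pmatrix}$
For a matrix $A_{m \times n} = [a_{ij}]$, the derivative is defined as
$\frac{\partial f(A)}{\partial A} = \begin{pmatrix} \frac{\partial f(A)}{\partial a_{11}} & \frac{\partial f(A)}{\partial a_{12}} & \cdots & \frac{\partial f(A)}{\partial a_{1n}} \\ \frac{\partial f(A)}{\partial a_{21}} & \frac{\partial f(A)}{\partial a_{22}} & \cdots & \frac{\partial f(A)}{\partial a_{2n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f(A)}{\partial a_{m1}} & \frac{\partial f(A)}{\partial a_{m2}} & \cdots & \frac{\partial f(A)}{\partial a_{mn}} \end{pmatrix}$.
Note: The shape of the derivative is determined by the denominator of the derivative.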
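Commonly used derivatives
i) $f(x) = x'y$, then $\frac{\partial f(x)}{\partial x} = y$.
ii) $f(B) = x'By$, where the vector x is of order m, y is of order n, and the matrix B is an $m \times n$ matrix, then $\frac{\partial f(B)}{\partial B} = xy'$.
iii) $f(x) = x'By$, then $\frac{\partial f(x)}{\partial x} = By$.
iv) $f(y) = x'By$, then $\frac{\partial f(y)}{\partial y} = B'x$.
v) $f(x) = x'Ax$, where the vector x is of order n and the matrix A is an $n \times n$ matrix, then $\frac{\partial f(x)}{\partial x} = (A + A')x$.
In particular, when A is symmetric, $\frac{\partial f(x)}{\partial x} = 2Ax$.
Example: Let $A = \begin{pmatrix} 1 & 3 \\ 3 & 4 \end{pmatrix}$, show that $\frac{\partial (x'Ax)}{\partial x} = 2Ax$.
vi) $f(A) = \log|A|$, then $\frac{\partial f(A)}{\partial A} = (A^{-1})' = (A')^{-1}$, which comes from the fact that $\frac{\partial |A|}{\partial a_{ij}} = (-1)^{i+j}|A_{ij}|$ (the cofactor expansion) and from the chain rule.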
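The vector function f(x) of a vector x
Let $f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}$, where x is a vector of order n, then
$\frac{\partial f(x)}{\partial x'} = \begin{pmatrix} \frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_m(x)}{\partial x_1} & \cdots & \frac{\partial f_m(x)}{\partial x_n} \end{pmatrix}$.
More commonly, this matrix $\frac{\partial f(x)}{\partial x'}$ is called the Jacobian matrix.
The Jacobian determinant is the determinant of the Jacobian matrix if m = n.
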

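Hessian matrix
If we take the derivative of the gradient $\frac{\partial f(x)}{\partial x}$, then a second derivative matrix, the Hessian matrix, is computed as
$H := \frac{\partial^2 f(x)}{\partial x\, \partial x'} = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}$.
In general, H is a square, symmetric matrix. (The symmetry is obtained for continuous and continuously differentiable functions from Young's theorem.)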
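The chain rule:
Let $h(x) = g(f(x))$, where $f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}$, then
$\frac{\partial h(x)}{\partial x'} = \frac{\partial g(f(x))}{\partial f'} \frac{\partial f(x)}{\partial x'}$.
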

Example 1: Consider the following problem
$\max_x R = a'x - x'Ax$, where $a = (5, 4, 2)'$ and $A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 3 & 2 \\ 3 & 2 & 5 \end{pmatrix}$.
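Example 1 can be worked through with the derivative rules above. The sketch assumes a = (5, 4, 2)' and A with rows (2, 1, 3), (1, 3, 2), (3, 2, 5). Setting the gradient a - 2Ax to zero gives the linear system 2Ax = a, which is solved exactly below; since A is symmetric, the Hessian is -2A, and the stationary point is a maximum when A is positive definite. The helper name `solve` is illustrative.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a square system by Gauss-Jordan elimination on [A : b]."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

a_vec = [5, 4, 2]                            # assumed values, see lead-in
A = [[2, 1, 3], [1, 3, 2], [3, 2, 5]]

# First-order condition: dR/dx = a - 2Ax = 0, i.e. (2A) x = a.
x_star = solve([[2 * v for v in row] for row in A], a_vec)

# Value of the objective R = a'x - x'Ax at the stationary point.
ax = sum(ai * xi for ai, xi in zip(a_vec, x_star))
xAx = sum(x_star[i] * A[i][j] * x_star[j] for i in range(3) for j in range(3))
R_star = ax - xAx
print([float(v) for v in x_star], float(R_star))
```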
Example 2: The least squares problem:
Suppose we have y = Xb + e, where y and e are both vectors of order n, b is a vector of order K, and X is thus an $n \times K$ matrix.
Solve for $b = \arg\min_b e'e = \arg\min_b (y - Xb)'(y - Xb)$.