
Statistical Inference

Danning Li

dl496@statslab.cam.ac.uk

Contents

1 Introduction
  1.1 Definition
  1.2 Large dimensional data analysis
  1.3 Types of random matrices
  1.4 Limit theorems of eigenvalues of popular random matrices
2 Wigner Semicircle Law by Moment Method
  2.1 Semicircle law for iid case
  2.2 Moment Convergence Theorem
  2.3 Proof of Theorem 2
3 Limiting Spectral Distribution of the Sample Covariance Matrix
  3.1 MP Law
  3.2 Stieltjes Transform
  3.3 Stieltjes Transform of the MP Law
  3.4 Proof of Theorem 3
4 Tests for Sphericity
  4.1 Likelihood Ratio Test for Sphericity
  4.2 Different Tests for Sphericity

1 Introduction

1.1 Definition

From linear algebra, we have learnt there are a lot of different types of matrices, such as

symmetric matrices, orthogonal matrices, etc. The elements of a matrix can be real or

complex. Formally, an $n \times n$ square matrix can be defined as follows:

$$A := \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}.$$

Let $A^T$ denote the transpose of $A$ in the real case, or the conjugate transpose of $A$ in the complex case. If $A^T A = I_n$, then $A$ is an orthogonal matrix. For a non-zero vector $\nu$, if $A\nu = \lambda\nu$, then we say that the number $\lambda$ is an eigenvalue of $A$ corresponding to the eigenvector $\nu$. The matrix $A$ has $n$ eigenvalues, denoted by $\lambda_1, \dots, \lambda_n$.

A random matrix is defined by replacing all the elements of the matrix $A$ with random variables. Then the corresponding eigenvalues $\lambda_1, \dots, \lambda_n$ are random variables too. The

main research area about random matrices focuses on investigating the limiting properties of

the eigenvalues of random matrices (known as the spectral properties of random matrices),

when their dimension tends to infinity.

1.2 Large dimensional data analysis

Most of the classical limiting theorems in statistics assume that the dimension of the data is fixed. In recent years, with the development of modern technology, people are collecting data that characterize many different features of the subject under study. So the dimension of a dataset can be huge. Sometimes the dimension of the data can even be larger than the sample size; gene expression data is a good example. It is then natural to ask whether the dimension needs to be considered large, and whether there are any differences between the results for fixed dimension and those for large dimension. The answer is yes!

Why does random matrix theory (RMT) focus on the spectral analysis of random matrices? One reason is that, in multivariate analysis, many statistics can be expressed as functions of eigenvalues. For example, consider the hypothesis test $H_0: \Sigma = \sigma^2 I_p$ vs. $H_1: \Sigma \neq \sigma^2 I_p$ for $n$ independent observations $X_1, \dots, X_n$ from a $N_p(\mu, \Sigma)$ distribution. The sample covariance matrix is defined as $S := \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T$ and the likelihood function of $(\mu, \Sigma)$ is

$$L(\mu, \Sigma) = \det(\Sigma)^{-n/2} \exp\Big\{ -\frac{n-1}{2} \mathrm{tr}\big(\Sigma^{-1} S\big) \Big\} \exp\Big\{ -\frac{n}{2} (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu) \Big\}. \qquad (1.1)$$

Then the likelihood ratio test statistic is

$$\lambda := \frac{\sup_{\mu \in \mathbb{R}^p,\, \sigma^2 > 0} L(\mu, \sigma^2 I_p)}{\sup_{\mu \in \mathbb{R}^p,\, \Sigma > 0} L(\mu, \Sigma)} = \left\{ \frac{\det\big(\frac{n-1}{n} S\big)}{\big(\frac{1}{p}\, \mathrm{tr}\big(\frac{n-1}{n} S\big)\big)^p} \right\}^{n/2}. \qquad (1.2)$$

Let $\lambda_1, \dots, \lambda_p$ be the eigenvalues of $\frac{n-1}{n} S$. Then we can write $\log \lambda = \frac{n}{2} \sum_{i=1}^p \log \lambda_i - \frac{np}{2} \log\big( \frac{1}{p} \sum_{i=1}^p \lambda_i \big)$. Research has shown that its limiting distribution is different when $p$ is fixed than when $p$ diverges along with $n$. As another example, the largest eigenvalue of $S$ is a very important statistic in principal component analysis (PCA), and its limiting distribution also depends on the dimension setting. So the spectral analysis of random matrices is very important.
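As a quick numerical illustration, the statistic in (1.2) can be computed directly from the eigenvalues of the rescaled sample covariance matrix. The following Python sketch (our code; the helper name `log_lrt_sphericity` is not from the notes) computes $\log\lambda$ from the eigenvalue form:

```python
import numpy as np

def log_lrt_sphericity(X):
    """log of the LRT statistic for H0: Sigma = sigma^2 I, as in (1.2).

    X is an (n, p) data matrix; rows are observations.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_hat = Xc.T @ Xc / n            # (n-1)/n * S, the Gaussian MLE of Sigma
    lam = np.linalg.eigvalsh(S_hat)
    # log lambda = (n/2) * [ sum_i log lam_i - p * log(mean(lam)) ]
    return 0.5 * n * (np.sum(np.log(lam)) - p * np.log(lam.mean()))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # data generated under H0
ll = log_lrt_sphericity(X)
# By the AM-GM inequality applied to the eigenvalues, log lambda <= 0 always,
# and under H0 it should be moderate in magnitude.
print(ll)
```

Note that the statistic depends on the data only through the spectrum of $\frac{n-1}{n}S$, which is exactly why it is called a spectral statistic.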

1.3 Types of random matrices

A lot of research in RMT focuses on investigating symmetric random matrices. There are three classical types of such random matrices, which are named differently in physics and statistics. The following table gives a comparison.

              beta = 1         beta = 2            beta = 4
  Hermite     GOE              GUE                 GSE
  Laguerre    Real Wishart     Complex Wishart     Quaternion Wishart
  Jacobi      Real MANOVA      Complex MANOVA      Quaternion MANOVA

In the table, GOE denotes the Gaussian orthogonal ensemble, GUE is short for Gaussian unitary ensemble, GSE is the acronym for Gaussian symplectic ensemble, and MANOVA is short for multivariate analysis of variance. Each row corresponds to one kind of classical random matrix. The parameter values $\beta = 1, 2$ and $4$ correspond to the real, complex and quaternion cases respectively. We are all very familiar with real and complex numbers. A quaternion number can be defined in the following way: $q := a_1 + a_2 i + a_3 j + a_4 k$, where $a_1, a_2, a_3, a_4 \in \mathbb{R}$, $i^2 = j^2 = k^2 = -1$, and $ij = k$, $jk = i$, $ki = j$. Notice that two quaternion numbers $q_1$ and $q_2$ are generally not commutative: $q_1 q_2 \neq q_2 q_1$. Let $x_1, x_2, x_3, x_4$ follow a standard normal distribution independently. Then we can define three standard normal distributions corresponding to the real, complex and quaternion cases respectively:

(1) The standard real normal distribution, denoted by $N(0, 1)$, is the probability distribution of $x_1$.

(2) The standard complex normal distribution, denoted by $CN(0, 1)$, is the probability distribution of $(x_1 + i x_2)/\sqrt{2}$.

(3) The standard quaternion normal distribution, denoted by $QN(0, 1)$, is the probability distribution of $(x_1 + i x_2 + j x_3 + k x_4)/2$.
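The quaternion rules above are easy to verify numerically. A minimal sketch (the 4-tuple representation and the helper `quat_mul` are ours) multiplies quaternions via the Hamilton product and confirms $ij = k$ and the non-commutativity $q_1 q_2 \neq q_2 q_1$:

```python
def quat_mul(p, q):
    """Multiply quaternions represented as tuples (a1, a2, a3, a4),
    i.e. a1 + a2*i + a3*j + a4*k, using i^2 = j^2 = k^2 = -1,
    ij = k, jk = i, ki = j."""
    a1, a2, a3, a4 = p
    b1, b2, b3, b4 = q
    return (a1*b1 - a2*b2 - a3*b3 - a4*b4,
            a1*b2 + a2*b1 + a3*b4 - a4*b3,
            a1*b3 - a2*b4 + a3*b1 + a4*b2,
            a1*b4 + a2*b3 - a3*b2 + a4*b1)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(quat_mul(i, j))   # ij = k  -> (0, 0, 0, 1)
print(quat_mul(j, i))   # ji = -k -> (0, 0, 0, -1): multiplication is not commutative
```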

Then the GOE, GUE and GSE, which appear in the first row of the table, can be defined as $G := \frac{1}{\sqrt{2}}(X + X^T)$, where $X = (x_{ij})$ is an $n \times n$ matrix and $x_{ij} \overset{iid}{\sim} N/CN/QN(0, 1)$ correspondingly. The Hermite ensemble is another name for this type of matrix in physics. It is easy to check that each diagonal entry of the GOE is normally distributed with mean 0 and variance 2, and that each off-diagonal entry follows the standard normal distribution. The density function of the eigenvalues $(\lambda_1, \dots, \lambda_n)$ of $G$ is

$$f_G(\lambda_1, \dots, \lambda_n) = C \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \exp\Big( -\frac{\beta}{4} \sum_{i=1}^n \lambda_i^2 \Big),$$

where $C$ is the normalization constant and $(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n$.
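The entrywise description of the GOE can be checked by simulation (numpy, our code): the diagonal of $G = (X + X^T)/\sqrt{2}$ has variance about 2 and the off-diagonal entries have variance about 1:

```python
import numpy as np

rng = np.random.default_rng(11)
m, n = 4000, 8                        # m independent GOE samples of size n x n
diag_vals, off_vals = [], []
for _ in range(m):
    X = rng.normal(size=(n, n))
    G = (X + X.T) / np.sqrt(2)        # GOE sample
    diag_vals.append(np.diag(G))
    off_vals.append(G[np.triu_indices(n, 1)])
print(np.var(np.concatenate(diag_vals)),   # close to 2
      np.var(np.concatenate(off_vals)))    # close to 1
```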

From now on, let C be a generic constant which can vary from line to line.

The real, complex and quaternion Wishart matrices, which appear in the second row of the table, can be defined as $W := X^T X$, where $X = (x_{ij})$ is an $m \times n$ random matrix with $x_{ij} \overset{iid}{\sim} N/CN/QN(0, 1)$ correspondingly and $m > n$. The physics terminology for this type of random matrix is the Laguerre ensemble. The density function of the eigenvalues $(\lambda_1, \dots, \lambda_n)$ of $W$ is

$$f_W(\lambda_1, \dots, \lambda_n) = C \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{i=1}^n \lambda_i^{\frac{\beta(m-n+1)}{2} - 1} \exp\Big( -\frac{\beta}{4} \sum_{i=1}^n \lambda_i \Big),$$

where $C$ is the normalization constant and $(\lambda_1, \dots, \lambda_n) \in (0, \infty)^n$.

The real, complex and quaternion MANOVA matrices, which appear in the last row of the table, are defined as $M := X^T X (X^T X + Y^T Y)^{-1}$, where $X = (x_{ij})_{m_1 \times n}$ and $Y = (y_{ij})_{m_2 \times n}$, with $n < \min(m_1, m_2)$ and $x_{ij}, y_{ij} \overset{iid}{\sim} N/CN/QN(0, 1)$. In physics, the MANOVA matrix is also called the Jacobi ensemble. The density function of the eigenvalues is

$$f_M(\lambda_1, \dots, \lambda_n) = C \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{i=1}^n \lambda_i^{\frac{\beta(m_1-n+1)}{2} - 1} (1 - \lambda_i)^{\frac{\beta(m_2-n+1)}{2} - 1},$$

where $\beta = 1, 2, 4$ correspond to the real, complex and quaternion cases, respectively. Again $C$ is the normalization constant and $(\lambda_1, \dots, \lambda_n) \in (0, 1)^n$.
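A quick simulation (numpy, our code; the dimensions are chosen arbitrarily) confirms that the eigenvalues of a real MANOVA matrix indeed fall in $(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2 = 5, 20, 30             # requires n < min(m1, m2)
X = rng.normal(size=(m1, n))
Y = rng.normal(size=(m2, n))
A, B = X.T @ X, Y.T @ Y
M = A @ np.linalg.inv(A + B)      # real MANOVA (Jacobi) matrix
# M is similar to a symmetric matrix, so its eigenvalues are real;
# they lie strictly between 0 and 1 since A and B are positive definite.
lam = np.sort(np.linalg.eigvals(M).real)
print(lam)
```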

REMARK 1.1 Once we have the joint distribution of the eigenvalues, we can derive a lot of probabilistic properties, such as the limiting spectral distribution and the limits of the largest eigenvalues, etc. In all of the cases above, $\lambda_1, \dots, \lambda_n$ are not independent, but the dependence is not too strong.

Now let us see how to derive the joint density function of the eigenvalues of a special random matrix $A$ in the real case. Define $A := (A_{ij})_{n \times n} = \sqrt{1/2}\, G$, where $G$ is a GOE as defined above. Then $\{A_{ij}, i \le j \le n\}$ are independent random variables with $A_{ii} \sim N(0, 1)$, $A_{ij} \sim N(0, \frac{1}{2})$ for $i < j$, and $A_{ij} = A_{ji}$.

THEOREM 1 The joint density function of the eigenvalues $(\lambda_1, \lambda_2, \dots, \lambda_n) \in \mathbb{R}^n$ of $A$ is

$$f_A(\lambda_1, \dots, \lambda_n) = C \prod_{i<j} |\lambda_i - \lambda_j| \exp\Big( -\frac{1}{2} \sum_{i=1}^n \lambda_i^2 \Big), \qquad (1.3)$$

where $C$ is the normalization constant.

Before we prove Theorem 1, let us review a very useful lemma (Corollary 2.5.4 from Anderson, Guionnet and Zeitouni (2010)).

LEMMA 1.1 Let $(U_1, U_2, \dots, U_n)$ denote the eigenvectors corresponding to the eigenvalues $(\lambda_1, \dots, \lambda_n)$ of the random matrix $A$. Then the collection $(U_1, U_2, \dots, U_n)$ is independent of the eigenvalues $(\lambda_1, \dots, \lambda_n)$.

Sketch Proof of Lemma 1.1. For any matrix $M \in \mathbb{R}^{n \times n}$, we say $M$ is admissible if the first non-zero element of each column of $M$ is positive. There exists a function $a: \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ such that $a(M)$ is the admissible form of $M$. Define $D(M) = \mathrm{diag}(h_1, \dots, h_n)$, the $n$-dimensional diagonal matrix where $h_1 \ge h_2 \ge \dots \ge h_n$ are the eigenvalues of $M$, and let $U(M)$ be the admissible matrix whose $j$th column is the eigenvector corresponding to the $j$th largest eigenvalue of $M$. By the eigenvalue decomposition, we have $A = U(A) D(A) U(A)^T$. It is known that, with probability 1, all the eigenvalues of $A$ are distinct, so $D(A) = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ with $\lambda_1 > \lambda_2 > \dots > \lambda_n$, and the transformation $A = U(A) D(A) U(A)^T$ is a bijective map. In order to show that $D(A)$ and $U(A)$ are independent, we only need to show that the conditional distribution of $D(A)$ given $U(A)$ does not depend on $U(A)$. In other words, we need to show that the two conditional random matrices $D(A)|(U(A) = U_0)$ and $D(A)|(U(A) = U_1)$ have the same distribution, where $U_0$ and $U_1$ are any two orthogonal admissible matrices with $U_0 \neq U_1$. Let $H = U_1 U_0^T$. Since $D(A) = D(HAH^T)$, we have that

$$D(A)\,|\,(U(A) = U_1) = D(HAH^T)\,|\,\big( U(H(H^T A H)H^T) = HU_0 \big).$$

Let $B = H^T A H$, $\Delta = D(B)$ and $P = U(B)$. Since $HBH^T = HP \Delta (HP)^T = a(HP) \Delta [a(HP)]^T$, we have that $U(H(H^T A H)H^T) = a(HU(H^T A H))$. So

$$D(HAH^T)\,|\,\big( U(H(H^T AH)H^T) = HU_0 \big) = D(HAH^T)\,|\,\big( U(H^T AH) = U_0 \big) \overset{d}{=} D(A)\,|\,(U(A) = U_0).$$

To show the second equation above, we only need to show that the event $\{a(HU(H^T AH)) = HU_0\}$ is the same as the event $\{U(H^T AH) = U_0\}$. We know that there exists a diagonal matrix $S = \mathrm{diag}(\pm 1)$ such that $a(HU(H^T AH)) = HU(H^T AH)S$; then $HU(H^T AH)S = HU_0$. Since $H$ is orthogonal and $U(H^T AH)$ and $U_0$ are both admissible, we get that $S = I_n$. Therefore $U(H^T AH) = U_0$ if $a(HU(H^T AH)) = HU_0$. It is easy to check that $a(HU(H^T AH)) = HU_0$ if $U(H^T AH) = U_0$. The last equation in distribution is due to the fact that $A \overset{d}{=} HAH^T$. So we conclude that the eigenvalues of $A$ are independent of its eigenvectors.

Proof of Theorem 1. The joint density of the independent entries $\{A_{ij}, 1 \le i \le j \le n\}$ is

$$L(A_{ij}, 1 \le i \le j \le n) = C \prod_{i=1}^n \exp\Big( -\frac{1}{2} A_{ii}^2 \Big) \prod_{1 \le i < j \le n} \exp(-A_{ij}^2) = C \exp\Big( -\frac{1}{2} \sum_{i=1}^n A_{ii}^2 - \sum_{1 \le i < j \le n} A_{ij}^2 \Big) = C \exp\Big\{ -\frac{1}{2} \mathrm{tr}(A^2) \Big\}.$$
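The identity $\frac{1}{2}\sum_i A_{ii}^2 + \sum_{i<j} A_{ij}^2 = \frac{1}{2}\mathrm{tr}(A^2)$ used in the last step is easy to verify numerically for a random symmetric matrix (numpy sketch, ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Build A as in Theorem 1: A_ii ~ N(0,1), A_ij ~ N(0,1/2) for i<j, symmetric.
offdiag = np.triu(rng.normal(scale=np.sqrt(0.5), size=(n, n)), 1)
A = offdiag + offdiag.T + np.diag(rng.normal(size=n))
lhs = 0.5 * np.sum(np.diag(A) ** 2) + np.sum(np.triu(A, 1) ** 2)
rhs = 0.5 * np.trace(A @ A)
print(abs(lhs - rhs))   # ~0 up to floating-point error
```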

Note that the admissibility requirement in the proof of Lemma 1.1 makes the map from $A$ to $(D, U)$ one-to-one. We can re-write the matrix $A$ as an $n^2 \times 1$ vector $\mathrm{vec}(A)$ by stacking up the columns of $A$. In the same way we can also vectorize the matrices $D$ and $U$ into $\mathrm{vec}(D)$ and $\mathrm{vec}(U)$. Recall that the number of degrees of freedom of a random vector is the number of free components in the vector (i.e., how many components need to be known before the vector is fully determined). Since $A$ is symmetric, the elements in the lower triangular part are determined by those in the upper triangular part; that is, given the entries on or above the diagonal of $A$, the remaining elements of $A$ are fully determined. So there are only $n(n+1)/2$ free components in the random vector $\mathrm{vec}(A)$. It is easy to see that $\mathrm{vec}(D)$ has $n$ free components. The vector $\mathrm{vec}(U)$ has $n(n-1)/2$ free components, because there are $n + \frac{1}{2}n(n-1)$ constraints on the $n^2$ elements of $\mathrm{vec}(U)$ implied by $U^T U = I_n$ (writing $U = (U_1, \dots, U_n)$, the constraints are that the $\ell_2$ vector norms $\|U_i\| = 1$ and the inner products $(U_i, U_j) = 0$ for $i < j$). Now define the following three vectors:

$$\gamma := (A_{11}, A_{22}, \dots, A_{nn}, A_{12}, A_{13}, \dots, A_{1n}, A_{23}, \dots, A_{(n-1)n})^T \in \mathbb{R}^{n(n+1)/2},$$

$$\lambda := (\lambda_1, \dots, \lambda_n)^T \in \mathbb{R}^n, \qquad \eta := (U_{12}, U_{13}, \dots, U_{1n}, U_{23}, \dots, U_{(n-1)n})^T \in \mathbb{R}^{n(n-1)/2}. \qquad (1.4)$$

We know that all the elements of $A$, $D$ and $U$ can be expressed as functions of $\gamma$, $\lambda$ and $\eta$, respectively. For ease of notation, we write $\eta := (\eta_1, \dots, \eta_{n(n-1)/2})$ for its components. From Lemma 1.1, we know that $\lambda$ and $\eta$ are independent. Since $\mathrm{tr}(A^2) = \mathrm{tr}(U D^2 U^T) = \mathrm{tr}(D^2) = \sum_{i=1}^n \lambda_i^2$, together with (1.4), we know that the joint density of $(\lambda, \eta)$ is

$$g(\lambda, \eta) = C \exp\Big( -\frac{1}{2} \sum_i \lambda_i^2 \Big) \Big| \det \frac{\partial \gamma}{\partial (\lambda, \eta)} \Big|. \qquad (1.5)$$

Now we only need to calculate the determinant of the Jacobian of the map from $\gamma$ to $(\lambda, \eta)$. Defining $J := \frac{\partial \gamma}{\partial (\lambda, \eta)}$, we have

$$J = \begin{pmatrix}
\frac{\partial A_{11}}{\partial \lambda_1} & \dots & \frac{\partial A_{nn}}{\partial \lambda_1} & \frac{\partial A_{12}}{\partial \lambda_1} & \dots & \frac{\partial A_{(n-1)n}}{\partial \lambda_1} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial A_{11}}{\partial \lambda_n} & \dots & \frac{\partial A_{nn}}{\partial \lambda_n} & \frac{\partial A_{12}}{\partial \lambda_n} & \dots & \frac{\partial A_{(n-1)n}}{\partial \lambda_n} \\
\frac{\partial A_{11}}{\partial \eta_1} & \dots & \frac{\partial A_{nn}}{\partial \eta_1} & \frac{\partial A_{12}}{\partial \eta_1} & \dots & \frac{\partial A_{(n-1)n}}{\partial \eta_1} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial A_{11}}{\partial \eta_{n(n-1)/2}} & \dots & \frac{\partial A_{nn}}{\partial \eta_{n(n-1)/2}} & \frac{\partial A_{12}}{\partial \eta_{n(n-1)/2}} & \dots & \frac{\partial A_{(n-1)n}}{\partial \eta_{n(n-1)/2}}
\end{pmatrix}.$$

In order to calculate the determinant of $J$, we need to review some definitions and useful facts from linear algebra.

(1) For any $n \times n$ matrix $B$, let $\frac{\partial B}{\partial x}$ denote the derivative of the matrix $B$ with respect to the scalar variable $x$, which is the $n \times n$ matrix of element-by-element derivatives:

$$\frac{\partial B}{\partial x} = \begin{pmatrix} \frac{\partial B_{11}}{\partial x} & \dots & \frac{\partial B_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \frac{\partial B_{n1}}{\partial x} & \dots & \frac{\partial B_{nn}}{\partial x} \end{pmatrix}.$$

(2) Since $U^T U = I_n$, differentiating both sides with respect to $\eta_1$ gives

$$\frac{\partial U^T}{\partial \eta_1} U + U^T \frac{\partial U}{\partial \eta_1} = 0. \qquad (1.6)$$

Let $S^{(1)} := U^T \frac{\partial U}{\partial \eta_1} = -\frac{\partial U^T}{\partial \eta_1} U$. Since $A = U D U^T$, we have

$$\frac{\partial A}{\partial \eta_1} = \frac{\partial U}{\partial \eta_1} D U^T + U D \frac{\partial U^T}{\partial \eta_1}.$$

So

$$U^T \frac{\partial A}{\partial \eta_1} U = U^T \frac{\partial U}{\partial \eta_1} D + D \frac{\partial U^T}{\partial \eta_1} U = S^{(1)} D - D S^{(1)}.$$

A

It follows that the element in (i, j)th position of the n n matrix (U T 1

U ) is

X Akl (1)

Uki Ulj = Sij (j i ), (1.7)

1

1k,ln

(1)

where Sij is the (i, j)th element of matrix S (1) . Since Aij = Aji , we know that, the

above equation can be rewritten as

X Akk X Akl (1)

Uki Ukj + 2 Uki Ulj = Sij (j i ). (1.8)

1 1

k k<l

(3) Since $D = U^T A U$ and $U$ does not depend on $\lambda$, we have $\frac{\partial D}{\partial \lambda_1} = U^T \frac{\partial A}{\partial \lambda_1} U$. Hence the $(i,j)$th element of the matrix $\frac{\partial D}{\partial \lambda_1}$ is

$$\sum_{k} \frac{\partial A_{kk}}{\partial \lambda_1} U_{ki} U_{kj} + 2 \sum_{k < l} \frac{\partial A_{kl}}{\partial \lambda_1} U_{ki} U_{lj} = \delta_{ij} \delta_{i1}, \qquad (1.9)$$

where $\delta_{ij}$ denotes the Kronecker delta.

Now define $\omega_{ij} \in \mathbb{R}^{(n+1)n/2}$ and the matrix $V \in \mathbb{R}^{(n+1)n/2 \times (n+1)n/2}$ as follows:

$$\omega_{ij} = \big( U_{1i} U_{1j}, \dots, U_{ni} U_{nj},\ 2 U_{1i} U_{2j}, \dots, 2 U_{(n-1)i} U_{nj} \big)^T \qquad (1.10)$$

and $V = (\omega_{11}, \omega_{22}, \dots, \omega_{nn}, \omega_{12}, \omega_{13}, \dots, \omega_{1n}, \omega_{23}, \dots, \omega_{(n-1)n})$. Then by (1.8) and (1.9), we get that

$$JV = \begin{pmatrix}
I_{n \times n} & 0 & \dots & 0 \\
* & S^{(1)}_{12} (\lambda_2 - \lambda_1) & \dots & S^{(1)}_{(n-1)n} (\lambda_n - \lambda_{n-1}) \\
\vdots & \vdots & \ddots & \vdots \\
* & S^{(\frac{1}{2}n(n-1))}_{12} (\lambda_2 - \lambda_1) & \dots & S^{(\frac{1}{2}n(n-1))}_{(n-1)n} (\lambda_n - \lambda_{n-1})
\end{pmatrix},$$

where the $*$ denote some unknown values and $S^{(m)} := U^T \frac{\partial U}{\partial \eta_m}$. So $|\det(JV)| = \prod_{i<j} |\lambda_i - \lambda_j| \, h(\eta_1, \dots, \eta_{\frac{1}{2}n(n-1)})$ for some function $h$. Since $\det(V)$ is a function of $\mathrm{vec}(V)$, it can be written as a function of $\eta$ too. Further, since $|\det(JV)| = |\det(J)||\det(V)|$, we know $|\det(J)| = \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j| \, h(\eta_1, \dots, \eta_{\frac{1}{2}n(n-1)})$ (with a possibly different function $h$).

Now we have that

$$g(\lambda, \eta) = C \exp\Big( -\frac{1}{2} \sum_i \lambda_i^2 \Big) \prod_{i<j} |\lambda_i - \lambda_j| \, h(\eta_1, \dots, \eta_{\frac{1}{2}n(n-1)}).$$

Since $\lambda$ and $\eta$ are independent, it is easy to obtain the joint density (1.3) of $(\lambda_1, \dots, \lambda_n)$ from $g(\lambda, \eta)$.
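The bijectivity of the map $A \mapsto (D, U)$ and the identity $\mathrm{tr}(A^2) = \sum_i \lambda_i^2$ used for (1.5) can both be sanity-checked numerically (numpy sketch, ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.normal(size=(n, n))
A = (A + A.T) / np.sqrt(2)                  # symmetric, GOE-style
lam, U = np.linalg.eigh(A)                  # A = U diag(lam) U^T
A_back = U @ np.diag(lam) @ U.T
print(np.max(np.abs(A - A_back)))           # ~0: the decomposition inverts
# tr(A^2) equals the sum of squared eigenvalues:
print(abs(np.trace(A @ A) - np.sum(lam ** 2)))
```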

Besides the random matrices in the table, there are some other types of random matrices of interest, such as Toeplitz matrices ($a_{ij} = a_{|i-j|}$), Hankel matrices ($a_{ij} = a_{i+j-1}$) and Markov matrices (symmetric matrices whose $i$th diagonal element equals $-\sum_{j \neq i} a_{ij}$). All the examples above are symmetric matrices. We may also investigate properties of non-symmetric random matrices, such as $G = (g_{ij})$, where the $g_{ij}$ are iid $N/CN(0, 1)$, and of the Haar measure on the classical compact groups. In these cases, the eigenvalues of the random matrices may not be real any more.
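These structured ensembles are cheap to generate. The sketch below (our code; `scipy.linalg` also offers `toeplitz` and `hankel` helpers) builds a random Toeplitz and a random Hankel matrix from a single sequence of iid entries and confirms they are symmetric:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
a = rng.normal(size=2 * n)      # iid driving sequence a_0, a_1, ...

# Toeplitz: entry (i, j) depends only on |i - j|; Hankel: only on i + j.
T = np.array([[a[abs(i - j)] for j in range(n)] for i in range(n)])
H = np.array([[a[i + j + 1] for j in range(n)] for i in range(n)])

# Both are symmetric by construction, so their eigenvalues are real.
print(np.allclose(T, T.T), np.allclose(H, H.T))
```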

From a statistical point of view, people care more about matrices whose entries have unknown distributions and may depend on each other. However, due to technical limitations, the current RMT does not handle such random matrices very well. Some results are known for special cases such as the Haar matrices. Nevertheless, the RMT does provide insights about random matrices whose entries follow unknown distributions, provided the necessary independence assumptions hold.

Some extra notes about the Haar measure:

(1) Let $O(n)$ denote the group of $n \times n$ orthogonal matrices.

(2) There exists a unique probability measure $\mu$ on $(O(n), \cdot)$, where $\cdot$ indicates the usual matrix product, such that

$$\int_{O(n)} f(ax)\,\mu(dx) = \int_{O(n)} f(xb)\,\mu(dx) = \int_{O(n)} f(x)\,\mu(dx)$$

for any bounded measurable function $f: O(n) \to \mathbb{R}$ and any $a, b \in O(n)$. Then $\mu$ is the Haar measure on $O(n)$.

(3) There are two ways to generate the Haar measure on $O(n)$.

a) Let $Y = (y_{ij})_{n \times n}$, where the $y_{ij}$ are iid $N(0, 1)$ random variables. Set $X = Y (Y^T Y)^{-1/2}$. Then $X$ generates the Haar measure on $O(n)$; $X$ is also called a Haar orthogonal ensemble.

b) Let $Y$ be as above and write $Y = (y_1, \dots, y_n)$. Apply the Gram-Schmidt process (QR decomposition) to $Y$: $e_1 = \frac{y_1}{\|y_1\|}$, $w_2 = y_2 - (y_2, e_1) e_1$, $e_2 = \frac{w_2}{\|w_2\|}$, and so on: $w_i = y_i - \sum_{j=1}^{i-1} (y_i, e_j) e_j$ and $e_i = \frac{w_i}{\|w_i\|}$. Then $X = (e_1, \dots, e_n)$ has the Haar measure on $O(n)$ and is a Haar orthogonal ensemble.
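Method b) is essentially what a QR factorization computes. A small sketch (our code; the sign correction is the standard recipe of making the diagonal of $R$ positive, so that the orthogonal factor matches the Gram-Schmidt output rather than a LAPACK sign convention):

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an (approximately Haar-distributed) matrix from O(n)
    via QR of a matrix with iid N(0,1) entries."""
    Y = rng.normal(size=(n, n))
    Q, R = np.linalg.qr(Y)
    # Scale column j by sign(R_jj) so the map Y -> Q is the Gram-Schmidt map.
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(5)
X = haar_orthogonal(4, rng)
print(np.allclose(X.T @ X, np.eye(4)))   # True: X is orthogonal
```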

1.4 Limit theorems of eigenvalues of popular random matrices

When all the eigenvalues $\lambda_1, \dots, \lambda_n$ of a matrix $A$ are real (e.g., if $A$ is Hermitian), we can define an empirical distribution of the eigenvalues as follows:

$$F_n^A(x) = \frac{1}{n} \sum_{i=1}^n I(\lambda_i \le x),$$

which is called the empirical spectral distribution (ESD) of the matrix $A$. Investigating the convergence of $F_n^{A_n}$ when the dimension $n$ goes to infinity is one of the main problems in the RMT. The limiting distribution $F$ of $F_n^A$ is called the limiting spectral distribution (LSD) of $A$. The importance of the ESD is due to the fact that many important statistics can be expressed as functions of the ESD; for example, $\det(A) = \prod_{i=1}^n \lambda_i = \exp\big( n \int_0^\infty \log(x) F_n^A(dx) \big)$ when all the eigenvalues are positive. We have the following famous results regarding the LSD of some well-known random matrices:

(1) Semicircle law: the ESD of the normalized GOE (divided by $\sqrt{n}$) converges to the semicircle law with probability 1. The density function of the semicircle law is

$$f(x) = \begin{cases} \frac{1}{2\pi} \sqrt{4 - x^2} & \text{if } |x| \le 2; \\ 0 & \text{otherwise.} \end{cases}$$

[Figure: density of the semicircle law on $[-2, 2]$.]

The GOE is also called the Wigner matrix, named after the mathematical physicist Eugene Wigner, who first proved that the LSD of the GOE follows the semicircle law. The above result also holds for the generalized Wigner matrix, which only requires the matrix to be symmetric with independent on-or-above-diagonal entries satisfying some Lindeberg-type condition.
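The semicircle law is easy to see in simulation (numpy, our code): draw one $n \times n$ GOE matrix, normalize by $\sqrt{n}$, and check that the spectrum concentrates on $[-2, 2]$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = rng.normal(size=(n, n))
G = (X + X.T) / np.sqrt(2)                  # GOE
lam = np.linalg.eigvalsh(G / np.sqrt(n))    # normalized spectrum
# Almost all eigenvalues should lie in the semicircle support [-2, 2].
inside = np.mean((lam > -2.1) & (lam < 2.1))
print(lam.min(), lam.max(), inside)
```

A histogram of `lam` would trace out the semicircle density shown in the figure above.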

(2) The largest and smallest eigenvalues of the Wigner matrix have the following limits: $\frac{\lambda_{\max}}{\sqrt{n}} \to 2$ and $\frac{\lambda_{\min}}{\sqrt{n}} \to -2$, provided the entries of the Wigner matrix have mean 0, variance 1 and a finite fourth moment.

(3) Marcenko-Pastur law: the ESD of the sample covariance matrix $S_n := \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^T (X_i - \bar{X})$, where $X_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ and the $x_{ij}$ are i.i.d. with mean 0 and variance 1, converges to the MP law, which has density function

$$f(x) = \begin{cases} \frac{1}{2\pi x y} \sqrt{(b - x)(x - a)} & \text{if } a \le x \le b; \\ 0 & \text{otherwise,} \end{cases}$$

and has a point mass $1 - 1/y$ at the origin if $y > 1$, where $a = (1 - \sqrt{y})^2$, $b = (1 + \sqrt{y})^2$ and $p/n \to y \in (0, \infty)$. The asymptotic theory of the spectral analysis of the sample covariance matrix was developed by Marcenko and Pastur (1967) and Pastur (1972, 1973). This result was later generalized to non-Gaussian ensembles.

The MP density can be plotted in R (the `dmp` function is presumably the MP density from the RMTstat package; `n` and `p` must be defined beforehand):

```r
library(RMTstat)    # provides dmp, a Marchenko-Pastur density
x <- seq((1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2, 0.01)
y <- dmp(x, n, p, 1)
plot(x, y, type = "l")
```

[Figure: density of the Marcenko-Pastur law.]

When $p > n$, we know there are $p - n$ eigenvalues equal to 0, so the LSD has a point mass at zero in this case. This LSD also indicates a very important fact: the sample covariance matrix is no longer a good estimator of the population covariance matrix when the dimension is very high.
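A simulation makes the MP support visible (numpy, our code; for simplicity we use the uncentered version $S_n = X^T X / n$, which has the same LSD). With $p/n = 0.2$ the eigenvalues should fill $[(1-\sqrt{0.2})^2, (1+\sqrt{0.2})^2]$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 2000, 400                    # y = p/n = 0.2
X = rng.normal(size=(n, p))
S = X.T @ X / n                     # sample covariance (population mean known 0)
lam = np.linalg.eigvalsh(S)
y = p / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
inside = np.mean((lam > a - 0.1) & (lam < b + 0.1))
print(lam.min(), lam.max(), inside)   # spectrum approximately fills [a, b]
```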

(4) Limits for the spectral norm of sample covariance matrices: $\lambda_{\max}(S_n) \to (1 + \sqrt{y})^2$ if and only if the fourth moment of the $x_{ij}$ exists.

(5) Tracy-Widom law: Assume that $X$ is an $n \times p$ random matrix with $x_{ij} \overset{i.i.d.}{\sim} N(0, 1)$, where $n/p \to y \in (0, \infty)$. Denote $\mu_n = (\sqrt{n} + \sqrt{p})^2$ and $\sigma_n = (\sqrt{n} + \sqrt{p}) \big( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{p}} \big)^{1/3}$. Then $\frac{\lambda_{\max}(X^T X) - \mu_n}{\sigma_n} \Rightarrow F_1$, where $F_1$ is the Tracy-Widom law, with $F_1(s) = \exp\big( -\frac{1}{2} \int_s^\infty \big( q(x) + (x - s) q^2(x) \big)\, dx \big)$ and $q(x)$ satisfying the Painleve II equation

$$q''(x) = x\, q(x) + 2 q^3(x), \qquad q(x) \sim \mathrm{Ai}(x) \ \text{as } x \to \infty.$$

This result was derived by Iain M. Johnstone (2001). In fact, the first such work, on the limiting distribution of the largest eigenvalue of a Gaussian symmetric matrix, was done by Tracy and Widom (1996). Recently, universality for the sample covariance matrix has been established by two different methods, so the Tracy-Widom law has been generalized to non-Gaussian ensembles under some moment assumptions. The following figure shows the density of the Tracy-Widom law:

[Figure: density of the Tracy-Widom law.]
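The centering and scaling constants in (5) are simple to compute. The sketch below (our code) rescales the largest eigenvalue of a Gaussian Wishart matrix; the rescaled value is of order 1, with distribution approximately Tracy-Widom $F_1$:

```python
import numpy as np

def tw_center_scale(n, p):
    """Johnstone's centering mu_n and scaling sigma_n for lambda_max(X^T X)."""
    mu = (np.sqrt(n) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n) + np.sqrt(p)) * (1 / np.sqrt(n) + 1 / np.sqrt(p)) ** (1 / 3)
    return mu, sigma

rng = np.random.default_rng(8)
n, p = 400, 100
X = rng.normal(size=(n, p))
lam_max = np.linalg.eigvalsh(X.T @ X)[-1]    # largest eigenvalue
mu, sigma = tw_center_scale(n, p)
print((lam_max - mu) / sigma)                # O(1), approximately TW_1
```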

(6) As mentioned above, linear spectral statistics are important in multivariate analysis. We define $\int f(x) F_n(dx)$ and call it a linear spectral statistic (LSS), where $F_n(x)$ is the ESD of a random matrix; the LRT statistic is an example. We can use an LSS to estimate a parameter $\theta = \int f(x) F(dx)$. Bai and Silverstein (2004) showed that the limiting distribution of

$$X_n(f) := a_n \int f(t)\, d\big( F_n(t) - F(t) \big)$$

is Gaussian under suitable conditions.

(7) Girko's circular law: let $G = (g_{ij})_{n \times n}$ with $g_{ij}$ i.i.d. $N/CN(0, 1)$; the corresponding eigenvalues $\lambda_1, \dots, \lambda_n$ are complex random variables. Then the empirical spectral distribution $F_n^G(x, y) = \frac{1}{n} \#\{ i \le n : \mathrm{Re}(\lambda_i/\sqrt{n}) \le x, \mathrm{Im}(\lambda_i/\sqrt{n}) \le y \}$ almost surely converges to the uniform distribution on the unit disk in the complex plane, with density $f(z) = \frac{1}{\pi} I(|z| \le 1)$. The best result was given by Tao and Vu (2010), assuming only that the $g_{ij}$ are i.i.d. with mean 0 and variance 1.
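A short simulation shows the circular law (numpy, our code): the eigenvalues of $G/\sqrt{n}$ essentially fill the unit disk:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
G = rng.normal(size=(n, n))                 # iid real N(0,1) entries
lam = np.linalg.eigvals(G / np.sqrt(n))     # complex eigenvalues
radii = np.abs(lam)
print(radii.max())                          # close to 1 for large n
```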


2 Wigner Semicircle Law by Moment Method

Under a generalized definition, a Wigner matrix is a symmetric matrix whose entries on or above the diagonal are independent; the Wigner class includes the GOE, GUE and GSE. Research on the Wigner matrix arose in nuclear physics in the 1950s. Wigner (1955, 1958) showed that the expected ESD of a Gaussian ensemble converges to the semicircle law. This work was generalized by many researchers in various directions.

2.1 Semicircle law for iid case

Suppose that $X_n$ is an $n \times n$ Wigner matrix whose diagonal entries are i.i.d. random variables with mean 0 and variance $\sigma^2$, and whose above-diagonal entries are i.i.d. random variables with mean 0 and variance 1. Let $W_n = \frac{1}{\sqrt{n}} X_n$ denote the normalized Wigner matrix. If $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $W_n$, we can write the empirical spectral distribution (ESD) of $W_n$ as

$$F^{W_n}(x) = \frac{1}{n} \sum_{i=1}^n I(\lambda_i \le x).$$

THEOREM 2 As $n \to \infty$, with probability 1, $F^{W_n}$ converges weakly to the semicircle distribution, whose density is

$$f(x) = \begin{cases} \frac{1}{2\pi} \sqrt{4 - x^2} & \text{if } |x| \le 2; \\ 0 & \text{otherwise.} \end{cases}$$

2.2 Moment Convergence Theorem

In order to apply the moment method to prove Theorem 2, we need to review some useful facts about the moment convergence theorem (MCT). The MCT gives conditions under which the convergence of moments of all fixed orders implies convergence in distribution. Let $\{Y_n\}$ denote a sequence of random variables with finite moments of all orders. Since

$$\sup_n P(|Y_n| \ge K) \le \frac{1}{K^2} \sup_n E|Y_n|^2 \le \frac{C}{K^2},$$

the sequence $\{Y_n\}$ is tight. Let $Y_{n_l}$ and $Y_{n_k}$ be any two subsequences of $Y_n$; then we can find convergent (in distribution) subsequences of each. To simplify notation, call them $Y_{n_1}$ and $Y_{n_2}$, and suppose $Y_{n_1} \Rightarrow Y_1$ and $Y_{n_2} \Rightarrow Y_2$. From the moment convergence condition, we know that $E Y_1^k = E Y_2^k$ for all $k \in \mathbb{N}$; let $M_k = E Y_1^k$. Since $\{Y_n\}$ is tight, we know that if every convergent subsequence of $Y_n$ converges to the same limit $Y$ in distribution, then $Y_n \Rightarrow Y$. Now we need some conditions to guarantee that $Y_1 \overset{d}{=} Y_2$; in other words, conditions under which the distribution of a random variable is uniquely determined by its moments. In fact, either the Carleman condition or the Riesz condition guarantees the uniqueness of the distribution given its moments. They are:

Riesz condition: $\liminf_{k} \frac{1}{k} M_{2k}^{1/(2k)} < \infty$;

Carleman condition: $\sum_{k=1}^\infty M_{2k}^{-1/(2k)} = \infty$.

2.3 Proof of Theorem 2

The proof of Theorem 2 consists of three steps:

(1) Compute the moments $M_k$ of the semicircle distribution;

(2) Show that the Carleman condition $\sum_k M_{2k}^{-1/(2k)} = \infty$ holds;

(3) Prove that $\int x^k \, dF^{W_n}(x) = \frac{1}{n} \sum_i \lambda_i^k = \frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \to M_k$.

Proof of (1): Obviously, we have $M_{2k+1} = 0$, because the semicircle density $f$ is a symmetric function. Now

$$M_{2k} = \frac{1}{2\pi} \int_{-2}^{2} x^{2k} \sqrt{4 - x^2} \, dx = \frac{1}{\pi} \int_0^2 x^{2k} \sqrt{4 - x^2} \, dx = \frac{1}{\pi} \int_0^1 (4y)^k \cdot 2 (1 - y)^{1/2} y^{-1/2} \, dy = \frac{2^{2k+1} \Gamma(k + \frac{1}{2}) \Gamma(\frac{3}{2})}{\pi\, \Gamma(k + 2)}.$$

By $\Gamma(x + 1) = x\Gamma(x)$ and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, we have $M_{2k} = \frac{(2k)!}{(k+1)!\, k!} = \frac{1}{k+1}\binom{2k}{k}$ for all $k \in \mathbb{N}$. In fact, the numbers $\frac{1}{k+1}\binom{2k}{k}$ are called the Catalan numbers in combinatorial mathematics.

Proof of (2): Since $M_{2k} \le (2k)! \le (2k)^{2k}$, we have $M_{2k}^{1/(2k)} \le 2k$, i.e. $M_{2k}^{-1/(2k)} \ge \frac{1}{2k}$. Therefore $\sum_k M_{2k}^{-1/(2k)} = \infty$.
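The moment computation in (1) can be verified numerically: integrating $x^{2k} f(x)$ over $[-2, 2]$ should reproduce the Catalan numbers $1, 1, 2, 5, 14, \dots$ (Python sketch, ours, using a plain Riemann sum so that only numpy is needed):

```python
import numpy as np
from math import comb

def semicircle_moment(k, m=200000):
    """Numerically compute M_{2k} = (1/(2 pi)) * int_{-2}^{2} x^{2k} sqrt(4-x^2) dx."""
    x = np.linspace(-2.0, 2.0, m)
    fx = np.sqrt(np.maximum(4.0 - x ** 2, 0.0)) / (2.0 * np.pi)
    dx = x[1] - x[0]
    return float(np.sum(x ** (2 * k) * fx) * dx)

for k in range(5):
    catalan = comb(2 * k, k) // (k + 1)
    print(k, semicircle_moment(k), catalan)   # the two columns agree
```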

Sketch proof of (3): In this sketch proof, let us further assume that all the entries of $X_n$ are bounded. In order to show $\frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \to M_k$, we only need to show

(a) $\frac{1}{n^{k/2+1}} E\, \mathrm{tr}(X_n^k) \to M_k$;

(b) $\sum_n \mathrm{Var}\big( \frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \big) < \infty$.

At first, we review some concepts from graph theory. A graph is an ordered pair $\Gamma = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. In a directed graph, an edge is an ordered pair $(v_1, v_2)$, which starts from the vertex $v_1$ and ends at the vertex $v_2$. If $v_1 = v_2$, we call this type of edge a loop. If two edges have the same set of vertices, we say they are coincident. A cycle is defined as a sequence of vertices that starts and ends at the same vertex, such that any two consecutive vertices in the cycle are connected by an edge in the graph.

Now, we can expand $\mathrm{tr}(X_n^k)$ as

$$\mathrm{tr}(X_n^k) = \sum x_{i_1 i_2} x_{i_2 i_3} \cdots x_{i_{k-1} i_k} x_{i_k i_1}, \qquad (2.11)$$

where the sum runs over $(i_1, \dots, i_k) \in \{1, 2, \dots, n\}^k$. Let $G_{i_1 \dots i_k} = x_{i_1 i_2} x_{i_2 i_3} \cdots x_{i_{k-1} i_k} x_{i_k i_1}$.

With the set of subindices $(i_1, \dots, i_k)$ of $G_{i_1 \dots i_k}$, we can define a graph $\Gamma(k, t)$ as follows: plot a horizontal line and mark the numbers $i_1, \dots, i_k$ on it. Suppose there are $t$ distinct numbers in this set; these $t$ distinct numbers become the vertices of the graph. Then draw $k$ edges: from $i_1$ to $i_2$, $i_2$ to $i_3$, and so on up to $i_{k-1}$ to $i_k$, and finally from $i_k$ back to $i_1$. For example, for $k = 6$, $t = 4$ and $G_{i_1 \dots i_6} = x_{12} x_{23} x_{33} x_{34} x_{42} x_{21}$, the graph is drawn in Figure 1.

[Figure 1: the graph $\Gamma(6, 4)$; here $i_1 = i_7$, $i_2 = i_6$ and $i_3 = i_4$.]

Obviously, any such graph $\Gamma(k, t)$ defined above forms a cycle. Each term $G_{i_1, \dots, i_k}$ in equation (2.11) can also be written as $G_{i_1, \dots, i_k} = x_{s_1 h_1}^{k_1} \cdots x_{s_l h_l}^{k_l}$, where $\sum_{j=1}^l k_j = k$ and $l$ is the number of non-coincident edges in the graph $\Gamma_{k,t}$. All these graphs can be classified into three categories.

Category 1: There exists at least one edge in the graph that is not coincident with any of the other edges. Then $k_{j_0} = 1$ for some $j_0$, and hence $E G_{i_1 \dots i_k} = 0$.

Category 2: There is no loop, and each edge in the graph is coincident with exactly one other edge, in the opposite direction. Then $k_j = 2$ for $1 \le j \le l$ and $s_j \neq h_j$, so $k$ must be an even number. Hence $E G_{i_1 \dots i_k} = 1$.

Category 3: All the other graphs, i.e., those not in categories 1 and 2. There are two possibilities: either at least 3 edges are coincident ($k_{j_0} \ge 3$ for some $j_0$), or there exists a cycle of non-coincident edges. In the first situation, there are at most $(k-1)/2$ consecutively connected non-coincident edges in the graph, so $t \le (k+1)/2$. In the second situation, the cycle of non-coincident edges implies that there exists one edge in the cycle whose vertices are determined by the rest of the edges in the cycle, so $t \le (k+1)/2$ again. Since all the entries $x_{ij}$ are bounded,

$$\sum_{\text{Category 3}} E G_{i_1 \dots i_k} \le C n^{\frac{k+1}{2}}.$$

From the above, we know that the contributions of categories 1 and 3 to (a) converge to 0 as $n \to \infty$. Therefore we only need to count the number of terms in category 2. Define $H(e_1) = 1$ if $x_{i_1 i_2}$ (corresponding to the edge $e_1$) shows up for the first time in the graph, and $H(e_s) = -1$ if $x_{i_s i_{s+1}}$ repeats an earlier factor, i.e., the edge $e_s$ coincides with an earlier edge but in the opposite direction. Since $k$ is an even number, we can write $k = 2K$. Then there exists a sequence $(a_1 = H(e_1), \dots, a_{2K} = H(e_{2K}))$ corresponding to the graph $\Gamma$; we call it the characteristic sequence of the graph $\Gamma$. It is easy to tell that all the characteristic sequences of the graphs in category 2 satisfy

$$a_1 + a_2 + \dots + a_l \ge 0 \qquad (2.12)$$

for all $1 \le l \le 2K$. For each characteristic sequence, there are $\frac{n!}{(n-t)!} = n(n-1)\cdots(n-t+1)$ versions of such a graph in category 2. Now we only need to count the number of characteristic sequences.

First, there are $\binom{2K}{K}$ different ways to arrange $K$ positive ones and $K$ minus ones.

[Figure: a path $(i, S_i)$ that touches the level $S = -1$, and its reflection after the first touch, which ends at $-2$.]

Given a sequence $(a_1, \dots, a_{2K})$, where $a_i \in \{1, -1\}$, let $S_0 = 0$ and $S_i = S_{i-1} + a_i$ for $i = 1, 2, \dots, 2K$. We can represent such a sequence as a 2-dimensional path $(i, S_i)$ in the plane. Notice that each positive one indicates an upward step and each minus one indicates a downward step (see the figure above). The path starts from $(0, 0)$ and returns to $(2K, 0)$ after $2K$ steps.

If $(a_1, \dots, a_{2K})$ is a characteristic sequence, then $S_i \ge 0$ for all $i$, according to (2.12). In other words, the path always stays on or above the horizontal axis (it never touches the negative part). Therefore, for a non-characteristic sequence, there must exist an $i_0$ such that $S_{i_0} = -1$; visually, the path touches $-1$ after $i_0$ steps. For such a non-characteristic sequence, we can reflect its path after step $i_0$ along the horizontal line at $-1$ (see the figure above). The reflected path still starts at $0$, but ends at $-2$ after $2K$ steps. Now define $b_j = a_j$ for $j \le i_0$ and $b_j = -a_j$ for all $j > i_0$; then the new path corresponds to the new sequence $(b_1, \dots, b_{2K})$, which contains $K - 1$ positive ones and $K + 1$ minus ones. Therefore the number of $b$-sequences is $\binom{2K}{K-1}$. So the number of characteristic sequences is

$$\binom{2K}{K} - \binom{2K}{K-1} = \frac{1}{K+1} \binom{2K}{K} = M_{2K}.$$

So $\sum_{\text{Category 2}} E G_{i_1 \dots i_k} = n(n-1)\cdots(n-K) M_{2K}$, and we can conclude that (a) holds.
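The reflection count above is easy to confirm by brute force for small $K$: enumerating all $\pm 1$ sequences with nonnegative partial sums, as in (2.12), recovers the Catalan numbers (Python sketch, ours):

```python
from itertools import product
from math import comb

def count_characteristic(K):
    """Count sequences of K ones and K minus-ones whose partial sums
    stay >= 0, i.e. the characteristic sequences of (2.12)."""
    total = 0
    for seq in product((1, -1), repeat=2 * K):
        if sum(seq) != 0:
            continue                 # need K ones and K minus-ones
        s, ok = 0, True
        for a in seq:
            s += a
            if s < 0:                # partial sum dips below zero
                ok = False
                break
        total += ok
    return total

for K in range(1, 6):
    print(K, count_characteristic(K), comb(2 * K, K) // (K + 1))  # equal columns
```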

To prove (b), we need to show that $\mathrm{Var}\big( \frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \big)$ is summable in $n$ for any fixed $k$. We have

$$\mathrm{Var}\Big( \frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \Big) = \frac{1}{n^{k+2}} \sum \Big[ E\big( G_{i_1 \dots i_k} G_{j_1 \dots j_k} \big) - E\big( G_{i_1 \dots i_k} \big) E\big( G_{j_1 \dots j_k} \big) \Big]. \qquad (2.13)$$

We can construct two graphs $\Gamma_1$ and $\Gamma_2$ for $G_{i_1 \dots i_k}$ and $G_{j_1 \dots j_k}$. If there is no coincident edge between $\Gamma_1$ and $\Gamma_2$, then $G_{i_1 \dots i_k}$ and $G_{j_1 \dots j_k}$ are independent, and the corresponding term in the sum is 0. If the combined graph $\Gamma = \Gamma_1 \cup \Gamma_2$ has a single edge, then $E G_{i_1 \dots i_k} G_{j_1 \dots j_k} = E G_{i_1 \dots i_k} E G_{j_1 \dots j_k} = 0$, hence the corresponding term in (2.13) is 0 too.

Now, suppose that $\Gamma$ contains no single edge and the graph of its non-coincident edges has a cycle. Then the number of distinct vertices is no more than $k$. If $\Gamma$ contains no single edge and the graph of its non-coincident edges has no cycle, then in each graph all the cycles must be contracted through the coincident edges; in addition, there are no single edges in either graph. Hence there is at least one edge in $\Gamma$ with multiplicity at least 4, and thus the number of distinct vertices is again no larger than $k$. Consequently,

$$\mathrm{Var}\Big( \frac{1}{n^{k/2+1}} \mathrm{tr}(X_n^k) \Big) \le C n^{-2}. \qquad (2.14)$$

All of the results above are based on the assumption that all the entries $x_{ij}$ are bounded. But in Theorem 2 we only assume i.i.d. entries with mean 0 and finite variance, so we need some extra steps: we truncate, centralize and rescale the entries of $X_n$, and prove that the ESD of the new matrix is asymptotically the same as that of the original one almost surely.

First let us define the Levy metric.

DEFINITION 2.1 The Levy distance $L$ between two distribution functions $F$ and $G$ is defined by

$$L(F, G) := \inf\big\{ \varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon \ \text{for all } x \in \mathbb{R} \big\}.$$

Step 1. Truncation

For any fixed positive constant $C$, we can truncate the random variables $x_{ij}$ at $C$ by defining $x_{ij}(C) = x_{ij} I(|x_{ij}| \le C)$. Now we define a truncated Wigner matrix $X_{n(C)}$ whose elements are the $x_{ij}(C)$, with $W_{n(C)} = \frac{1}{\sqrt{n}} X_{n(C)}$. Then we have the following result.

PROPOSITION 2.1 For any fixed $C > 0$,

$$\limsup_{n \to \infty} L^3\big( F^{W_n}, F^{W_{n(C)}} \big) \le E\, x_{12}^2 I(|x_{12}| > C) \qquad (2.15)$$

almost surely; the right hand side tends to 0 as $C \to \infty$.

LEMMA 2.1 Let $A$ and $B$ be two $n \times n$ symmetric matrices with their ESDs denoted by $F^A$ and $F^B$, respectively. Then,

$$L^3(F^A, F^B) \le \frac{1}{n} \mathrm{tr}\big[ (A - B)(A - B)^T \big]. \qquad (2.16)$$

Sketch proof of Lemma 2.1. First, we show

    L^3(F^A, F^B) <= (1/n) sum_{i=1}^n |lambda_i(A) - lambda_i(B)|^2.   (2.17)

The above inequality is trivial if h = (1/n) sum_{i=1}^n |lambda_i(A) - lambda_i(B)|^2 >= 1. When h < 1, let eps = h^{1/3} and m = #(M_1 \ M_2), where M_1 = {i <= n : lambda_i(A) <= x} and M_2 = {i <= n : lambda_i(B) <= x + eps}. Then we have that

    F^A(x) - F^B(x + eps) <= m/n <= (1/(n eps^2)) sum_{i=1}^n |lambda_i(A) - lambda_i(B)|^2 = eps.

Let pi = (pi(1), ..., pi(n)) denote a permutation of 1, 2, ..., n. Then

    L^3(F^A, F^B) <= min_pi (1/n) sum_{i=1}^n |lambda_i(A) - lambda_{pi(i)}(B)|^2.   (2.18)

By the perturbation inequality, min_pi (1/n) sum_{i=1}^n |lambda_i(A) - lambda_{pi(i)}(B)|^2 <= (1/n) tr[(A - B)(A - B)^T]. Therefore the conclusion holds.
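The bound in Lemma 2.1 is easy to check numerically. The following sketch (an illustration added here, not part of the original argument; the Levy distance is only approximated by a grid search, and the matrix sizes are arbitrary) compares L^3(F^A, F^B) with (1/n) tr[(A-B)(A-B)^T] for a random symmetric matrix A and a small symmetric perturbation B:

```python
import numpy as np

def levy_distance(sample_a, sample_b, grid):
    """Approximate Levy distance between the empirical CDFs of two samples.

    L(F, G) = inf{eps > 0 : F(x-eps)-eps <= G(x) <= F(x+eps)+eps for all x};
    the infimum is approximated by scanning eps over a coarse grid, so the
    returned value can overestimate the true distance by one grid step.
    """
    sa, sb = np.sort(sample_a), np.sort(sample_b)
    F = lambda x: np.searchsorted(sa, x, side="right") / len(sa)
    G = lambda x: np.searchsorted(sb, x, side="right") / len(sb)
    for eps in np.linspace(0.0, 0.5, 501):
        ok1 = np.all(F(grid - eps) - eps <= G(grid) + 1e-12)
        ok2 = np.all(G(grid) <= F(grid + eps) + eps + 1e-12)
        if ok1 and ok2:
            return eps
    return 0.5

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
A = (A + A.T) / np.sqrt(2 * n)              # normalized Wigner-type matrix
B = A + 0.001 * rng.standard_normal((n, n))
B = (B + B.T) / 2                           # small symmetric perturbation

grid = np.linspace(-4, 4, 801)
lhs = levy_distance(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B), grid) ** 3
rhs = np.trace((A - B) @ (A - B).T) / n     # the bound from (2.16)
```

Since the grid search can only overestimate the Levy distance slightly, `lhs <= rhs` should hold comfortably here.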

Proof of Proposition 2.1. By Lemma 2.1 and the law of large numbers, we have

    L^3( F^{W_n}, F^{W_n(C)} ) <= (1/n^2) sum_{ij} x_ij^2 I(|x_ij| > C) -> E[ x_12^2 I(|x_12| > C) ]   a.s.   (2.19)

Note that the right-hand side of (2.19) can be made arbitrarily small by increasing C. Then the proposition follows.

Therefore, in the proof of Theorem 2, we can assume that the entries of the matrix Xn are

uniformly bounded.

Step 2. Removing the diagonal

Notice that in Theorem 2 we allow the diagonal entries and the off-diagonal entries to have different distributions. We would like to set the diagonal entries of W_n(C) to be 0. By arguments similar to those in Step 1, one can show that this modification does not influence the asymptotic behavior of the ESD. We still use W_n(C) to denote this new random matrix.

Step 3. Centralization

In this step, we first reset the means of all off-diagonal entries of W_n(C) to zero. Let a = E[ x_12 I(|x_12| > C) ] and let 1 denote the all-ones vector. We will show that

    || F^{W_n(C)} - F^{W_n(C) - a 1 1^T} || <= 1/n,   (2.20)

where ||f|| = sup_x |f(x)|. In fact, we have the following.

LEMMA 2.2 Let A and B be two n x n symmetric matrices with their ESDs denoted by F^A and F^B, respectively. Then

    || F^A - F^B || <= (1/n) rank(A - B),   (2.21)

where ||f|| = sup_x |f(x)|.

Proof of Lemma 2.2. Since both sides of (2.21) are invariant under a common orthogonal transformation of A and B, we may assume that A - B has the form

    [ C  0 ]
    [ 0  0 ],

where C is a k x k matrix of full rank. To prove (2.21), we write

    A = [ A11  A12 ]        B = [ B11  A12 ]
        [ A21  A22 ],           [ A21  A22 ],

where A22 is an (n-k) x (n-k) matrix and rank(A - B) = rank(A11 - B11) = k. Denote the eigenvalues of A, B and A22 by lambda_1 <= ... <= lambda_n, mu_1 <= ... <= mu_n and eta_1 <= ... <= eta_{n-k}, respectively. By the interlacing theorem, we have max(lambda_j, mu_j) <= eta_j <= min(lambda_{j+k}, mu_{j+k}). Then we can conclude that for any x in (eta_{j-1}, eta_j),

    (j - 1)/n <= F^A(x), F^B(x) < (j + k)/n,

which implies (2.21).

Notice that the rank of a 1 1^T is 1, so applying Lemma 2.2 yields (2.20). Although the new matrix W_n(C) - a 1 1^T has off-diagonal elements with mean 0, its diagonal entries are all equal to -a. We can use the following lemma to remove the diagonal elements of W_n(C) - a 1 1^T without influencing the final result.

LEMMA 2.3 Let A and B be two n x n Hermitian matrices with their ESDs denoted by F^A and F^B, respectively. Then

    L(F^A, F^B) <= || A - B ||,

where ||.|| denotes the operator norm.

Proof of Lemma 2.3. It is easy to see that L(F^A, F^B) <= max_k |lambda_k(A) - lambda_k(B)|, so we only need to show that max_k |lambda_k(A) - lambda_k(B)| <= ||A - B||, where lambda_k(.) denotes the k-th largest eigenvalue. By the min-max characterization,

    lambda_k(A) = min_{y_1,...,y_{k-1}} max_{x ⊥ y_1,...,y_{k-1}, ||x||=1} x* A x,

which is at most min_{y_1,...,y_{k-1}} max_{x ⊥ y_1,...,y_{k-1}} x* B x + ||A - B|| and at least min_{y_1,...,y_{k-1}} max_{x ⊥ y_1,...,y_{k-1}} x* B x - ||A - B||.

Hence we have L( F^{W_n(C) - a 1 1^T}, F^{W_n(C) - E W_n(C)} ) -> 0 as n -> infinity.

Step 4. Rescaling

Now we normalize the entries of the new matrix W_n(C) - E W_n(C). Write sigma^2(C) = Var( x_12 I(|x_12| <= C) ) and define W~_n = sigma^{-1}(C) ( W_n(C) - E W_n(C) ); all the off-diagonal entries of W~_n are of the form sigma^{-1}(C)( x_12(C) - E x_12(C) ). Applying Lemma 2.1, we obtain

    L^3( F^{W~_n}, F^{W_n(C) - E W_n(C)} ) <= ( (1 - sigma(C))^2 / (n^2 sigma^2(C)) ) sum_{i != j} ( x_ij(C) - E x_ij(C) )^2 -> (1 - sigma(C))^2   a.s.

Note that (1 - sigma(C))^2 can be made arbitrarily small by increasing C.

Finally, combining all the results above, we may assume that all the entries of X_n are bounded by C, with mean zero and variance 1. Applying the sketch proof of (3) then finishes the proof of Theorem 2.
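As a quick empirical illustration of Theorem 2 (a sketch added here; the Rademacher entries and the size n are arbitrary choices), the moments of the ESD of W_n = X_n/sqrt(n) can be compared with the Catalan-number moments of the semicircle law:

```python
import numpy as np

# Simulate a Wigner matrix with bounded (Rademacher) entries and compare the
# first even moments of its ESD with those of the semicircle law:
# the even semicircle moments are the Catalan numbers, so E x^2 = 1, E x^4 = 2.
rng = np.random.default_rng(1)
n = 1000
upper = rng.choice([-1.0, 1.0], size=(n, n))
W = np.triu(upper) + np.triu(upper, 1).T        # symmetric matrix of +-1 entries
evals = np.linalg.eigvalsh(W / np.sqrt(n))

m2 = np.mean(evals ** 2)
m4 = np.mean(evals ** 4)
```

The spectrum should also be essentially confined to the semicircle support [-2, 2] for n this large.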

3 The Limiting Spectral Distribution of the Sample Covariance Matrix

The sample covariance matrix is very popular in multivariate statistical analysis, because many test statistics can be written as functions of its eigenvalues. The formal definition goes as follows. Suppose the x_ij are i.i.d. random variables with mean 0 and variance sigma^2. Write X_i = (x_{i1}, ..., x_{ip})^T and X = (X_1, ..., X_n). Then the sample covariance matrix is defined as

    S = (1/(n-1)) sum_{i=1}^n (X_i - Xbar)(X_i - Xbar)^T,   where Xbar = (1/n) sum_i X_i.

There is another version of the sample covariance matrix, which is more convenient for the spectral analysis of random matrices. Notice that Xbar Xbar^T is a rank-one matrix, and hence its removal does not affect the LSD, by the rank inequality. Therefore the sample covariance matrix can be simply defined as

    S = (1/n) sum_{k=1}^n X_k X_k^T = (1/n) X X^T.

In addition, we assume p/n -> y in (0, infinity).

The first success in finding the limiting spectral distribution of the large sample covariance matrix S_n is due to Marcenko and Pastur (1967). The Marcenko-Pastur (MP) law F_y(x) has density

    f(x) = (1/(2 pi x y sigma^2)) sqrt( (b - x)(x - a) )   if a <= x <= b;   f(x) = 0 otherwise,

and has a point mass 1 - 1/y at 0 if y > 1, where a = sigma^2 (1 - sqrt(y))^2 and b = sigma^2 (1 + sqrt(y))^2. Here the constant y is the limit of p/n.

3.1 MP Law

In this chapter we only consider the LSD of the sample covariance matrix for the case where the underlying variables are i.i.d. and bounded with mean 0 and variance 1. These assumptions can be relaxed by the truncation, centralization and rescaling techniques discussed in the previous chapter.

THEOREM 3 Suppose the x_ij are i.i.d. random variables, bounded, with mean 0 and variance sigma^2. Assume that p/n -> y in (0, infinity). Then, with probability one, the ESD of S,

    F^S(x) = (1/p) sum_{i=1}^p I(lambda_i <= x),

converges to the Marcenko-Pastur law.
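Theorem 3 is easy to visualize by simulation (a sketch added here; Gaussian entries and sigma^2 = 1 are convenient but arbitrary choices):

```python
import numpy as np

# Eigenvalues of S = (1/n) X X^T should fall in the MP support
# [a, b] = [(1 - sqrt(y))^2, (1 + sqrt(y))^2] with y = p/n, and the first
# moment of the MP law (sigma^2 = 1) equals 1.
rng = np.random.default_rng(2)
n, p = 2000, 500
y = p / n
X = rng.standard_normal((p, n))
S = X @ X.T / n
evals = np.linalg.eigvalsh(S)

a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
frac_in = np.mean((evals > a - 0.1) & (evals < b + 0.1))
mean_eig = evals.mean()        # equals (1/(pn)) sum x_ij^2, close to 1
```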


We will use the Stieltjes transform to prove Theorem 3. In recent years the Stieltjes transform has become more popular in random matrix research than the moment method. Let us start with the basic theory and concepts related to the Stieltjes transform.

For a distribution function F, its Stieltjes transform is defined by

    S_F(z) = int 1/(x - z) dF(x),   z in D := { z = u + iv : v > 0 }.

The Stieltjes transform S_F(z) is well defined because the integrand satisfies | 1/(x - z) | = 1/sqrt( (x - u)^2 + v^2 ) <= 1/v.

First, we would like to show that we can recover a distribution function F from its Stieltjes transform. Here is the inversion formula.

LEMMA 3.1 Let a < b be continuity points of a CDF F(x). Then

    F(b) - F(a) = lim_{v -> 0+} (1/pi) int_a^b Im S_F(x + iv) dx.

Proof of Lemma 3.1. By the definition,

    S_F(x + iv) = int_R 1/(y - x - iv) dF(y) = int_R (y - x + iv)/((y - x)^2 + v^2) dF(y).

Thus

    int_a^b Im S_F(x + iv) dx = int_R { int_a^b v/((y - x)^2 + v^2) dx } dF(y)
                              = int_R { arctan((b - y)/v) - arctan((a - y)/v) } dF(y),

in which, as v -> 0+,

    arctan((b - y)/v) - arctan((a - y)/v) -> (pi/2)( I_{a}(y) + I_{b}(y) ) + pi I_{(a,b)}(y).

Since a and b are continuity points of F, the dominated convergence theorem gives the conclusion.
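For an empirical distribution the Stieltjes transform is an exact finite average, so Lemma 3.1 can be verified directly (a numerical sketch added here; the interval endpoints and the value of v are arbitrary choices):

```python
import numpy as np

# Check the inversion formula F(b) - F(a) = lim (1/pi) int_a^b Im S_F(x+iv) dx
# for the empirical distribution F of a sample, where
# S_F(z) = (1/m) sum_i 1/(x_i - z) exactly.
rng = np.random.default_rng(3)
pts = rng.standard_normal(500)

def S_F(z):
    return np.mean(1.0 / (pts - z))

a, b = -1.0, 1.0               # continuity points of F (almost surely)
v = 1e-3                       # small imaginary part
xs = np.linspace(a, b, 20001)
integral = np.trapz([S_F(x + 1j * v).imag for x in xs], xs) / np.pi
exact = np.mean((pts > a) & (pts <= b))
```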

From the above lemma, we immediately have the following result.

LEMMA 3.2 For any two CDFs F(x) and G(x), F = G if and only if S_F(z) = S_G(z) for all z in D.


This lemma indicates that a distribution function is uniquely determined by its Stieltjes transform. Here is another consequence of Lemma 3.1.

LEMMA 3.3 Let F(x) be a CDF and x_0 in R. Suppose lim_{z in D -> x_0} Im S_F(z) exists, and denote it by Im S_F(x_0). Then F(x) is differentiable at x_0 and F'(x_0) = (1/pi) Im S_F(x_0).

This lemma says that we can derive a density function of a distribution from its Stieltjes transform.

Proof of Lemma 3.3. Recall that the set of discontinuity points of F(x) is countable. Thus, for continuity points x_m, x_n of F with x_m, x_n -> x_0, we have

    ( F(x_m) - F(x_n) ) / ( x_m - x_n ) -> (1/pi) Im S_F(x_0)   (3.23)

as n, m -> infinity. In particular, take continuity points x_1 < x_3 < ... -> x_0 and x_2 > x_4 > ... -> x_0. Then (3.23) implies that F(x_{2k}) - F(x_0), F(x_0) - F(x_{2k-1}) and F(x_{2k}) - F(x_{2k-1}) all tend to 0, so F(x) is continuous at x_0. Therefore, by choosing the sequence {x_1, x_0, x_2, x_0, ...}, we have

    ( F(x_n) - F(x_0) ) / ( x_n - x_0 ) -> (1/pi) Im S_F(x_0)

for any sequence {x_n} of continuity points of F with x_n -> x_0.

Now we need to show that F'_+(x_0) = F'_-(x_0) = (1/pi) Im S_F(x_0). For any sequence x_n decreasing to x_0, take a continuity point x'_n such that x_n > x'_n > x_0 and |x_n - x'_n| <= (1/n)|x_n - x_0|. Then x'_n -> x_0 and (x'_n - x_0)/(x_n - x_0) -> 1, and hence

    liminf_n ( F(x_n) - F(x_0) ) / ( x_n - x_0 ) >= (1/pi) Im S_F(x_0);

the matching limsup bound follows in the same way, so F'_+(x_0) = (1/pi) Im S_F(x_0). By the same idea for the left derivative, we get the conclusion.

Recall that the characteristic function is a powerful tool for studying weak convergence, because the continuity theorem provides a sufficient and necessary condition for weak convergence. We have a similar continuity theorem for the Stieltjes transform.

LEMMA 3.4 A sequence of distribution functions {F_n} converges weakly to a distribution function F(x) if and only if lim S_{F_n}(z) = S_F(z) for all z in D.

Proof of Lemma 3.4. The necessity part follows directly from the dominated convergence theorem. Now let us deal with the sufficiency part. Write f_v(x) = v / ( pi (x^2 + v^2) ).


The function f_v(x) is the density function of a Cauchy random variable C_v with scale parameter v. By the definition of the Stieltjes transform, we have

    (1/pi) Im S_{F_n}(u + iv) = (1/pi) int_R v/((x - u)^2 + v^2) dF_n(x).

This implies that (1/pi) Im S_{F_n}(u + iv), viewed as a function of u, is the density of the convolution of F_n(x) and the distribution of C_v. Now let phi_n(t), phi_{F_n}(t) and phi_{C_v}(t) denote the characteristic functions of (1/pi) Im S_{F_n}(u + iv), F_n(x) and C_v, respectively, and further let phi(t) and phi_F(t) be the characteristic functions of (1/pi) Im S_F(u + iv) and F(x). Then phi_n(t) = phi_{F_n}(t) phi_{C_v}(t) and phi(t) = phi_F(t) phi_{C_v}(t). Since lim S_{F_n}(z) = S_F(z), we have lim (1/pi) Im S_{F_n}(z) = (1/pi) Im S_F(z), hence phi_n(t) -> phi(t). Because phi_{C_v}(t) = exp(-v|t|) never vanishes, phi_{F_n}(t) -> phi_F(t). By the continuity theorem for characteristic functions, we conclude that F_n => F as n -> infinity.
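The convolution step in this proof is an exact identity for empirical distributions, which makes it easy to check (a numerical sketch added here; the sample, v and the grid are arbitrary choices):

```python
import numpy as np

# (1/pi) Im S_F(u + iv) equals the density of F convolved with a Cauchy
# variable of scale v: for an empirical F this is the average of Cauchy
# densities centered at the sample points, and the identity holds exactly.
rng = np.random.default_rng(4)
pts = rng.standard_normal(100)
v = 0.5
us = np.linspace(-3, 3, 61)

lhs = np.array([np.mean(1.0 / (pts - (u + 1j * v))).imag / np.pi for u in us])
rhs = np.array([np.mean(v / (np.pi * ((u - pts) ** 2 + v ** 2))) for u in us])
```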

Even if, for a sequence of distribution functions F_n, we know that S_{F_n}(z) converges to some limit s(z), there is still no guarantee that s(z) is the Stieltjes transform of a probability distribution function F. This problem can be solved by assuming that {F_n} is tight.

LEMMA 3.5 Let {F_n, n >= 1} be tight. If lim S_{F_n}(z) = S(z) for all z in D, then there is a CDF F(x) such that F_n converges to F weakly and S_F(z) = S(z).

Proof of Lemma 3.5. Since lim S_{F_n}(z) = S(z), all weakly convergent subsequences F_{n_k} have the same limiting distribution. By Helly's theorem, the tightness of {F_n} then implies that F_n => F and S_F(z) = S(z).

If we cannot check the tightness condition, then the following lemma is very useful.

LEMMA 3.6 Suppose lim S_{F_n}(z) = S(z) for all z in D. Then there exists a probability distribution function F with Stieltjes transform S_F(z) = S(z) if and only if

    lim_{v -> +infinity} -iv S(iv) = 1.   (3.24)

Now we can state a criterion for Stieltjes transforms. Let S(z) be an analytic function on D. Then there exists a distribution function F with Stieltjes transform S(z) if and only if S(z) satisfies Im S(z) > 0 for each z in D and (3.24) holds.


3.3 Stieltjes Transform of The MP Law

First let us review some basic definitions and theorems from complex analysis.

The derivative of a complex function f at a point z_0 in its domain is defined by the limit f'(z_0) = lim_{z -> z_0} ( f(z) - f(z_0) ) / ( z - z_0 ). If the limit exists, we say that f is complex differentiable at the point z_0. This is the same as the definition of the derivative for real functions, except that all of the quantities are complex. If f is complex differentiable at every point of an open set U, we say that f is holomorphic on U. As Lemma 3.8 below implies, any holomorphic function is actually infinitely differentiable.

Let f : U \ {a} -> C be a holomorphic function on its domain. If there exists a holomorphic function g : U -> C and a positive integer n such that f(z) = g(z)/(z - a)^n for all z in U \ {a}, then a is called a pole of f. The smallest such n is called the order of the pole, and a pole of order 1 is called a simple pole.

LEMMA 3.7 (The residue theorem) Suppose f is holomorphic on U \ {a_1, ..., a_m}. If gamma is a simple closed curve with counter-clockwise direction in U which does not meet any a_k, then

    oint_gamma f(z) dz = 2 pi i sum_k Res(f, a_k),

with the sum over those k for which a_k is in the interior of gamma. Here Res(f, a_k) denotes the residue of f at a_k, and Res(f, a_k) = (1/(2 pi i)) oint_{gamma'} f(z) dz, where gamma' is a counter-clockwise oriented closed curve in U containing only a_k in its interior.

LEMMA 3.8 (Cauchy's integral formula) Suppose U is an open subset of the complex plane C, f : U -> C is a holomorphic function, and the closed disk D = { z : |z - z_0| <= r } is completely contained in U. Let gamma be the circle forming the boundary of D. Then for every a in the interior of D,

    f(a) = (1/(2 pi i)) oint_gamma f(z)/(z - a) dz,

where the contour integral is taken counter-clockwise. In particular, f is actually infinitely differentiable, with

    f^{(n)}(a) = ( n!/(2 pi i) ) oint_gamma f(z)/(z - a)^{n+1} dz.

In the following calculation, we specify the square root of a complex number as the one with positive imaginary part. Writing z = u + iv, this requirement gives

    Re( sqrt(z) ) = sign(v) sqrt( (|z| + u)/2 )   and   Im( sqrt(z) ) = sqrt( (|z| - u)/2 ).

In particular, the real part of sqrt(z) has the same sign as the imaginary part of z.

Now let us calculate the Stieltjes transform S(z) of the MP law for z in D. When y < 1, we have

    S(z) = int_a^b 1/(x - z) * (1/(2 pi x y sigma^2)) sqrt( (b - x)(x - a) ) dx,

where a = sigma^2 (1 - sqrt(y))^2 and b = sigma^2 (1 + sqrt(y))^2. Substituting x = sigma^2 (1 + y + 2 sqrt(y) cos(theta)) and then zeta = e^{i theta}, we obtain

    S(z) = (2/pi) int_0^pi sin^2(theta) / [ (1 + y + 2 sqrt(y) cos theta)( sigma^2 (1 + y + 2 sqrt(y) cos theta) - z ) ] d theta
         = (1/pi) int_0^{2 pi} ( (e^{i theta} - e^{-i theta})/(2i) )^2 / [ (1 + y + sqrt(y)(e^{i theta} + e^{-i theta}))( sigma^2 (1 + y + sqrt(y)(e^{i theta} + e^{-i theta})) - z ) ] d theta
         = -(1/(4 pi i)) oint_{|zeta|=1} (zeta - zeta^{-1})^2 / [ zeta (1 + y + sqrt(y)(zeta + zeta^{-1}))( sigma^2 (1 + y + sqrt(y)(zeta + zeta^{-1})) - z ) ] d zeta
         = -(1/(4 pi i)) oint_{|zeta|=1} (zeta^2 - 1)^2 / [ zeta ( (1 + y) zeta + sqrt(y)(zeta^2 + 1) )( sigma^2 ( (1 + y) zeta + sqrt(y)(zeta^2 + 1) ) - z zeta ) ] d zeta.

The integrand has five poles: zeta = 0, the roots zeta_1 = -sqrt(y), zeta_2 = -1/sqrt(y) of (1 + y) zeta + sqrt(y)(zeta^2 + 1) = 0, and

    zeta_{3,4} = [ -sigma^2 (1 + y) + z +- sqrt( sigma^4 (1 - y)^2 - 2 sigma^2 (1 + y) z + z^2 ) ] / ( 2 sigma^2 sqrt(y) ).

By Cauchy's integral formula, the relevant residues involve the quantities

    1/(sigma^2 y),   (1 - y)/(y z),   and   (1/(y z sigma^2)) sqrt( sigma^4 (1 - y)^2 - 2 sigma^2 (1 + y) z + z^2 ).

Notice that zeta_1 zeta_2 = 1 and zeta_3 zeta_4 = 1, so only one pole of each pair is inside the contour |zeta| = 1. It is easy to check that |zeta_1| < 1 and |zeta_2| > 1 when y < 1. Since we require the square root of a complex number to have positive imaginary part, the real part of a square root has the same sign as the imaginary part of its argument; it follows that the real and imaginary parts of sqrt( sigma^4 (1 - y)^2 - 2 sigma^2 (1 + y) z + z^2 ) and of -sigma^2 (1 + y) + z have the same signs. Hence |zeta_3| > 1 and |zeta_4| < 1. Then, by the residue theorem, we obtain

    S(z) = [ sigma^2 (1 - y) - z + sqrt( (z - sigma^2 - sigma^2 y)^2 - 4 y sigma^4 ) ] / ( 2 y z sigma^2 ).   (3.25)

When y > 1, S(z) equals the above integral plus the contribution -(y - 1)/(y z) of the point mass at 0; with some basic calculation, equation (3.25) still holds true. When y = 1, equation (3.25) holds by continuity. Without loss of generality, we assume sigma^2 = 1 and y < 1 from now on. It is not very hard to see that S(z) is the solution, with positive imaginary part, of

    S(z) = 1 / ( 1 - z - y - y z S(z) ).   (3.26)
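The closed form (3.25) can be sanity-checked numerically against the quadratic equation (3.26) (a sketch with sigma^2 = 1; the branch choice follows the positive-imaginary-part convention above, and the test points are arbitrary):

```python
import numpy as np

def mp_stieltjes(z, y):
    """Stieltjes transform (3.25) of the MP law with sigma^2 = 1."""
    w = np.sqrt((z - 1 - y) ** 2 - 4 * y + 0j)
    if w.imag < 0:                    # enforce the positive-imaginary root
        w = -w
    return (1 - y - z + w) / (2 * y * z)

y = 0.5
z = 1.2 + 0.7j
S = mp_stieltjes(z, y)
resid = abs(S - 1.0 / (1 - z - y - y * z * S))   # equation (3.26)
far = 100.0 + 1.0j
tail = abs(mp_stieltjes(far, y) + 1.0 / far)     # S(z) ~ -1/z for large z
```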

3.4 The Proof of Theorem 3

Define the Stieltjes transform of the ESD of S_n by S_{T_n}(z) = (1/p) tr( (S_n - z I_p)^{-1} ), z in D. We need the following three steps to prove Theorem 3:

(1) For any fixed z in D, S_{T_n}(z) - E S_{T_n}(z) -> 0 almost surely.

Let E_k denote conditional expectation with respect to X_{k+1}, ..., X_n. We have

    S_{T_n}(z) - E S_{T_n}(z) = (1/p) sum_{k=1}^n { E_{k-1} tr((S_n - z I_p)^{-1}) - E_k tr((S_n - z I_p)^{-1}) }
                              = (1/p) sum_{k=1}^n ( E_{k-1} - E_k ){ tr((S_n - z I_p)^{-1}) - tr((S_{nk} - z I_p)^{-1}) },

where S_{nk} = S_n - (1/n) X_k X_k^T. The last equation is due to the fact that E_k tr((S_{nk} - z I_p)^{-1}) = E_{k-1} tr((S_{nk} - z I_p)^{-1}). For any invertible matrix A and two vectors alpha and beta, we have

    (A + alpha beta^T)^{-1} = A^{-1} - ( A^{-1} alpha beta^T A^{-1} ) / ( 1 + beta^T A^{-1} alpha ).

By the above formula, we obtain

    tr((S_n - z I_p)^{-1}) - tr((S_{nk} - z I_p)^{-1}) = - [ (1/n) X_k^T (S_{nk} - z I_p)^{-2} X_k ] / [ 1 + (1/n) X_k^T (S_{nk} - z I_p)^{-1} X_k ] =: H_n.   (3.27)

Write z = u + iv and diagonalize S_{nk} = D Lambda D^T with D orthogonal, D D^T = D^T D = I_p. Then

    | X_k^T (S_{nk} - z I_p)^{-2} X_k | <= sum_{i=1}^p |(D X_k)_i|^2 |lambda_i - z|^{-2}
                                        = sum_{i=1}^p |(D X_k)_i|^2 { (lambda_i - u)^2 + v^2 }^{-1}
                                        = X_k^T ( (S_{nk} - u I_p)^2 + v^2 I_p )^{-1} X_k,

while Im( n + X_k^T (S_{nk} - z I_p)^{-1} X_k ) = v X_k^T ( (S_{nk} - u I_p)^2 + v^2 I_p )^{-1} X_k. Hence

    |H_n| <= X_k^T ( (S_{nk} - u I_p)^2 + v^2 I_p )^{-1} X_k / Im( n + X_k^T (S_{nk} - z I_p)^{-1} X_k ) = 1/v.

Let R_k = (E_{k-1} - E_k) H_n. If we define F_k = sigma(X_n, X_{n-1}, ..., X_{n-k+1}) and Y_k = R_{n-k+1}, then (Y_k, F_k) is a bounded martingale difference sequence. By the moment inequality for martingale differences,

    E| S_{T_n}(z) - E S_{T_n}(z) |^4 <= (C_4 / p^4) E( sum_{k=1}^n |R_k|^2 )^2 = O(n^{-2}),

and the almost sure convergence follows from the Borel-Cantelli lemma.

Step 2. Now we show that E S_{T_n}(z) -> S(z), where S(z) is defined in (3.25) with sigma^2 = 1.

Recall X = (X_1, ..., X_n), write its rows as alpha_1^T, ..., alpha_p^T, and let X(l) denote the matrix obtained by removing the l-th row of X. Singling out the first row,

    S_n = (1/n) X X^T = [ (1/n) alpha_1^T alpha_1   (1/n) alpha_1^T X(1)^T ]
                        [ (1/n) X(1) alpha_1        (1/n) X(1) X(1)^T      ].

For any matrix D = [ d_11, d_1^T ; d~_1, D_22 ], if we write D^{-1} = (b_ij), then b_11 = 1/( d_11 - d_1^T D_22^{-1} d~_1 ). This formula can be generalized to the other elements on the diagonal of D^{-1}. So we have

    S_{T_n}(z) = (1/p) tr (S_n - z I_p)^{-1}
               = (1/p) sum_{l=1}^p 1 / [ (1/n) alpha_l^T alpha_l - z - (1/n^2) alpha_l^T X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l) alpha_l ].

Define y_n = p/n and

    eps_l = (1/n) alpha_l^T alpha_l - 1 - (1/n^2) alpha_l^T X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l) alpha_l + y_n + y_n z E S_{T_n}(z).

Then we have

    E S_{T_n}(z) = 1 / ( 1 - z - y_n - y_n z E S_{T_n}(z) ) + delta_n,   (3.28)

where

    delta_n = -(1/p) sum_{l=1}^p E [ eps_l / ( (1 - z - y_n - y_n z E S_{T_n}(z))(1 - z - y_n - y_n z E S_{T_n}(z) + eps_l) ) ].

Solving the quadratic equation for E S_{T_n}(z) obtained from (3.28) gives the two candidate roots

    s_{1,2}(z) = (1/(2 y_n z)) [ (1 - z - y_n + y_n z delta_n) +- sqrt( (1 - z - y_n - y_n z delta_n)^2 - 4 y_n z ) ].

For fixed u, as v = Im z -> infinity we have E S_{T_n}(z) -> 0; with the square-root convention above, the "+" root s_1(z) -> 0 while the "-" root s_2(z) -> -1/y_n, which is negative and bounded away from 0. This shows that E S_{T_n}(z) = s_1(z) for all z with a large imaginary part. If E S_{T_n}(z) = s_1(z) were not true for all z in D, there would exist z_0 = u_0 + i v_0 in D such that E S_{T_n}(z_0) = s_1(z_0) = s_2(z_0), i.e.

    E S_{T_n}(z_0) = ( 1 - z_0 - y_n + y_n z_0 delta_n ) / ( 2 y_n z_0 ).

Combining this with (3.28), we get

    E S_{T_n}(z_0) = (1 - z_0 - y_n)/(y_n z_0) + 1/( z_0 - 1 + y_n + y_n z_0 E S_{T_n}(z_0) ).

It is easy to see that

    Im( (1 - z_0 - y_n)/(y_n z_0) ) = -( (y_n - y_n^2) v_0 ) / ( y_n^2 (u_0^2 + v_0^2) ) < 0   when y_n < 1.

Furthermore, Im( z_0 - 1 + y_n + y_n z_0 E S_{T_n}(z_0) ) = v_0 ( 1 + y_n E int_0^infinity x/((x - u_0)^2 + v_0^2) dF_n(x) ) > 0, where F_n(x) is the ESD of S_n, so the second term above also has negative imaginary part. We conclude that Im( E S_{T_n}(z_0) ) < 0, which is a contradiction, since a Stieltjes transform has positive imaginary part on D. So E S_{T_n}(z) = s_1(z) for all z in D, and if delta_n -> 0 as n -> infinity we conclude that E S_{T_n}(z) -> S(z) for all z in D.

Now let us write, with A_n := 1 - z - y_n - y_n z E S_{T_n}(z),

    delta_n = (1/p) sum_{l=1}^p E [ eps_l^2 / ( A_n^2 (A_n + eps_l) ) ] - (1/p) sum_{l=1}^p E eps_l / A_n^2 =: J_1 + J_2.

First let us consider J_2. Since E[ (1/n) alpha_l^T alpha_l ] = 1, we have

    |E eps_l| = | (1/n^2) E[ alpha_l^T X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l) alpha_l ] - y_n - y_n z E S_{T_n}(z) |
              = | (1/n^2) tr( E[ X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l) ] E[ alpha_l alpha_l^T ] ) - y_n - y_n z E S_{T_n}(z) |
              = | (1/n) E tr( ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} (1/n) X(l) X(l)^T ) - y_n - y_n z E S_{T_n}(z) |
              = | (p - 1)/n + (z/n) E tr( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} - y_n - y_n z E S_{T_n}(z) |
              <= 1/n + (|z|/n) E | tr( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} - tr( (1/n) X X^T - z I_p )^{-1} |
              <= 1/n + C |z|/(n v),

by a rank-one bound of the type (3.27). Moreover,

    | 1 - z - y_n - y_n z E S_{T_n}(z) | >= | Im( 1 - z - y_n - y_n z E S_{T_n}(z) ) | = v ( 1 + y_n E int_0^infinity x/((x - u)^2 + v^2) dF_n(x) ) >= v,

where F_n(x) is the ESD of S_n. Hence J_2 -> 0 as n -> infinity.

Next consider J_1. Since

    Im( (1/n) alpha_l^T alpha_l - z - (1/n^2) alpha_l^T X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l) alpha_l )
        = -v ( 1 + (1/n^2) alpha_l^T X(l)^T ( ( (1/n) X(l) X(l)^T - u I_{p-1} )^2 + v^2 I_{p-1} )^{-1} X(l) alpha_l ) <= -v,

we have |A_n + eps_l| >= v, and therefore |J_1| <= (1/(p v^3)) sum_l E|eps_l|^2. So we only need to show that E|eps_l|^2 -> 0 as n -> infinity. Let E_l denote the conditional expectation given {alpha_j, j != l}. Then E|eps_l|^2 <= E|eps_l - E_l eps_l|^2 + E|E_l eps_l - E eps_l|^2 + |E eps_l|^2. Define A = (a_ij) = I_n - (1/n) X(l)^T ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1} X(l). Then

    eps_l - E_l eps_l = (1/n) ( sum_{i=1}^n a_ii ( x_li^2 - 1 ) + sum_{i != j} a_ij x_li x_lj ).

In addition, we have

    E_l | eps_l - E_l eps_l |^2 = (1/n^2) ( sum_{i=1}^n |a_ii|^2 E( x_l1^4 - 1 ) + sum_{i != j} 2 |a_ij|^2 ).

Since A = -z ( (1/n) X(l)^T X(l) - z I_n )^{-1}, we know |a_ii| <= 1 + sum_k x_ki^2/(n v), so (1/n^2) E sum_i |a_ii|^2 E( |x_l1|^4 - 1 ) <= C/n. It is easy to see that sum_{i,j} |a_ij|^2 = tr(A A*) and

    tr(A A*) = |z|^2 tr{ ( (1/n) X(l)^T X(l) - z I_n )( (1/n) X(l)^T X(l) - zbar I_n ) }^{-1} <= n |z|^2 / v^2.

So we conclude that E| eps_l - E_l eps_l |^2 <= C/n. By an approach similar to the proof of (3.27),

    E| E_l eps_l - E eps_l |^2 = (|z|^2/n^2) E| tr R_{(l)} - E tr R_{(l)} |^2 <= C |z|^2 / (n v^2),

where R_{(l)} = ( (1/n) X(l) X(l)^T - z I_{p-1} )^{-1}. Combining the three bounds shows E|eps_l|^2 -> 0, hence J_1 -> 0 and delta_n -> 0.

Step 3. First let us review Vitali's convergence theorem.

LEMMA 3.9 Let {f_n} be a sequence of analytic functions on D with |f_n(z)| <= M for all n and all z in C, where C is a connected open subset of D. If f_n(z) converges for each z in a dense subset of D, then there exists an analytic function f(z) on D such that f_n(z) -> f(z) for all z in D.

By Steps 1 and 2, for any fixed z in D we have S_{T_n}(z) -> S(z) almost surely. This means that for any z in D there exists a set N_z with probability 0 such that S_{T_n}(z, omega) -> S(z) for all omega in N_z^c. Let D_0 = {z_m} be a dense subset of D (e.g., all z_m with rational real and imaginary parts). Then S_{T_n}(z_m, omega) -> S(z_m) for all omega in the intersection of the N_{z_m}^c, and P( intersection_m N_{z_m}^c ) = 1. Now applying Lemma 3.9, we have

    S_{T_n}(z, omega) -> S(z)   for every z in D and every omega in intersection_m N_{z_m}^c.

Combining the three steps above, we conclude that F^{S_n}(x) -> F_{MP}(x) a.s.

Let x_1, ..., x_n be i.i.d. R^p-valued random vectors from a normal distribution N_p(mu, Sigma), where mu in R^p is the mean vector and Sigma is the p x p covariance matrix. Consider the sphericity hypothesis test:

    H_0 : Sigma = sigma^2 I_p   vs   H_1 : Sigma != sigma^2 I_p.   (4.29)

4.1 Likelihood Ratio Test for Sphericity

Denote

    xbar = (1/n) sum_{i=1}^n x_i,   A = sum_{i=1}^n (x_i - xbar)(x_i - xbar)^T,   and   S = A/(n - 1).   (4.30)

The likelihood function satisfies

    L(mu, Sigma) = (2 pi)^{-np/2} det(Sigma)^{-n/2} exp( -(1/2) tr(Sigma^{-1} A) ) exp( -(n/2) (xbar - mu)^T Sigma^{-1} (xbar - mu) ).

Then the likelihood ratio statistic is

    Lambda_n = sup_{mu, sigma} L(mu, sigma^2 I_p) / sup_{mu, Sigma} L(mu, Sigma).

First we derive the denominator. Since L(mu, Sigma) <= (2 pi)^{-np/2} det(Sigma)^{-n/2} exp( -(1/2) tr(Sigma^{-1} A) ), with equality if and only if mu = xbar, we know that xbar maximizes L(mu, Sigma) for every Sigma. Now we only need to maximize det(Sigma)^{-n/2} exp( -(1/2) tr(Sigma^{-1} A) ). Note that

    g(Sigma) = log L(xbar, Sigma) + const = (n/2) log det(Sigma^{-1}) - (1/2) tr(Sigma^{-1} A)
             = (n/2) log det(Sigma^{-1} A) - (1/2) tr(Sigma^{-1} A) - (n/2) log det(A)
             = (1/2) sum_{i=1}^p ( n log lambda_i - lambda_i ) - (n/2) log det(A),

where lambda_1, ..., lambda_p are the eigenvalues of Sigma^{-1} A; the maximum is attained at lambda_1 = ... = lambda_p = n. So

    sup_{mu, Sigma} L(mu, Sigma) = (2 pi)^{-np/2} n^{np/2} exp( -np/2 ) det(A)^{-n/2}.

The numerator is

    sup_{mu, sigma} L(mu, sigma^2 I_p) = sup_sigma (2 pi sigma^2)^{-np/2} exp( -tr(A)/(2 sigma^2) ) = ( 2 pi tr(A)/(np) )^{-np/2} exp( -np/2 ).

So we define

    V_n = Lambda_n^{2/n} = det(A) / ( tr(A)/p )^p = det(S) / ( tr(S)/p )^p.   (4.31)

Notice that the matrices A and S are singular when p >= n, and consequently their determinants are equal to zero in that case. This indicates that the likelihood ratio test of (4.29) only exists when p < n.
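By the AM-GM inequality applied to the eigenvalues of S, V_n in (4.31) always lies in (0, 1], with V_n = 1 exactly when all eigenvalues are equal; this gives a quick sanity check (a sketch added here; the log scale avoids overflow in the determinant):

```python
import numpy as np

# Compute log V_n = log det(S) - p * log(tr(S)/p) for simulated N(0, I) data.
# Since the geometric mean of the eigenvalues of S is at most their
# arithmetic mean, log V_n <= 0 always holds.
rng = np.random.default_rng(6)
n, p = 100, 20
x = rng.standard_normal((n, p))
S = np.cov(x, rowvar=False)                    # S = A/(n-1) as in (4.30)

sign, logdet = np.linalg.slogdet(S)
log_Vn = logdet - p * np.log(np.trace(S) / p)
```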

Bai et al. (2010) made a correction to the traditional likelihood ratio test statistic V_n to make it suitable for testing a high-dimensional normal distribution, by using the CLT for linear spectral statistics of S. They defined

    L_n = tr(S) - log(det S) - p = sum_i { lambda_i - log(lambda_i) - 1 },

where lambda_1, ..., lambda_p are the eigenvalues of S. It is easy to see that L_n is a linear spectral statistic, so the central limit theorem for linear spectral statistics plays an essential role here. Let us review a specialization of the theorem from Bai and Silverstein (1996).

LEMMA 4.1 Let f be an analytic function and let the x_ij be i.i.d. random variables with mean 0, variance 1 and E x_ij^4 < infinity. Assume p/n -> y in (0, 1) as n -> infinity. Then

    G_n(f) = p int f(x) d( F^{S_n}(x) - F_{y_n}(x) )

converges to a normal distribution with mean

    m(f) = ( f(a(y)) + f(b(y)) )/4 - (1/(2 pi)) int_{a(y)}^{b(y)} f(x) / sqrt( 4y - (x - 1 - y)^2 ) dx

and variance

    v(f) = -(1/(2 pi^2)) oint oint f(z_1) f(z_2) / ( m(z_1) - m(z_2) )^2 dm(z_1) dm(z_2),

where m(z) is the Stieltjes transform of F_y-underline = (1 - y) I_{[0,infinity)} + y F_y and the two contours are non-overlapping and both contain the support of F_y. The functions F_{y_n} and F_y are the MP laws with indices y_n = p/n and y, respectively.

Note that L_n = p int (x - log x - 1) dF^{S_n}(x) and the function f(x) = x - log x - 1 is analytic on a domain containing the support of F_y when y < 1. Then, according to the above lemma, L_n converges weakly to a normal distribution; for more details, please refer to Bai et al. (2010). There is another way to derive the same central limit result for L_n. Recall that the density of the beta-Laguerre ensemble has the form

    f_{beta, a}(lambda_1, ..., lambda_p) = c_L^{beta, a} prod_{1 <= i < j <= p} |lambda_i - lambda_j|^beta prod_{i=1}^p lambda_i^{a - q} e^{ -(1/2) sum_{i=1}^p lambda_i },   (4.32)

where

    c_L^{beta, a} = 2^{-pa} prod_{j=1}^p Gamma(1 + beta/2) / ( Gamma(1 + (beta/2) j) Gamma( a - (beta/2)(p - j) ) ),   (4.33)

and

    Gamma(alpha) := int_0^infinity e^{-x} x^{alpha - 1} dx   for alpha in C with Re(alpha) > 0.   (4.34)

The joint density of the eigenvalues of nS in our case corresponds to the beta-Laguerre ensemble in (4.32) with beta = 1, a = (1/2)(n - 1) and q = 1 + (1/2)(p - 1).

THEOREM 4 Let x_1, ..., x_n be i.i.d. random vectors with normal distribution N_p(mu, Sigma). Define L_n = sum_i { lambda_i/n - log(lambda_i) + log n - 1 }, where lambda_1, ..., lambda_p are the eigenvalues of nS. Assume Sigma = I. If n > p = p_n and lim_n p/n = y in (0, 1], then (L_n - mu_n)/sigma_n converges in distribution to N(0, 1) as n -> infinity, where

    mu_n = ( n - p - 3/2 ) log( 1 - p/n ) + p - y   and   sigma_n^2 = -2 [ p/n + log( 1 - p/n ) ].

2 n n n

LEMMA 4.2 Let n > p and let L_n be defined as above. Assume lambda_1, ..., lambda_p have the density function f_{beta, a}(lambda_1, ..., lambda_p) as in (4.32) with a = (beta/2)(n - 1) and q = 1 + (beta/2)(p - 1). Then

    E e^{t L_n} = e^{(log n - 1) p t} ( 1 - 2t/n )^{ p( t - (beta/2)(n - 1) ) } 2^{-pt} prod_{j=0}^{p-1} Gamma( a - t - (beta/2) j ) / Gamma( a - (beta/2) j )

for any t in ( -1/2, (beta/2)(n - p) ).

Proof. Recall

    L_n = (1/n) sum_{j=1}^p lambda_j - sum_{j=1}^p log lambda_j + p log n - p.

We then have

    E e^{t L_n} = e^{(log n - 1) p t} int_{[0,infinity)^p} e^{(t/n) sum_j lambda_j} prod_{j=1}^p lambda_j^{-t} f_{beta, a}(lambda_1, ..., lambda_p) d lambda_1 ... d lambda_p
                = e^{(log n - 1) p t} c_L^{beta, a} int_{[0,infinity)^p} e^{ -(1/2 - t/n) sum_j lambda_j } prod_{j=1}^p lambda_j^{(a - t) - q} prod_{1 <= k < l <= p} |lambda_k - lambda_l|^beta d lambda_1 ... d lambda_p.

Make the change of variables mu_j = (1 - 2t/n) lambda_j for 1 <= j <= p. It follows that the above is identical to

    e^{(log n - 1) p t} c_L^{beta, a} ( 1 - 2t/n )^{ -p(a - t - q) - (beta/2) p (p - 1) - p } int_{[0,infinity)^p} e^{ -(1/2) sum_j mu_j } prod_{j=1}^p mu_j^{(a - t) - q} prod_{1 <= k < l <= p} |mu_k - mu_l|^beta d mu_1 ... d mu_p.   (4.35)

For

    t < (beta/2)(n - p) = (beta/2)(n - 1) - (beta/2)(p - 1) = a - (beta/2)(p - 1),

that is, a - t > (beta/2)(p - 1), the integral in (4.35) is equal to 1/c_L^{beta, a-t} by (4.32) and (4.33). It then follows from (4.35) that

    E e^{t L_n} = e^{(log n - 1) p t} ( 1 - 2t/n )^{ -p(a - t - q) - (beta/2) p (p - 1) - p } c_L^{beta, a} / c_L^{beta, a - t}
                = e^{(log n - 1) p t} ( 1 - 2t/n )^{ -p(a - t - q) - (beta/2) p (p - 1) - p } 2^{-pt} prod_{j=1}^p Gamma( a - t - (beta/2)(p - j) ) / Gamma( a - (beta/2)(p - j) ).

Since q = 1 + (beta/2)(p - 1), the exponent simplifies: -p(a - t - q) - (beta/2) p(p - 1) - p = p( t - (beta/2)(n - 1) ). Re-indexing the product finally gives

    E e^{t L_n} = e^{(log n - 1) p t} ( 1 - 2t/n )^{ p( t - (beta/2)(n - 1) ) } 2^{-pt} prod_{j=0}^{p-1} Gamma( a - t - (beta/2) j ) / Gamma( a - (beta/2) j ).

The asymptotic expansion of the univariate Gamma function is given by Stirling's formula:

    log Gamma(z) = ( z - 1/2 ) log z - z + (1/2) log(2 pi) + 1/(12 z) + O( 1/Re(z)^3 ).   (4.36)

Since our applications of these results are mainly statistical, we limit the discussion to the real line for simplicity. We begin with the following expansion: for x > 0 and b = b(x),

    log ( Gamma(x + b) / Gamma(x) ) = b log x + ( b^2 - b ) / (2x) + c(x),   (4.37)

where Gamma(.) is the univariate Gamma function defined in (4.34) and c(x) = O(x^{-2}) provided b(x) = O(1) as x -> +infinity.

PROPOSITION 4.1 Let n > p = p_n and r_n = ( -log(1 - p/n) )^{1/2}. Assume that p/n -> y in (0, 1] and t = t_n = O(1/r_n) as n -> infinity. Then, as n -> infinity,

    sum_{i=n-p}^{n-1} log ( Gamma(i/2 - t) / Gamma(i/2) ) = p t (1 + log 2) - p t log n + r_n^2 t^2 + r_n^2 ( p - n + 3/2 ) t + o(1).

Sketch proof of Theorem 4. First, since log(1 - x) < -x for all 0 < x < 1, we know sigma_n^2 > 0 for all n > p >= 1. Now, by assumption, it is easy to see that

    lim_n sigma_n^2 = -2 [ y + log(1 - y) ]   if y in (0, 1),   and   lim_n sigma_n^2 = +infinity   if y = 1.   (4.38)

Trivially, the limit is always positive. Consequently, it suffices to prove that

    E exp( s ( L_n - mu_n ) / sigma_n ) -> e^{s^2/2} = E e^{s N(0,1)}   (4.39)

as n -> infinity for all s such that |s| < delta_0/2, for some delta_0 > 0.

Fix s such that |s| < delta_0/2 and set t = t_n = s/sigma_n. Then |t_n| < 1/2 for all n > p >= 1. In Lemma 4.2, take beta = 1 and a = (n - 1)/2, so that

    E e^{t L_n} = e^{(log n - 1) p t} ( 1 - 2t/n )^{ p t - np/2 + p/2 } 2^{-pt} prod_{j=0}^{p-1} Gamma( (n - j - 1)/2 - t ) / Gamma( (n - j - 1)/2 ).

Letting i = n - j - 1, we get

    E e^{t L_n} = 2^{-pt} e^{(log n - 1) p t} ( 1 - 2t/n )^{ p t - np/2 + p/2 } prod_{i=n-p}^{n-1} Gamma( i/2 - t ) / Gamma( i/2 ),   (4.40)

hence

    log E e^{t L_n} = p t ( log n - 1 - log 2 ) + p ( t - n/2 + 1/2 ) log( 1 - 2t/n ) + sum_{i=n-p}^{n-1} log ( Gamma(i/2 - t) / Gamma(i/2) ).

Now use the identity log(1 - x) = -x - x^2/2 + O(x^3) as x -> 0, together with the fact that t = t_n = s/sigma_n stays bounded, to obtain

    p ( t - n/2 + 1/2 ) log( 1 - 2t/n ) = -p ( t - n/2 + 1/2 ) ( 2t/n + 2t^2/n^2 + O(n^{-3}) )
                                        = -(p/n) t^2 + p t - (p/n) t + o(1)
                                        = -(p/n) t^2 + p t - y t + o(1)

as n -> infinity. Recall r_n = ( -log(1 - p/n) )^{1/2}; we know t = t_n = s/sigma_n = O(1/r_n) as n -> infinity. By Proposition 4.1,

    sum_{i=n-p}^{n-1} log ( Gamma(i/2 - t) / Gamma(i/2) ) = p t (1 + log 2) - p t log n + r_n^2 t^2 + r_n^2 ( p - n + 3/2 ) t + o(1)

as n -> infinity. Joining all the assertions from (4.40) with the above, we obtain

    log E e^{t L_n} = p t ( log n - 1 - log 2 ) - (p/n) t^2 + p t - y t + p t (1 + log 2) - p t log n + r_n^2 t^2 + r_n^2 ( p - n + 3/2 ) t + o(1)
                    = ( r_n^2 - p/n ) t^2 + [ p + r_n^2 ( p - n + 3/2 ) - y ] t + o(1)   (4.41)

as n -> infinity. Noticing

    p + r_n^2 ( p - n + 3/2 ) - y = ( n - p - 3/2 ) log( 1 - p/n ) + p - y = mu_n,

and, from the definition of sigma_n and the notation t = s/sigma_n, ( r_n^2 - p/n ) t^2 = s^2/2, it follows from (4.41) that

    log E exp( s ( L_n - mu_n ) / sigma_n ) = log E e^{t L_n} - mu_n t -> s^2/2

as n -> infinity. This implies (4.39). The proof is complete.

Now let us go back to the LRT statistic V_n without any correction. The statistic V_n is commonly known as the ellipticity statistic, and the distribution of the test statistic V_n can be studied through its moments. When the null hypothesis H_0 of (4.29) is true, the following result is referenced from page 341 of Muirhead (1982):

    E( V_n^h ) = p^{ph} * [ Gamma_p( (n-1)/2 + h ) / Gamma_p( (n-1)/2 ) ] * [ Gamma( (n-1)p/2 ) / Gamma( (n-1)p/2 + ph ) ]   (4.42)

for all h with (n-1)/2 + h > (p-1)/2, where

    Gamma_p(alpha) := pi^{p(p-1)/4} prod_{i=1}^p Gamma( alpha - (i - 1)/2 )   for Re(alpha) > (p - 1)/2.   (4.43)

When p is assumed a fixed integer, the following result gives an explicit expansion of the distribution function of -(n-1) rho log V_n, where rho = 1 - (2p^2 + p + 2)/(6p(n-1)): as M = (n-1) rho -> infinity,

    Pr( -(n-1) rho log V_n <= x ) = Pr( chi^2_f <= x ) + ( omega_2 / M^2 )[ Pr( chi^2_{f+4} <= x ) - Pr( chi^2_f <= x ) ] + O( M^{-3} ),   (4.44)

where f = (p + 2)(p - 1)/2 and omega_2 is given by

    omega_2 = c_p / ( 288 p^2 (n - 1)^2 rho^2 ),   (4.45)

with c_p a polynomial in p (see Muirhead (1982) for the exact expression).

Now we use (4.42) to derive a central limit theorem for the likelihood ratio test statistic log V_n as given in (4.31) directly.

THEOREM 5 Assume that n > 1 + p for all n >= 3 and p/n -> y in (0, 1] as n -> infinity. Let V_n be defined as in (4.31). Then, under H_0 : Sigma = sigma^2 I_p (sigma^2 unknown), ( log V_n - mu_n ) / sigma_n converges in distribution to N(0, 1) as n -> infinity, where

    mu_n = -p - ( n - p - 1.5 ) log( 1 - p/(n - 1) ),
    sigma_n^2 = -2 [ p/(n - 1) + log( 1 - p/(n - 1) ) ] > 0.

Sketch proof of Theorem 5. Recall that a sufficient condition for a sequence of random variables {Z_n; n >= 1} to converge to Z in distribution as n -> infinity is that

    E e^{t Z_n} -> E e^{t Z}

for all t in (-t_0, t_0), where t_0 > 0 is a constant. Thus, to prove the theorem, it suffices to show that there exists delta_0 > 0 such that

    E exp( s ( log V_n - mu_n ) / sigma_n ) -> e^{s^2/2}   (4.47)

as n -> infinity for all |s| < delta_0. The rest is the same as in the proof of Theorem 4.

4.2 Different Tests for Sphericity

A different test for sphericity, other than the likelihood ratio test, is based on

    U = (1/p) tr[ ( S/((1/p) tr S) - I_p )^2 ] = (1/p) tr(S^2) / ( (1/p) tr S )^2 - 1.   (4.48)

Under the null hypothesis of (4.29), the limiting distribution of the test statistic U, as the sample size n goes to infinity while the dimension p remains fixed, is given by

    (np/2) U ->_d chi^2_{p(p+1)/2 - 1}.   (4.49)

Ledoit and Wolf (2002) re-examined the limiting distribution of the test statistic U in the high-dimensional situation where p/n -> c in (0, infinity). They proved that under the null hypothesis of (4.29),

    n U - p ->_d N(1, 4).   (4.50)

On the other hand, as p -> infinity,

    (2/p) chi^2_{p(p+1)/2 - 1} - p ->_d N(1, 4),   (4.51)

so the fixed-p asymptotic result (4.49) for the test statistic U remains valid in practice in the high-dimensional case (i.e., when both p and n are large).
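The two expressions in (4.48) are algebraically identical, which is easy to confirm numerically (a sketch added here; the dimensions are arbitrary):

```python
import numpy as np

# Expanding the square: (1/p) tr[(S/t - I)^2] = (1/p) tr(S^2)/t^2 - 1
# with t = tr(S)/p, since the cross term -2 tr(S)/(p t) equals exactly -2.
rng = np.random.default_rng(8)
n, p = 50, 30
x = rng.standard_normal((n, p))
S = np.cov(x, rowvar=False)

t = np.trace(S) / p
M = S / t - np.eye(p)
U_def = np.trace(M @ M) / p
U_alt = (np.trace(S @ S) / p) / t ** 2 - 1
```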


Now we introduce another type of test statistic, which can deal with the case p >> n. Let X_n = (x_ij) be an n x p random matrix whose entries x_ij are i.i.d. real random variables with mean mu and variance sigma^2 > 0. Let x_1, x_2, ..., x_p be the p columns of X_n. The sample correlation matrix Gamma_n is defined by Gamma_n := (rho_ij) with

    rho_ij = (x_i - xbar_i)^T (x_j - xbar_j) / ( ||x_i - xbar_i|| ||x_j - xbar_j|| ),   1 <= i, j <= p,   (4.52)

where xbar_k = (1/n) sum_{i=1}^n x_ik and ||.|| is the usual Euclidean norm in R^n. Here we write x_i - xbar_i for x_i - xbar_i e, where e = (1, 1, ..., 1)^T in R^n. The largest magnitude of the off-diagonal entries of the sample correlation matrix is defined as

    L_n = max_{1 <= i < j <= p} | rho_ij |.   (4.53)
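The definition (4.52)-(4.53) matches numpy's built-in correlation matrix, which gives a direct check (a sketch added here; the dimensions are arbitrary):

```python
import numpy as np

# Compute L_n = max_{i<j} |rho_ij| directly from (4.52) and compare it with
# the off-diagonal maximum of np.corrcoef.
rng = np.random.default_rng(9)
n, p = 100, 40
X = rng.standard_normal((n, p))

Xc = X - X.mean(axis=0)                         # x_i - xbar_i e, per column
norms = np.linalg.norm(Xc, axis=0)
R = (Xc.T @ Xc) / np.outer(norms, norms)        # the rho_ij of (4.52)

iu = np.triu_indices(p, k=1)
Ln = np.abs(R[iu]).max()
Ln_builtin = np.abs(np.corrcoef(X, rowvar=False)[iu]).max()
```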

THEOREM 6 Suppose E e^{t_0 |x_11|^alpha} < infinity for some 0 < alpha <= 2 and t_0 > 0. Set gamma = alpha/(4 + alpha). Assume p = p(n) -> infinity and log p = o(n^gamma) as n -> infinity. Then n L_n^2 - 4 log p + log log p converges weakly to an extreme value distribution of type I with distribution function

    F(y) = exp( -(1/sqrt(8 pi)) e^{-y/2} ),   y in R.

The proof of Theorem 6 reduces to the non-normalized maximum W_n below: it suffices to show that

    ( n^2 L_n^2 - W_n^2 ) / n -> 0   in probability.   (4.54)

So we only need to show the following result for W_n.

PROPOSITION 4.2 Suppose the x_ij are i.i.d. with E x_11 = 0, E(x_11^2) = 1 and E e^{t_0 |x_11|^alpha} < infinity for some 0 < alpha <= 2 and t_0 > 0. Let W_n = max_{1 <= i < j <= p} | x_i^T x_j |. Set gamma = alpha/(4 + alpha). Assume p = p(n) -> infinity and log p = o(n^gamma) as n -> infinity. Then

    P( ( W_n^2 - alpha_n ) / n <= z ) -> exp( -K e^{-z/2} )

as n -> infinity for any z in R, where alpha_n = 4 n log p - n log(log p) and K = ( sqrt(8 pi) )^{-1}.

Before the proof, let us first collect some useful results. The following Poisson approximation result is essentially a special case of Theorem 1 from Arratia et al. (1989).

LEMMA 4.4 Let I be an index set and { B_alpha, alpha in I } a collection of subsets of I, that is, B_alpha ⊂ I for each alpha in I. Let { eta_alpha, alpha in I } be random variables. For a given t in R, set lambda = sum_{alpha in I} P( eta_alpha > t ). Then

    | P( max_{alpha in I} eta_alpha <= t ) - e^{-lambda} | <= ( 1 ∧ lambda^{-1} ) ( b_1 + b_2 + b_3 ),

where

    b_1 = sum_{alpha in I} sum_{beta in B_alpha} P( eta_alpha > t ) P( eta_beta > t ),
    b_2 = sum_{alpha in I} sum_{alpha != beta in B_alpha} P( eta_alpha > t, eta_beta > t ),
    b_3 = sum_{alpha in I} E | P( eta_alpha > t | sigma( eta_beta, beta not in B_alpha ) ) - P( eta_alpha > t ) |,

and sigma( eta_beta, beta not in B_alpha ) is the sigma-algebra generated by { eta_beta, beta not in B_alpha }. In particular, if eta_alpha is independent of { eta_beta, beta not in B_alpha } for each alpha, then b_3 = 0.

The following conclusion is Example 1 from Sakhanenko (1991); see also Lemma 6.2 from Liu et al. (2008).

LEMMA 4.5 Let xi_1, ..., xi_n be independent random variables with E xi_i = 0 for each i. Put

    s_n^2 = sum_{i=1}^n E xi_i^2,   rho_n = sum_{i=1}^n E |xi_i|^3,   S_n = sum_{i=1}^n xi_i.

Then there is an absolute constant A such that

    | P( S_n >= x s_n ) / ( 1 - Phi(x) ) - 1 | <= A (1 + x)^3 rho_n / s_n^3

for all 0 <= x <= s_n / ( A rho_n^{1/3} ), where Phi is the standard normal distribution function.

The following moderate deviation results are from Chen (1990); see also Chen (1991), Dembo and Zeitouni (1998) and Ledoux (1992). They are a special type of large deviations.

LEMMA 4.6 Suppose xi_1, xi_2, ... are i.i.d. random variables with E xi_1 = 0 and E xi_1^2 = 1. Put S_n = sum_{i=1}^n xi_i. Let 0 < alpha <= 1 and let {a_n; n >= 1} satisfy a_n -> +infinity and a_n = o( n^{alpha/(2(2 - alpha))} ). If E e^{t_0 |xi_1|^alpha} < infinity for some t_0 > 0, then

    lim_n (1/a_n^2) log P( S_n / sqrt(n) >= u a_n ) = -u^2/2   (4.55)

for any u > 0.

LEMMA 4.7 Let xi_1, ..., xi_n be i.i.d. random variables with E xi_1 = 0, E xi_1^2 = 1 and E e^{t_0 |xi_1|^alpha} < infinity for some t_0 > 0 and 0 < alpha <= 1. Put S_n = sum_{i=1}^n xi_i and gamma = alpha/(2 + alpha). Then, for any {p_n; n >= 1} with p_n -> infinity and log p_n = o(n^gamma), and any {y_n; n >= 1} with y_n -> y > 0,

    P( S_n / sqrt( n log p_n ) >= y_n ) ~ p_n^{ -y_n^2/2 } / ( sqrt(2 pi) y ( log p_n )^{1/2} )

as n -> infinity.

Proof of Proposition 4.2. It suffices to show that

    P( max_{1 <= i < j <= p} | y_ij | > sqrt( alpha_n + n z ) ) -> 1 - exp( -K e^{-z/2} ),   (4.56)

where y_ij = sum_{k=1}^n x_ki x_kj. We now apply Lemma 4.4 to prove (4.56). Take I = { (i, j) : 1 <= i < j <= p }. For u = (i, j) in I, set X_u = |y_ij| and B_u = { (k, l) in I : one of k and l equals i or j, but (k, l) != u }. Let A_ij = { |y_ij| > sqrt( alpha_n + n z ) }. Since the { y_ij ; (i, j) in I } are identically distributed, Lemma 4.4 gives

    | P( max_{(i,j) in I} |y_ij| <= sqrt( alpha_n + n z ) ) - e^{-lambda_n} | <= b_{1,n} + b_{2,n},   (4.57)

where

    lambda_n = ( p(p - 1)/2 ) P(A_12),   b_{1,n} <= 2 p^3 P(A_12)^2   and   b_{2,n} <= 2 p^3 P( A_12 ∩ A_13 ).   (4.58)

We first calculate \lambda_n. Write

\lambda_n = \frac{p^2 - p}{2}\, P\Big( \frac{|y_{12}|}{\sqrt{n}} > \sqrt{\frac{\mu_n}{n} + z} \Big) \qquad (4.59)

and y_{12} = \sum_{i=1}^n \xi_i, where \{\xi_i; 1 \le i \le n\} are i.i.d. random variables with the same distribution as that of x_{11} x_{12}. In particular, E\xi_1 = 0 and E\xi_1^2 = 1. Note \alpha_1 := \alpha/2 \le 1. We then have

|x_{11} x_{12}|^{\alpha_1} \le \Big( \frac{x_{11}^2 + x_{12}^2}{2} \Big)^{\alpha_1} \le \frac{1}{2^{\alpha_1}} \big( |x_{11}|^{\alpha} + |x_{12}|^{\alpha} \big).

Hence, by independence,

E e^{t_0 |\xi_1|^{\alpha_1}} = E e^{t_0 |x_{11} x_{12}|^{\alpha_1}} < \infty.

Let y_n = \sqrt{(\mu_n/n + z)/\log p}; then y_n \to 2 and, by Lemma 4.7,

P\Big( \frac{y_{12}}{\sqrt{n}} > \sqrt{\frac{\mu_n}{n} + z} \Big) = P\Big( \frac{\sum_{i=1}^n \xi_i}{\sqrt{n \log p}} > y_n \Big) \sim \frac{p^{-y_n^2/2}}{2\sqrt{2\pi}\, (\log p)^{1/2}} = \frac{e^{-z/2}}{\sqrt{8\pi}} \cdot \frac{1}{p^2}

as n \to \infty. Considering E x_{ij} = 0, it is easy to see that the above also holds if y_{12} is replaced by -y_{12}. These and (4.59) imply that

\lambda_n \sim \frac{p^2 - p}{2} \cdot 2 \cdot \frac{e^{-z/2}}{\sqrt{8\pi}} \cdot \frac{1}{p^2} \to \frac{e^{-z/2}}{\sqrt{8\pi}} \qquad (4.60)

as n \to \infty.
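The cancellation behind this computation is an exact identity: with y_n^2 = (4\log p - \log\log p + z)/\log p, one has p^{-y_n^2/2} = p^{-2} (\log p)^{1/2} e^{-z/2}, so the (\log p)^{1/2} factors cancel. A quick numerical confirmation (the values of p and z are arbitrary):

```python
import math

# Check the exact cancellation used in (4.60): with
#   y_n^2 = (4 log p - log(log p) + z) / log p,
# p^{-y_n^2/2} equals p^{-2} (log p)^{1/2} e^{-z/2} exactly.
max_err = 0.0
for p in (50.0, 1e4):
    for z in (-1.0, 0.0, 2.0):
        logp = math.log(p)
        yn2 = (4 * logp - math.log(logp) + z) / logp
        lhs = p ** (-yn2 / 2)
        rhs = p ** (-2.0) * math.sqrt(logp) * math.exp(-z / 2)
        max_err = max(max_err, abs(lhs - rhs) / rhs)
```

The relative error is at floating-point precision, since the two expressions are algebraically equal.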


Recall (4.57) and (4.58); to complete the proof, we have to verify that b_{1,n} \to 0 and b_{2,n} \to 0 as n \to \infty. By (4.58), (4.59) and (4.60),

b_{1,n} \le 2p^3 P(A_{12})^2 = \frac{8 p^3 \lambda_n^2}{(p^2 - p)^2} = O\Big( \frac{1}{p} \Big)

as n \to \infty. Also, by (4.58),

b_{2,n} \le 2p^3 P\Big( \frac{|\sum_{k=1}^n x_{k1} x_{k2}|}{\sqrt{n}} > \sqrt{\frac{\mu_n}{n} + z},\ \frac{|\sum_{k=1}^n x_{k1} x_{k3}|}{\sqrt{n}} > \sqrt{\frac{\mu_n}{n} + z} \Big)

\le 2p^3 \Big\{ P\Big( \frac{|\sum_{k=1}^n x_{k1}(x_{k2} + x_{k3})|}{\sqrt{n}} > 2\sqrt{\frac{\mu_n}{n} + z} \Big) + P\Big( \frac{|\sum_{k=1}^n x_{k1}(x_{k2} - x_{k3})|}{\sqrt{n}} > 2\sqrt{\frac{\mu_n}{n} + z} \Big) \Big\},

since on the joint event either x_{k2}-sums and x_{k3}-sums share a sign, making the first term large, or they do not, making the second term large. Since E x_{k1}(x_{k2} + x_{k3}) = 0 and E\{x_{k1}(x_{k2} + x_{k3})\}^2 = 2, applying (4.55) we get, for any fixed \varepsilon \in (0, 1/5) and all sufficiently large n,

P\Big( \frac{|\sum_{k=1}^n x_{k1}(x_{k2} + x_{k3})|}{\sqrt{n}} > 2\sqrt{\frac{\mu_n}{n} + z} \Big) \le 2 \exp\Big( -(1-\varepsilon)\Big( \frac{\mu_n}{n} + z \Big) \Big) \le 2 p^{-(4 - 5\varepsilon)}.

The similar bound holds for P\big( |\sum_{k=1}^n x_{k1}(x_{k2} - x_{k3})| / \sqrt{n} > 2\sqrt{\mu_n/n + z} \big). Therefore b_{2,n} \le 8 p^{-(1 - 5\varepsilon)} = o(1).
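The elementary inclusion used above for b_{2,n} — if |a| > t and |b| > t then |a+b| > 2t or |a-b| > 2t, according to whether a and b share a sign — can be checked exhaustively on a grid (purely illustrative):

```python
import itertools

# Check the inclusion {|a| > t, |b| > t} in {|a+b| > 2t} union {|a-b| > 2t}:
# if a and b have the same sign then |a+b| = |a| + |b| > 2t,
# otherwise |a-b| = |a| + |b| > 2t.
t = 1.0
grid = [x / 10.0 for x in range(-50, 51)]
violations = [
    (a, b)
    for a, b in itertools.product(grid, repeat=2)
    if abs(a) > t and abs(b) > t and abs(a + b) <= 2 * t and abs(a - b) <= 2 * t
]
```

No pair on the grid violates the inclusion, matching the sign argument.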

Acknowledgements

This set of lecture notes is mainly based on the book by Bai and Silverstein (2009). It also includes material from Professor Tiefeng Jiang's lecture notes and the book by Anderson, Guionnet and Zeitouni (2010).

References

[1] Anderson, G. W., Guionnet, A. and Zeitouni, O. (2010). An Introduction to Random Matrices. Cambridge University Press.

[2] Bai, Z. and Silverstein, J. W. (2009). Spectral Analysis of Large Dimensional Random Matrices. Springer.

[3] Bai, Z. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9(3), 611-677.

[4] Bai, Z., Jiang, D., Yao, J. and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Statist. 37(6B), 3822-3840.

[5] Cai, T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist. 39(3), 1496-1525.

[6] Jiang, D., Jiang, T. and Yang, F. (2012). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. J. Statist. Plann. Inference 142(8), 2241-2256.

[7] Jiang, T. and Yang, F. (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Ann. Statist. 41(4), 2029-2074.
