
Transelliptical Component Analysis

Fang Han, Johns Hopkins University

Han Liu, Princeton University

Neural Information Processing Systems (NIPS)


Lake Tahoe, Nevada, 2012

1
General Framework

• Using semiparametric model

• Obtaining nonparametric modeling flexibility

• Achieving nearly optimal parametric rates

• Simple and computationally efficient

2
Outline

• Sparse Principal Component Analysis

• Transelliptical Component Analysis

• Concluding Remarks

3
Sparse Principal Component Analysis

4
Leading Eigenvectors Estimation

Let X1, ..., Xn be n i.i.d. d-variate random vectors with covariance matrix Σ.

Goal: Estimate the top m leading eigenvectors θ1, ..., θm of Σ based on the samples {Xi}.

5
Applications

• Large-scale genomic data

• Brain imaging data

• Equity data

• ...

6
Sparse Leading Eigenvectors

Let {Xi : i = 1, ..., n} be n random samples from a d-variate Gaussian or sub-Gaussian distribution with covariance matrix Σ.

Goal: Estimate θ1 based on {Xi}, where θ1 is sparse.

7
Assumption and Estimation

Assumption 1: Assume that ||θ1 ||0 = s < n.

Estimation:

θ̂1 = arg max_{v ∈ R^d} vᵀ Σn v, subject to v ∈ S^{d−1} ∩ B0(s),

where Σn is the sample covariance matrix, S^{d−1} is the unit sphere in R^d, and B0(s) := {v ∈ R^d : ||v||0 ≤ s}.
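The constrained maximization above is NP-hard in general; a common heuristic is power iteration with hard truncation (in the spirit of truncated power methods for sparse PCA). A minimal numpy sketch, using a made-up toy covariance whose leading eigenvector is sparse, not the slides' own algorithm:

```python
import numpy as np

def truncated_power_method(S, s, iters=200):
    """Approximate argmax of v' S v over unit vectors with ||v||_0 <= s
    by power iteration followed by hard truncation to s coordinates."""
    d = S.shape[0]
    v = np.ones(d) / np.sqrt(d)            # deterministic start; random restarts are common
    for _ in range(iters):
        w = S @ v
        keep = np.argsort(np.abs(w))[-s:]  # indices of the s largest |entries|
        v = np.zeros(d)
        v[keep] = w[keep]                  # zero out everything else
        v /= np.linalg.norm(v)             # project back onto the unit sphere
    return v

# toy covariance: Sigma = 4 * theta theta' + I has sparse leading eigenvector theta
d, s = 10, 3
theta = np.zeros(d)
theta[:s] = 1 / np.sqrt(s)
Sigma = 4.0 * np.outer(theta, theta) + np.eye(d)
v_hat = truncated_power_method(Sigma, s)   # recovers theta up to sign
```

Hard truncation keeps each iterate feasible (at most s nonzeros), while the power step pulls it toward the leading eigenvector.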

8
Rates of Convergence

Assumption 2: Assume that X is Gaussian or sub-Gaussian distributed.

Theorem. Let ||θ1||0 = s. Then the parametric rate of convergence under the ℓ2 norm is

OP( (1/(λ1 − λ2)) · √(s log d / n) ),

where λ1 and λ2 are the two largest eigenvalues of Σ.

9
Constraints
The Gaussian or sub-Gaussian assumption is too restrictive; it rarely holds in applications.

10
Equity Data

452 stocks that were consistently in the S&P 500 index between January 1, 2003 and January 1, 2008.

[Scatter plots of daily returns: stock 2 against stock 1, and stock 4 against stock 3.]
11
Extensions to non-Gaussian (Abnormal)

12
Liu, Lafferty and Wasserman (2009, JMLR)

Definition. A random vector X = (X1, ..., Xd)ᵀ is said to follow a nonparanormal distribution if there exists a set of unspecified univariate increasing functions {fj}, j = 1, ..., d, such that

(f1(X1), f2(X2), ..., fd(Xd))ᵀ ∼ Nd(0, Σ), where diag(Σ) = 1.
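As an illustration (not from the slides), nonparanormal data can be generated by drawing a Gaussian vector and pushing each margin through the inverse of a monotone fj; since the transforms are increasing, rank statistics are untouched. A small numpy sketch with the hypothetical choices f1 = log and f2 = cube root:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=5000)

# nonparanormal data: each margin is an increasing transform of a Gaussian,
# so f1 = log and f2 = cube root map X back to the latent Gaussian Z
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])

def ranks(a):
    """0-based ranks of the entries of a."""
    return np.argsort(np.argsort(a))

# rank (Spearman-type) correlation is identical for X and Z, because
# increasing transforms preserve the ordering within every margin
rho_Z = np.corrcoef(ranks(Z[:, 0]), ranks(Z[:, 1]))[0, 1]
rho_X = np.corrcoef(ranks(X[:, 0]), ranks(X[:, 1]))[0, 1]
```

This invariance is exactly why rank-based statistics are the natural estimators in this model.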

13
[Figure: three bivariate density surfaces (axes x, y, z) with their corresponding contour plots.]
14
Liu et al., Annals of Statistics (2012)

For each pair (Xj, Xk), Spearman's rho coefficient, denoted by ρ̂jk, is the correlation of the ranks. Let

R̂ρ := [2 sin((π/6) ρ̂jk)].

Theorem. Restricted to the nonparanormal family, ||R̂ρ − Σ||max = OP(√(log d / n)).

Parametric (optimal) rates in graph recovery and a near-parametric rate in principal component analysis (Han and Liu, NIPS 2012).
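A quick numerical sanity check of the 2 sin((π/6) ρ̂) transform (illustrative, with made-up parameters): for a bivariate Gaussian with correlation 0.5, the transformed Spearman statistic should concentrate around 0.5:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=20000)

rho_hat = spearman_rho(Z[:, 0], Z[:, 1])
sigma_hat = 2 * np.sin(np.pi * rho_hat / 6)   # estimate of Sigma_12 = 0.5
```

The sine correction undoes the classical Gaussian identity ρ_Spearman = (6/π) arcsin(ρ/2).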

15
Good Enough?
Theorem. The tail dependence of the nonparanormal (Gaussian copula) family is zero.

16
17
Elliptical Distribution
When the density exists, an elliptical distribution has density

f(x) = k · g((x − µ)ᵀ Σ⁻¹ (x − µ)),

where g is an unspecified univariate positive function. In this case, we represent it by ECd(µ, Σ, g).

Note: Elliptical distributions can be very heavy tailed and even have
infinite first or second moments.
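For intuition (an illustration, not from the slides), ellipticals admit the stochastic representation X = µ + ξAU, with U uniform on the unit sphere, AAᵀ = Σ, and ξ ≥ 0 a radial variable; a heavy-tailed ξ produces a heavy-tailed X. Taking ξ²/d ∼ F(d, ν) gives a multivariate t with ν degrees of freedom, which for ν = 2 has infinite variance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, nu = 100_000, 2, 2.0            # nu = 2: second moments are infinite
mu = np.zeros(d)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
A = np.linalg.cholesky(Sigma)

# U: uniform draws on the unit sphere (normalized Gaussian vectors)
U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)

# radial variable: xi^2 / d ~ F(d, nu) yields a multivariate t with nu dof
xi = np.sqrt(d * rng.f(d, nu, size=n))
X = mu + (xi[:, None] * U) @ A.T
```

Moment-based statistics break down on such samples, while the rank-based procedures below remain well defined.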

18
Transelliptical Distribution

Definition. A random vector X = (X1, ..., Xd)ᵀ is said to follow a transelliptical distribution if there exists a set of unspecified univariate increasing functions {fj}, j = 1, ..., d, such that

(f1(X1), f2(X2), ..., fd(Xd))ᵀ ∼ ECd(0, Σ, g), where diag(Σ) = 1.

In this case, we represent X ∼ TEd(Σ, g; f1, ..., fd).

19
Kendall’s tau and Its Invariance Property

Population Kendall's tau:

τ(Xj, Xk) = P((Xj − X̃j)(Xk − X̃k) > 0) − P((Xj − X̃j)(Xk − X̃k) < 0),

where (X̃j, X̃k) is an independent copy of (Xj, Xk).

Theorem. Given X ∼ TEd(Σ, g; f1, ..., fd),

Σjk = sin((π/2) τ(Xj, Xk)).

This identity is invariant to both g and the fj.

20
Transelliptical Component Analysis
Let x1, ..., xn be n independent realizations of X. We use the Kendall's tau correlation coefficient estimate

τ̂jk = (2 / (n(n − 1))) · Σ_{1 ≤ i < i′ ≤ n} sign((xij − xi′j)(xik − xi′k)).

Let

R̂τ := [sin((π/2) τ̂jk)].

Plug R̂τ into any sparse principal component algorithm.
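Putting the pieces together, a sketch with made-up parameters (and plain eigendecomposition standing in for a sparse PCA solver): simulate heavy-tailed transelliptical data, form the pairwise Kendall estimates, apply the sine transform, and extract the leading eigenvector of R̂τ:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, nu = 1000, 3, 3.0
Sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# transelliptical sample: a multivariate t (elliptical, heavy-tailed)
# pushed through increasing margin transforms
G = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
T = G / np.sqrt(rng.chisquare(nu, size=n) / nu)[:, None]
X = np.column_stack([np.exp(T[:, 0]), T[:, 1] ** 3, np.arctan(T[:, 2])])

def kendall_tau(x, y):
    """tau_hat = 2/(m(m-1)) * sum_{i<i'} sign((x_i - x_i')(y_i - y_i'))."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    m = len(x)
    return (dx * dy).sum() / (m * (m - 1))  # off-diagonal sum counts each pair twice

R = np.eye(d)
for j in range(d):
    for k in range(j + 1, d):
        R[j, k] = R[k, j] = np.sin(np.pi / 2 * kendall_tau(X[:, j], X[:, k]))

# leading eigenvector of R_tau; a sparse PCA routine could consume R the same way
theta1_hat = np.linalg.eigh(R)[1][:, -1]
```

Despite the heavy tails and distorted margins, the sine-transformed Kendall matrix recovers Σ, which is exactly the plug-in principle of the slide.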

21
Theoretical Results
In estimating θ1 and its support, we have

• a near-parametric rate of convergence under the ℓ2 norm:

OP( (s/(λ1 − λ2)) · √(log d / n) )

• a support recovery threshold at an order of

OP( (s/(λ1 − λ2)) · √(log d / n) )

22
Equity Data Again
Prediction of the Market Trend:

[Plot: percentage of successful matches (82–86%) for the Pearson, NS, and Kendall procedures, over the range 15–45.]
23
Beyond PCA
• Directed Graphs Estimation

• Undirected Graphs Estimation

• Discriminant Analysis

• Independent Component Analysis

• Canonical Correlation Analysis

• ...

24
Remarks
• Model Flexibility

• Parametric or near-parametric rate

• Simple procedure and computational efficiency

25
Thanks!
For more information, please come to the poster session (Tu66).

26
