
Transelliptical Component Analysis

Fang Han, Johns Hopkins University

Han Liu, Princeton University

Neural Information Processing Systems (NIPS)


Lake Tahoe, Nevada, 2012

1
General Framework

• Using semiparametric model

• Obtaining nonparametric modeling flexibility

• Achieving nearly optimal parametric rates

• Simple and computationally efficient

2
Outline

• Sparse Principal Component Analysis

• Transelliptical Component Analysis

• Concluding Remarks

3
Sparse Principal Component Analysis

4
Leading Eigenvectors Estimation

Let X1, ..., Xn be n i.i.d. d-variate random vectors with covariance matrix Σ.

Goal: Estimate the top m leading eigenvectors θ1, ..., θm of Σ based on the samples {Xi}.

5
Applications

• Large-scale genomic data

• Brain imaging data

• Equity data

• ...

6
Sparse Leading Eigenvectors

Let {Xi : i = 1, ..., n} be n random samples from a d-variate Gaussian or sub-Gaussian distribution with covariance matrix Σ.

Goal: Estimate θ1 based on {Xi}, where θ1 is sparse.

7
Assumption and Estimation

Assumption 1: Assume that ||θ1 ||0 = s < n.

Estimation:

θ̂1 = arg max_{v ∈ R^d} vᵀ Σn v, subject to v ∈ S^{d−1} ∩ B0(s),

where Σn is the sample covariance matrix, S^{d−1} is the unit sphere in R^d, and B0(s) := {v ∈ R^d : ||v||0 ≤ s}.
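The constrained maximization above is NP-hard in general; a common heuristic is power iteration with hard truncation (in the spirit of truncated power methods for sparse PCA). A minimal numpy sketch, using a made-up toy covariance whose leading eigenvector is sparse, not the slides' own algorithm:

```python
import numpy as np

def truncated_power_method(S, s, iters=200):
    """Approximate argmax of v' S v over unit vectors with ||v||_0 <= s
    by power iteration followed by hard truncation to s coordinates."""
    d = S.shape[0]
    v = np.ones(d) / np.sqrt(d)            # deterministic start; random restarts are common
    for _ in range(iters):
        w = S @ v
        keep = np.argsort(np.abs(w))[-s:]  # indices of the s largest |entries|
        v = np.zeros(d)
        v[keep] = w[keep]                  # zero out everything else
        v /= np.linalg.norm(v)             # project back onto the unit sphere
    return v

# toy covariance: Sigma = 4 * theta theta' + I has sparse leading eigenvector theta
d, s = 10, 3
theta = np.zeros(d)
theta[:s] = 1 / np.sqrt(s)
Sigma = 4.0 * np.outer(theta, theta) + np.eye(d)
v_hat = truncated_power_method(Sigma, s)   # recovers theta up to sign
```

Hard truncation keeps each iterate feasible (at most s nonzeros), while the power step pulls it toward the leading eigenvector.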

8
Rates of Convergence

Assumption 2: Assume that X is Gaussian or sub-Gaussian distributed.

Theorem. Let ||θ1||0 = s. Then the parametric rate of convergence under the ℓ2 norm is

OP( (1/(λ1 − λ2)) · √(s log d / n) ),

where λ1 and λ2 are the two largest eigenvalues of Σ.

9
Constraints
The Gaussian or sub-Gaussian assumption is too restrictive; it rarely holds in applications.

10
Equity Data

452 stocks that were consistently in the S&P 500 index between January 1, 2003 and January 1, 2008.

[Scatter plots of daily returns: stock 2 against stock 1, and stock 4 against stock 3.]
11
Extensions to non-Gaussian (Abnormal)

12
Liu, Lafferty and Wasserman (2009, JMLR)

Definition. A random vector X = (X1, ..., Xd)ᵀ is said to follow a nonparanormal distribution if there exists a set of unspecified univariate increasing functions {fj}, j = 1, ..., d, such that

(f1(X1), f2(X2), ..., fd(Xd))ᵀ ∼ Nd(0, Σ), where diag(Σ) = 1.
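As an illustration (not from the slides), nonparanormal data can be generated by drawing a Gaussian vector and pushing each margin through the inverse of a monotone fj; since the transforms are increasing, rank statistics are untouched. A small numpy sketch with the hypothetical choices f1 = log and f2 = cube root:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=5000)

# nonparanormal data: each margin is an increasing transform of a Gaussian,
# so f1 = log and f2 = cube root map X back to the latent Gaussian Z
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])

def ranks(a):
    """0-based ranks of the entries of a."""
    return np.argsort(np.argsort(a))

# rank (Spearman-type) correlation is identical for X and Z, because
# increasing transforms preserve the ordering within every margin
rho_Z = np.corrcoef(ranks(Z[:, 0]), ranks(Z[:, 1]))[0, 1]
rho_X = np.corrcoef(ranks(X[:, 0]), ranks(X[:, 1]))[0, 1]
```

This invariance is exactly why rank-based statistics are the natural estimators in this model.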

13
[Figure: three bivariate density surfaces (axes x, y, z) with their corresponding contour plots.]
14
Liu et al., Annals of Statistics (2012)

For each pair (Xj, Xk), Spearman's rho coefficient, denoted by ρ̂jk, is the correlation of the ranks. Let

R̂ρ := [2 sin((π/6) ρ̂jk)].

Theorem. Restricted to the nonparanormal family, ||R̂ρ − Σ||max = OP(√(log d / n)).

Parametric (optimal) rates in graph recovery and a near-parametric rate in principal component analysis (Han and Liu, NIPS 2012).
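A quick numerical sanity check of the 2 sin((π/6) ρ̂) transform (illustrative, with made-up parameters): for a bivariate Gaussian with correlation 0.5, the transformed Spearman statistic should concentrate around 0.5:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=20000)

rho_hat = spearman_rho(Z[:, 0], Z[:, 1])
sigma_hat = 2 * np.sin(np.pi * rho_hat / 6)   # estimate of Sigma_12 = 0.5
```

The sine correction undoes the classical Gaussian identity ρ_Spearman = (6/π) arcsin(ρ/2).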

15
Good Enough?
Theorem. The tail dependence of the nonparanormal (Gaussian copula) family is zero.

16
17
Elliptical Distribution
When the density exists, an elliptical distribution has density

f(x) = k · g((x − µ)ᵀ Σ⁻¹ (x − µ)),

where g is an unspecified univariate positive function. In this case, we represent it by ECd(µ, Σ, g).

Note: Elliptical distributions can be very heavy tailed and even have
infinite first or second moments.
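For intuition (an illustration, not from the slides), ellipticals admit the stochastic representation X = µ + ξAU, with U uniform on the unit sphere, AAᵀ = Σ, and ξ ≥ 0 a radial variable; a heavy-tailed ξ produces a heavy-tailed X. Taking ξ²/d ∼ F(d, ν) gives a multivariate t with ν degrees of freedom, which for ν = 2 has infinite variance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, nu = 100_000, 2, 2.0            # nu = 2: second moments are infinite
mu = np.zeros(d)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
A = np.linalg.cholesky(Sigma)

# U: uniform draws on the unit sphere (normalized Gaussian vectors)
U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)

# radial variable: xi^2 / d ~ F(d, nu) yields a multivariate t with nu dof
xi = np.sqrt(d * rng.f(d, nu, size=n))
X = mu + (xi[:, None] * U) @ A.T
```

Moment-based statistics break down on such samples, while the rank-based procedures below remain well defined.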

18
Transelliptical Distribution

Definition. A random vector X = (X1, ..., Xd)ᵀ is said to follow a transelliptical distribution if there exists a set of unspecified univariate increasing functions {fj}, j = 1, ..., d, such that

(f1(X1), f2(X2), ..., fd(Xd))ᵀ ∼ ECd(0, Σ, g), where diag(Σ) = 1.

In this case, we represent X ∼ TEd(Σ, g; f1, ..., fd).

19
Kendall’s tau and Its Invariance Property

Population Kendall's tau:

τ(Xj, Xk) = P((Xj − X̃j)(Xk − X̃k) > 0) − P((Xj − X̃j)(Xk − X̃k) < 0),

where (X̃j, X̃k) is an independent copy of (Xj, Xk).

Theorem. Given X ∼ TEd(Σ, g; f1, ..., fd),

Σjk = sin((π/2) τ(Xj, Xk)).

This identity is invariant to both g and the fj.

20
Transelliptical Component Analysis
Let x1, ..., xn be n independent realizations of X. We use the Kendall's tau correlation coefficient estimate

τ̂jk = (2 / (n(n − 1))) · Σ_{1 ≤ i < i′ ≤ n} sign((xij − xi′j)(xik − xi′k)).

Let

R̂τ := [sin((π/2) τ̂jk)].

Plug R̂τ into any sparse principal component algorithm.
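Putting the pieces together, a sketch with made-up parameters (and plain eigendecomposition standing in for a sparse PCA solver): simulate heavy-tailed transelliptical data, form the pairwise Kendall estimates, apply the sine transform, and extract the leading eigenvector of R̂τ:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, nu = 1000, 3, 3.0
Sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# transelliptical sample: a multivariate t (elliptical, heavy-tailed)
# pushed through increasing margin transforms
G = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
T = G / np.sqrt(rng.chisquare(nu, size=n) / nu)[:, None]
X = np.column_stack([np.exp(T[:, 0]), T[:, 1] ** 3, np.arctan(T[:, 2])])

def kendall_tau(x, y):
    """tau_hat = 2/(m(m-1)) * sum_{i<i'} sign((x_i - x_i')(y_i - y_i'))."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    m = len(x)
    return (dx * dy).sum() / (m * (m - 1))  # off-diagonal sum counts each pair twice

R = np.eye(d)
for j in range(d):
    for k in range(j + 1, d):
        R[j, k] = R[k, j] = np.sin(np.pi / 2 * kendall_tau(X[:, j], X[:, k]))

# leading eigenvector of R_tau; a sparse PCA routine could consume R the same way
theta1_hat = np.linalg.eigh(R)[1][:, -1]
```

Despite the heavy tails and distorted margins, the sine-transformed Kendall matrix recovers Σ, which is exactly the plug-in principle of the slide.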

21
Theoretical Results
In estimating θ1 and its support, we have

• a near-parametric rate of convergence under the ℓ2 norm:

OP( (s/(λ1 − λ2)) · √(log d / n) )

• a support recovery threshold at an order of

OP( (s/(λ1 − λ2)) · √(log d / n) )

22
Equity Data Again
Prediction of the Market Trend:

[Plot: percentage of successful matches (82–86%) for the Pearson, NS, and Kendall procedures, over the range 15–45.]
23
Beyond PCA
• Directed Graphs Estimation

• Undirected Graphs Estimation

• Discriminant Analysis

• Independent Component Analysis

• Canonical Correlation Analysis

• ...

24
Remarks
• Model Flexibility

• Parametric or near-parametric rate

• Simple procedure and computational efficiency

25
Thanks!
For more information, please come to the poster session (Tu66).

26
