
Dissertation Thesis Proposal

Sample Covariance Matrix Estimation from Multiple Random Projections with Highly Compressed Data

Jonathan Monsalve.
Thursday 20th February, 2020
Universidad Industrial de Santander
Advisor: Ph.D. Henry Arguello
Agenda

Spectral imaging

F ∈ R^{N×N×L} with N × N spatial dimensions and L spectral bands.

# Large data sizes and high costs.
# Time-consuming acquisition.

Compressive Sensing (CS)

Diagram: traditional acquisition and processing set-up.

Diagram: compressive acquisition and processing set-up.


Compressive covariance sampling (CCS)

These techniques aim at recovering the second-moment statistics of high-dimensional signals from compressive projections.

Note: the k-th moment of a population is given by

µ_k = E(X^k).    (1)

For a random vector, the sample covariance matrix is given by

Σ = (1/n) ∑_{i=1}^{n} x_i x_i^T.    (2)

The purpose of CCS is to recover the second-moment statistics from a set of random projections given by

y = P^T x,    (3)

where P ∈ R^{l×m}.
Principal component analysis

The covariance matrix of a signal is used in a variety of techniques, such as PCA.

Diagram: PCA projection.

The compression is performed by using the eigenvectors W of the covariance matrix,

Σ = W Λ W^T.    (4)
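As an illustration, a minimal NumPy sketch of this eigendecomposition-based compression; the data matrix X (one sample per column) and the number of retained components k are illustrative assumptions, not values from the proposal.

```python
import numpy as np

# Minimal sketch of PCA compression via the eigendecomposition in Eq. (4).
# X is l x n (one sample per column); k is the number of retained components.
def pca_project(X, k):
    X_c = X - X.mean(axis=1, keepdims=True)      # remove the sample mean
    Sigma = X_c @ X_c.T / X.shape[1]             # sample covariance, Eq. (2)
    eigvals, W = np.linalg.eigh(Sigma)           # Sigma = W Lambda W^T, Eq. (4)
    W_k = W[:, ::-1][:, :k]                      # k leading eigenvectors
    return W_k.T @ X_c                           # k x n PCA coefficients
```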
Random projections

In a CS set-up the signal X ∈ R^{l×n} is unknown and a random projection is captured,

Y = P^T X + N.    (5)

By assuming that the signal and noise are uncorrelated, the compressive covariance matrix is given by

Σ̃ = (P^T X X^T P + N N^T)/n = P^T Σ P + σ_r² I.    (6)
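As a numerical illustration, a minimal sketch of the model in Eqs. (5)-(6); the dimensions, the Gaussian signal model, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n, sigma_r = 64, 16, 10_000, 0.05      # illustrative sizes and noise level

X = rng.standard_normal((l, n))              # unknown signal, one column per sample
P = rng.standard_normal((l, m))              # random sensing matrix
N = sigma_r * rng.standard_normal((m, n))    # additive sensor noise

Y = P.T @ X + N                              # compressive measurements, Eq. (5)
Sigma = X @ X.T / n                          # sample covariance of the signal
Sigma_tilde = Y @ Y.T / n                    # compressed covariance

# Under the uncorrelated signal/noise assumption, Sigma_tilde should be close
# to P^T Sigma P + sigma_r^2 I, as in Eq. (6).
residual = Sigma_tilde - (P.T @ Sigma @ P + sigma_r**2 * np.eye(m))
print(np.linalg.norm(residual, 'fro'))
```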

Covariance matrix estimation

# Least squares recovery:

Σ* = arg min_Σ ||Σ̃ − P^T Σ P||_F².    (7)

# Regularized problem:

Σ* = arg min_Σ ||Σ̃ − P^T Σ P||_F² + λψ(Σ)  subject to  Σ ⪰ 0.    (8)
Problem Statement

Hypothesis: The first and second sample statistical moments of a dataset can be accurately estimated directly from low-dimensional random projections, and they can be used to improve the reconstruction of the high-dimensional data using compressive spectral imaging.

General objective: To design and optimize an algorithm to retrieve sample statistics from compressive measurements and to analyze the use of the sample statistics to reconstruct the underlying signal using compressive sensing theory.

Specific objectives
1. To determine the most suitable sensing/projection protocols
based on compressive sensing and random projections from the
state-of-the-art applicable to hyperspectral imaging to be used
in the statistics recovery.
2. To design an algorithm based on the gradient descent
method to recover the first and second sample statistical
moments from low-dimensional random projections.
3. To test the performance of the proposed algorithm to recover
the sample statistics in hyperspectral imaging.
4. To adapt a state-of-the-art algorithm to estimate the
vegetation cover using sample statistics and random
low-dimensional projections of hyperspectral images based on
the proposed approach.
5. To verify the performance of the adapted algorithm.
Proposal

Let us split the dataset X into p subsets as X_i = {x_j | j ∈ S_i ⊂ Q, S_i ∩ S_k = ∅, ∀ i ≠ k}. The covariance matrix of each subset is C_i = X_i X_i^T / |S_i|. Let us project each matrix X_i as

Y_i = P_i^T X_i + N_i.    (9)

Using this splitting procedure, the recovery problem can be reformulated as

C_i* = arg min_{C_i} ||Σ̃_i − P_i^T C_i P_i||_F² + τψ(C_i)  subject to  C_i ⪰ 0.    (10)

However, note that if |S_i| is large enough, we can assume that

C_1 ≈ C_2 ≈ ··· ≈ C_p ≈ Σ.    (11)
Proposed optimization problem

Thus, instead of recovering each covariance matrix C_i, it is desirable to recover the sample covariance

Σ* = arg min_Σ ∑_{i=1}^{p} ||Σ̃_i − P_i^T Σ P_i||_F² + τψ(Σ)  subject to  Σ ⪰ 0.    (12)
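A minimal sketch of the splitting and per-subset compression of Eqs. (9)-(10); the random subset assignment, the dimensions, and the omission of the noise term N_i are illustrative simplifications.

```python
import numpy as np

# Minimal sketch of the splitting and per-subset compression of Eqs. (9)-(10),
# without the noise term N_i. Returns the sensing matrices P_i and the
# compressed covariances Sigma_tilde_i used in problem (12).
def compress_subsets(X, p, m, rng):
    l, n = X.shape
    index_sets = np.array_split(rng.permutation(n), p)   # disjoint sets S_i
    P_list, S_list = [], []
    for S_i in index_sets:
        X_i = X[:, S_i]                                  # subset X_i
        P_i = rng.standard_normal((l, m))                # sensing matrix P_i
        Y_i = P_i.T @ X_i                                # projection, Eq. (9)
        P_list.append(P_i)
        S_list.append(Y_i @ Y_i.T / len(S_i))            # Sigma_tilde_i
    return P_list, S_list
```

The returned P_list and S_list can be fed directly to the projected gradient solver sketched later.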

Projected gradient descent method

The projected gradient descent method is used for problems of the form

Σ* = arg min_Σ f(Σ)  subject to  Σ ∈ D.    (13)

Thus, the proposed algorithm (Algorithm 1) iterates the update Σ_{k+1} = P_D(Σ_k − λ_k ∇f(Σ_k)), where P_D is the projection onto the set of positive semi-definite matrices.
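A minimal sketch of one common way to implement the projection P_D, via eigenvalue clipping; the specific implementation is an assumption, not taken from the proposal.

```python
import numpy as np

# Minimal sketch of the projection P_D onto the positive semi-definite cone:
# symmetrize, eigendecompose, and clip negative eigenvalues to zero.
def project_psd(S):
    S = (S + S.T) / 2                        # enforce symmetry
    eigvals, V = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
    return (V * eigvals) @ V.T
```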
Projected gradient descent method

The gradient of the objective function is given by

∇f(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T Σ P_i) P_i^T + τ∇ψ(Σ),    (14)

and when ψ(Σ) = Tr(Σ), its gradient is ∇Tr(Σ) = I.

The step-size parameter λ_k is picked as λ_k = λ_{k−1}/η^r, where the constants satisfy η > 1 and λ_{−1} > 0, and r is the smallest non-negative integer that satisfies

f(Σ_k^r) ≤ f(Σ_k) + trace(∇f(Σ_k)^T (Σ_k^r − Σ_k)) + ||Σ_k^r − Σ_k||_F²,    (15)

where Σ_k^r = P_D(Σ_k − (λ_{k−1}/η^r) ∇f(Σ_k)).
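Putting the pieces together, below is a minimal sketch of the projected gradient iteration with a backtracking step size in the spirit of Eq. (15). It reuses `project_psd` from the sketch above; the gradient follows the standard Frobenius-norm chain rule (so it differs from Eq. (14) by a sign and a factor of 2), the quadratic term in the line search uses the standard 1/(2·step) scaling, and the default constants are illustrative assumptions rather than the exact rule of the proposal.

```python
import numpy as np

# Minimal sketch of projected gradient descent for problem (12) with
# psi(Sigma) = Tr(Sigma); `project_psd` is the PSD projection sketched above.
def pgd_covariance(P_list, S_list, l, tau=1e-3, eta=2.0, lam0=1.0, iters=100):
    def f(Sigma):
        return sum(np.linalg.norm(S - P.T @ Sigma @ P, 'fro') ** 2
                   for P, S in zip(P_list, S_list)) + tau * np.trace(Sigma)

    def grad(Sigma):
        # Frobenius-norm chain rule; differs from Eq. (14) by a sign and factor 2.
        g = sum(P @ (S - P.T @ Sigma @ P) @ P.T for P, S in zip(P_list, S_list))
        return -2.0 * g + tau * np.eye(l)

    Sigma, lam = np.eye(l), lam0
    for _ in range(iters):
        g, r = grad(Sigma), 0
        while True:                                  # backtracking, cf. Eq. (15)
            step = lam / eta ** r
            cand = project_psd(Sigma - step * g)
            diff = cand - Sigma
            if f(cand) <= (f(Sigma) + np.trace(g.T @ diff)
                           + np.linalg.norm(diff, 'fro') ** 2 / (2 * step)):
                break
            r += 1
        lam, Sigma = step, cand                      # lambda_k = lambda_{k-1} / eta^r
    return Sigma
```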


Convergence analysis

Σ* = arg min_Σ ∑_{i=1}^{p} ||Σ̃_i − P_i^T Σ P_i||_F² + τψ(Σ)  subject to  Σ ⪰ 0.    (16)

Consider the vector form of the problem,

g(Σ) = ∑_{i=1}^{p} ||Q_i vec(Σ) − vec(Σ̃_i)||_2² + τ d^T vec(Σ),    (17)

with Q_i = P_i^T ⊗ P_i^T, where ⊗ denotes the Kronecker product. Thus, note that problem (16) is equivalent to

Σ* = arg min_Σ g(Σ) + h(Σ),    (18)

where h(Σ) is the indicator function of the positive semi-definite set.
Convergence theorem

Theorem
Suppose that g(Σ) and h(Σ) are proper, closed, and convex functions, that dom(h(Σ)) ⊆ int(dom(g(Σ))), and that g(Σ) is L-smooth. Let {Σ_k}_{k≥0} be the sequence of points generated by the projected gradient algorithm. Then, for any optimal point Σ* and any k ≥ 0,

||Σ_{k+1} − Σ*|| ≤ ||Σ_k − Σ*||.    (19)

It can be readily seen that g(Σ) and h(Σ) in (18) are proper, closed, and convex, and that g(Σ) is differentiable; thus, the first part of the assumptions of Theorem 1 is satisfied.

L-smooth function
A function g is said to be L-smooth if it is differentiable and there exists an L > 0 such that

||∇g(x) − ∇g(y)||_2 ≤ L ||x − y||_2,    (20)

for all x, y ∈ E, where E is the domain of g. Thus, for the function g(Σ) defined in (17),

∇g(x) = ∑_{i=1}^{p} Q_i^T (Q_i x − σ̃_i) + τd,    (21)

where σ̃_i = vec(Σ̃_i). Then, by replacing (21) in (20),

||(∑_{i=1}^{p} Q_i^T (Q_i x − σ̃_i) + τd) − (∑_{i=1}^{p} Q_i^T (Q_i y − σ̃_i) + τd)||_2,    (22)

and after some algebra,

||∑_{i=1}^{p} (Q_i^T Q_i)(x − y)||_2 ≤ ||∑_{i=1}^{p} (Q_i^T Q_i)|| ||x − y||_2.    (23)

Thus, L = ||∑_{i=1}^{p} (Q_i^T Q_i)||.
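A minimal sketch of how this constant could be evaluated numerically; forming the Kronecker products Q_i explicitly is only practical for small dimensions and is an illustrative choice.

```python
import numpy as np

# Minimal sketch of L = || sum_i Q_i^T Q_i || from Eq. (23), with
# Q_i = P_i^T (Kronecker) P_i^T. Only practical for small l, since each Q_i
# has size m^2 x l^2.
def smoothness_constant(P_list):
    l = P_list[0].shape[0]
    M = np.zeros((l * l, l * l))
    for P in P_list:
        Q = np.kron(P.T, P.T)          # Q_i = P_i^T ⊗ P_i^T
        M += Q.T @ Q
    return np.linalg.norm(M, 2)        # spectral norm
```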
Bias of the estimator

We assume that

C_1 ≈ C_2 ≈ ··· ≈ C_p ≈ Σ,    (24)

which adds a bias to the estimator. To analyze it, let us define

C_i = Σ + R_i,    (25)

where R_i ∈ R^{l×l} is an error matrix. Thus, the bias is characterized by Lemma 2.

Lemma
The gradient step of the proposed Algorithm 1 is biased by Bias[∇f̃(Σ)] = −∑_{i=1}^{p} P_i P_i^T R_i P_i P_i^T.

The gradient is first re-written in terms of the subset covariances as

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T C_i P_i) P_i^T + τ∇ψ(Σ).    (26)
Bias of the proposed estimator

Then, plugging (25) into (26) gives

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T (Σ + R_i) P_i) P_i^T + τ∇ψ(Σ),    (27)

and finally, after some algebra,

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T Σ P_i) P_i^T − ∑_{i=1}^{p} P_i P_i^T R_i P_i P_i^T + τ∇ψ(Σ).    (28)

Mitigating the Bias

Since we have that

(1/p) ∑_{i=1}^{p} R_i ≈ 0,    (29)

we use a filtered gradient to mitigate the bias,

∇f̂(Σ) = D ∗ ∇f̃(Σ),    (30)

where ∗ denotes the convolution operation and D ∈ R^{k×k} is the filter.
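A minimal sketch of this filtering step; the k × k averaging filter D, the symmetric boundary handling, and the use of scipy.signal.convolve2d are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

# Minimal sketch of the filtered gradient of Eq. (30): the (biased) gradient
# matrix is smoothed by 2-D convolution with a k x k filter D. The averaging
# filter and the boundary handling are illustrative choices.
def filtered_gradient(grad_tilde, k=3):
    D = np.ones((k, k)) / k**2            # k x k averaging filter
    return convolve2d(grad_tilde, D, mode='same', boundary='symm')
```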

Hyperspectral images used

Figure: Urban dataset. (Left) 2D spatial distribution of the 100th spectral band. (Right) Three spectral signatures at different pixels of the image.

Figure: Pavia Centre dataset. (Left) 2D spatial distribution of the 80th spectral band. (Right) Three spectral signatures at different pixels of the image.
Sensing matrices

For the sensing, three different types of random matrices were used; a minimal sketch for generating them is shown below.

# Normal distribution: P with P_{i,j} ∼ N(0, 1)
# Uniform distribution: P with P_{i,j} ∼ U(0, 1)
# Bernoulli distribution: P with P_{i,j} ∼ B(1, 1/3)
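A minimal sketch (the matrix size and the random seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
l, m = 64, 16                                              # illustrative matrix size

P_normal    = rng.standard_normal((l, m))                  # P_ij ~ N(0, 1)
P_uniform   = rng.uniform(0.0, 1.0, size=(l, m))           # P_ij ~ U(0, 1)
P_bernoulli = rng.binomial(1, 1/3, (l, m)).astype(float)   # P_ij ~ B(1, 1/3)
```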

Determining the number of subsets

Figure: Mean square error of the reconstructed covariance matrix for the Pavia (left) and Urban (right) images when varying the number of subsets.

Based on this graphic, we set the number of partitions to p = 32.

Accuracy of the recovered eigenvectors of the covariance matrix
(Panels: Urban and Pavia; sensing matrices: Binary, Gaussian, Uniform.)

Figure: Mean MSE of the recovered covariance matrix as the compression ratio varies, for the Urban image with 32 subsets.

Angle of reconstructed eigenvectors
Average error angle between the eigenvectors and their reconstructions (1st, 2nd, and 3rd eigenvectors; Binary, Gaussian, and Uniform sensing matrices).

Figure: Mean angle error of the recovered eigenvectors as the compression ratio varies, for the Pavia image with 32 subsets.
Angle of reconstructed eigenvectors
Average error angle between the eigenvectors and their reconstructions (1st, 2nd, and 3rd eigenvectors; Binary, Gaussian, and Uniform sensing matrices).

Figure: Mean angle error of the recovered eigenvectors as the compression ratio varies, for the Urban image with 32 subsets.
Bias and filtered gradient analysis

Figure: Comparison of the fourth eigenvector of the estimated covariance matrix and the bias term, with and without filtering.
Conclusions
# We proposed an algorithm to recover the covariance matrix from a set of compressive measurements using the projected gradient strategy.
# Convergence of this algorithm is proven.
# The theoretical results show that a filtered gradient can reduce the bias.
# Experimental results show that the proposed method outperforms state-of-the-art methods.

Thanks

Questions?

Sensing matrices
How are the matrices built?

(Diagram: coded-aperture snapshots 1 and 2 across the spectral axis λ, with their corresponding sensing matrices P_1, P_2, ..., P_8.)
Partitions

Are the covariance matrices similar to each other?
Eigenvectors comparison
Are the covariance matrices similar to each other?
Compressive projection principal component analysis (CPPCA)
CPPCA estimates both the PCA coefficients and the most
relevant eigenvectors of the covariance matrix from an
orthogonal random projection of the data.

(Diagram: projections P_1 x and P_2 x of the data.)

CPPCA recovers the eigenvectors by the iteration

w^{(i)} = (1/J) ∑_{j=1}^{J} Q^{(j)} Q^{(j)T} w^{(i−1)}.    (31)
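A minimal sketch of this iteration; the initial vector, the unit-norm renormalization, and the fixed iteration count are illustrative assumptions (Q_list would hold the orthonormal bases Q^{(j)}).

```python
import numpy as np

# Minimal sketch of the iteration in Eq. (31): the eigenvector estimate is
# averaged over the projectors Q^(j) Q^(j)T and renormalized to unit norm.
def cppca_eigenvector(Q_list, w0, iters=50):
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        w = sum(Q @ (Q.T @ w) for Q in Q_list) / len(Q_list)   # Eq. (31)
        w /= np.linalg.norm(w)                                  # keep unit norm
    return w
```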
Compressive projection principal component analysis

Drawbacks of CPPCA:

# The matrix P has to be orthonormal.
# CPPCA assumes an eccentric behavior of the eigenvalues.
# It requires P_i P_i^T w to be known (which also depends on the eccentricity).
