
Dissertation Thesis Proposal

Sample Covariance Matrix Estimation from Multiple Random Projections with Highly Compressed Data

Jonathan Monsalve.
Thursday 20th February, 2020
Universidad Industrial de Santander
Advisor: Ph.D. Henry Arguello
Agenda

Spectral imaging

F ∈ R^{N×N×L} with N × N spatial dimensions and L spectral bands.

# Large data sizes and high costs.
# Time-consuming acquisition.

Compressive Sensing (CS)

Diagram: traditional acquisition and processing set-up.

Diagram: compressive acquisition and processing set-up.


Compressive covariance sampling (CCS)

These techniques aim at recovering the second-moment statistics of high-dimensional signals from compressive projections.

Note: the k-th moment of a population is given by

µ_k = E(X^k).    (1)

For a random vector, the sample covariance matrix is given by

Σ = (1/n) ∑_{i=1}^{n} x_i x_i^T.    (2)

The purpose of CCS is to recover the second-moment statistics from a set of random projections given by

y = P^T x,    (3)

where P ∈ R^{l×m}.
Principal component analysis

The covariance matrix of a signal is used in a variety of techniques, such as PCA.

Diagram: PCA projection.

The compression is performed by using the eigenvectors W of the covariance matrix,

Σ = W Λ W^T.    (4)
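As an illustration, a minimal NumPy sketch of this eigendecomposition-based compression; the data matrix X (one sample per column) and the number of retained components k are illustrative assumptions, not values from the proposal.

```python
import numpy as np

# Minimal sketch of PCA compression via the eigendecomposition in Eq. (4).
# X is l x n (one sample per column); k is the number of retained components.
def pca_project(X, k):
    X_c = X - X.mean(axis=1, keepdims=True)      # remove the sample mean
    Sigma = X_c @ X_c.T / X.shape[1]             # sample covariance, Eq. (2)
    eigvals, W = np.linalg.eigh(Sigma)           # Sigma = W Lambda W^T, Eq. (4)
    W_k = W[:, ::-1][:, :k]                      # k leading eigenvectors
    return W_k.T @ X_c                           # k x n PCA coefficients
```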
Random projections

In a CS set-up the signal X ∈ R^{l×n} is unknown and a random projection is captured,

Y = P^T X + N.    (5)

By assuming that the signal and noise are uncorrelated, the compressive covariance matrix is given by

Σ̃ = (P^T X X^T P + N N^T)/n = P^T Σ P + σ_r² I.    (6)
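As a numerical illustration, a minimal sketch of the model in Eqs. (5)-(6); the dimensions, the Gaussian signal model, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n, sigma_r = 64, 16, 10_000, 0.05      # illustrative sizes and noise level

X = rng.standard_normal((l, n))              # unknown signal, one column per sample
P = rng.standard_normal((l, m))              # random sensing matrix
N = sigma_r * rng.standard_normal((m, n))    # additive sensor noise

Y = P.T @ X + N                              # compressive measurements, Eq. (5)
Sigma = X @ X.T / n                          # sample covariance of the signal
Sigma_tilde = Y @ Y.T / n                    # compressed covariance

# Under the uncorrelated signal/noise assumption, Sigma_tilde should be close
# to P^T Sigma P + sigma_r^2 I, as in Eq. (6).
residual = Sigma_tilde - (P.T @ Sigma @ P + sigma_r**2 * np.eye(m))
print(np.linalg.norm(residual, 'fro'))
```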

Covariance matrix estimation

# Least squares recovery:

Σ* = arg min_Σ ||Σ̃ − P^T Σ P||_F².    (7)

# Regularized problem:

Σ* = arg min_Σ ||Σ̃ − P^T Σ P||_F² + λψ(Σ)  subject to  Σ ⪰ 0.    (8)
Problem Statement

Hypothesis: The first and second sample statistical moments of a dataset can be accurately estimated directly from low-dimensional random projections, and they can be used to improve the reconstruction of the high-dimensional data using compressive spectral imaging.

General objective: To design and optimize an algorithm to retrieve sample statistics from compressive measurements and to analyze the use of the sample statistics to reconstruct the underlying signal using compressive sensing theory.

Specific objectives
1. To determine the most suitable sensing/projection protocols
based on compressive sensing and random projections from the
state-of-the-art applicable to hyperspectral imaging to be used
in the statistics recovery.
2. To design an algorithm based on the gradient descent
method to recover the first and second sample statistical
moments from low-dimensional random projections.
3. To test the performance of the proposed algorithm to recover
the sample statistics in hyperspectral imaging.
4. To adapt a state-of-the-art algorithm to estimate the
vegetation cover using sample statistics and random
low-dimensional projections of hyperspectral images based on
the proposed approach.
5. To verify the performance of the adapted algorithm.
Proposal

Let us split the dataset X into p subsets as X_i = {x_j | j ∈ S_i ⊂ Q, S_i ∩ S_k = ∅, ∀ i ≠ k}. The covariance matrix of each subset is C_i = X_i X_i^T / |S_i|. Let us project each matrix X_i as

Y_i = P_i^T X_i + N_i.    (9)

Using this splitting procedure, the recovery problem can be reformulated as

C_i* = arg min_{C_i} ||Σ̃_i − P_i^T C_i P_i||_F² + τψ(C_i)  subject to  C_i ⪰ 0.    (10)

However, note that if |S_i| is large enough, we can assume that

C_1 ≈ C_2 ≈ ··· ≈ C_p ≈ Σ.    (11)
Proposed optimization problem

Thus, instead of recovering each covariance matrix C_i, it is desirable to recover the sample covariance

Σ* = arg min_Σ ∑_{i=1}^{p} ||Σ̃_i − P_i^T Σ P_i||_F² + τψ(Σ)  subject to  Σ ⪰ 0.    (12)
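A minimal sketch of the splitting and per-subset compression of Eqs. (9)-(10); the random subset assignment, the dimensions, and the omission of the noise term N_i are illustrative simplifications.

```python
import numpy as np

# Minimal sketch of the splitting and per-subset compression of Eqs. (9)-(10),
# without the noise term N_i. Returns the sensing matrices P_i and the
# compressed covariances Sigma_tilde_i used in problem (12).
def compress_subsets(X, p, m, rng):
    l, n = X.shape
    index_sets = np.array_split(rng.permutation(n), p)   # disjoint sets S_i
    P_list, S_list = [], []
    for S_i in index_sets:
        X_i = X[:, S_i]                                  # subset X_i
        P_i = rng.standard_normal((l, m))                # sensing matrix P_i
        Y_i = P_i.T @ X_i                                # projection, Eq. (9)
        P_list.append(P_i)
        S_list.append(Y_i @ Y_i.T / len(S_i))            # Sigma_tilde_i
    return P_list, S_list
```

The returned P_list and S_list can be fed directly to the projected gradient solver sketched later.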

Projected gradient descent method

The projected gradient descent method is used for problems of the form

Σ* = arg min_Σ f(Σ)  subject to  Σ ∈ D.    (13)

Thus, the proposed algorithm (Algorithm 1) iterates the update Σ_{k+1} = P_D(Σ_k − λ_k ∇f(Σ_k)), where P_D is the projection onto the set of positive semi-definite matrices.
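A minimal sketch of one common way to implement the projection P_D, via eigenvalue clipping; the specific implementation is an assumption, not taken from the proposal.

```python
import numpy as np

# Minimal sketch of the projection P_D onto the positive semi-definite cone:
# symmetrize, eigendecompose, and clip negative eigenvalues to zero.
def project_psd(S):
    S = (S + S.T) / 2                        # enforce symmetry
    eigvals, V = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
    return (V * eigvals) @ V.T
```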
Projected gradient descent method

The gradient of the objective function is given by

∇f(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T Σ P_i) P_i^T + τ∇ψ(Σ),    (14)

and when ψ(Σ) = Tr(Σ), its gradient is ∇Tr(Σ) = I.

The step-size parameter λ_k is picked as λ_k = λ_{k−1}/η^r, where the constants satisfy η > 1 and λ_{−1} > 0, and r is the smallest non-negative integer that satisfies

f(Σ_k^r) ≤ f(Σ_k) + trace(∇f(Σ_k)^T (Σ_k^r − Σ_k)) + ||Σ_k^r − Σ_k||_F²,    (15)

where Σ_k^r = P_D(Σ_k − (λ_{k−1}/η^r) ∇f(Σ_k)).
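Putting the pieces together, below is a minimal sketch of the projected gradient iteration with a backtracking step size in the spirit of Eq. (15). It reuses `project_psd` from the sketch above; the gradient follows the standard Frobenius-norm chain rule (so it differs from Eq. (14) by a sign and a factor of 2), the quadratic term in the line search uses the standard 1/(2·step) scaling, and the default constants are illustrative assumptions rather than the exact rule of the proposal.

```python
import numpy as np

# Minimal sketch of projected gradient descent for problem (12) with
# psi(Sigma) = Tr(Sigma); `project_psd` is the PSD projection sketched above.
def pgd_covariance(P_list, S_list, l, tau=1e-3, eta=2.0, lam0=1.0, iters=100):
    def f(Sigma):
        return sum(np.linalg.norm(S - P.T @ Sigma @ P, 'fro') ** 2
                   for P, S in zip(P_list, S_list)) + tau * np.trace(Sigma)

    def grad(Sigma):
        # Frobenius-norm chain rule; differs from Eq. (14) by a sign and factor 2.
        g = sum(P @ (S - P.T @ Sigma @ P) @ P.T for P, S in zip(P_list, S_list))
        return -2.0 * g + tau * np.eye(l)

    Sigma, lam = np.eye(l), lam0
    for _ in range(iters):
        g, r = grad(Sigma), 0
        while True:                                  # backtracking, cf. Eq. (15)
            step = lam / eta ** r
            cand = project_psd(Sigma - step * g)
            diff = cand - Sigma
            if f(cand) <= (f(Sigma) + np.trace(g.T @ diff)
                           + np.linalg.norm(diff, 'fro') ** 2 / (2 * step)):
                break
            r += 1
        lam, Sigma = step, cand                      # lambda_k = lambda_{k-1} / eta^r
    return Sigma
```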


Convergence analysis

Σ* = arg min_Σ ∑_{i=1}^{p} ||Σ̃_i − P_i^T Σ P_i||_F² + τψ(Σ)  subject to  Σ ⪰ 0.    (16)

Consider the vector form of the problem,

g(Σ) = ∑_{i=1}^{p} ||Q_i vec(Σ) − vec(Σ̃_i)||_2² + τ d^T vec(Σ),    (17)

with Q_i = P_i^T ⊗ P_i^T, where ⊗ denotes the Kronecker product. Thus, note that problem (16) is equivalent to

Σ* = arg min_Σ g(Σ) + h(Σ),    (18)

where h(Σ) is the indicator function of the positive semi-definite set.
Convergence theorem

Theorem
Suppose that g(Σ) and h(Σ) are proper, closed, and convex functions, that dom(h(Σ)) ⊆ int(dom(g(Σ))), and that g(Σ) is L-smooth. Let {Σ_k}_{k≥0} be the sequence of points generated by the projected gradient algorithm. Then, for any optimal point Σ* and any k ≥ 0,

||Σ_{k+1} − Σ*|| ≤ ||Σ_k − Σ*||.    (19)

It can be readily seen that g(Σ) and h(Σ) in (18) are proper, closed, and convex, and that g(Σ) is differentiable; thus, the first part of the assumptions of Theorem 1 is satisfied.

L-smooth function
A function g is said to be L-smooth if it is differentiable and there exists an L > 0 such that

||∇g(x) − ∇g(y)||_2 ≤ L ||x − y||_2,    (20)

for all x, y ∈ E, where E is the domain of g. Thus, for the function g(Σ) defined in (17),

∇g(x) = ∑_{i=1}^{p} Q_i^T (Q_i x − σ̃_i) + τd,    (21)

where σ̃_i = vec(Σ̃_i). Then, by replacing (21) in (20),

||(∑_{i=1}^{p} Q_i^T (Q_i x − σ̃_i) + τd) − (∑_{i=1}^{p} Q_i^T (Q_i y − σ̃_i) + τd)||_2,    (22)

and after some algebra,

||∑_{i=1}^{p} (Q_i^T Q_i)(x − y)||_2 ≤ ||∑_{i=1}^{p} (Q_i^T Q_i)|| ||x − y||_2.    (23)

Thus, L = ||∑_{i=1}^{p} (Q_i^T Q_i)||.
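A minimal sketch of how this constant could be evaluated numerically; forming the Kronecker products Q_i explicitly is only practical for small dimensions and is an illustrative choice.

```python
import numpy as np

# Minimal sketch of L = || sum_i Q_i^T Q_i || from Eq. (23), with
# Q_i = P_i^T (Kronecker) P_i^T. Only practical for small l, since each Q_i
# has size m^2 x l^2.
def smoothness_constant(P_list):
    l = P_list[0].shape[0]
    M = np.zeros((l * l, l * l))
    for P in P_list:
        Q = np.kron(P.T, P.T)          # Q_i = P_i^T ⊗ P_i^T
        M += Q.T @ Q
    return np.linalg.norm(M, 2)        # spectral norm
```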
Bias of the estimator

We assume that

C_1 ≈ C_2 ≈ ··· ≈ C_p ≈ Σ,    (24)

which adds a bias to the estimator. To analyze it, let us define

C_i = Σ + R_i,    (25)

where R_i ∈ R^{l×l} is an error matrix. Thus, the bias is characterized by Lemma 2.

Lemma
The gradient step of the proposed Algorithm 1 is biased by Bias[∇f̃(Σ)] = −∑_{i=1}^{p} P_i P_i^T R_i P_i P_i^T.

The gradient is first re-written in terms of the subset covariances as

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T C_i P_i) P_i^T + τ∇ψ(Σ).    (26)
Bias of the proposed estimator

Then, plugging (25) into (26) gives

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T (Σ + R_i) P_i) P_i^T + τ∇ψ(Σ),    (27)

and finally, after some algebra,

∇f̃(Σ) = ∑_{i=1}^{p} P_i (Σ̃_i − P_i^T Σ P_i) P_i^T − ∑_{i=1}^{p} P_i P_i^T R_i P_i P_i^T + τ∇ψ(Σ).    (28)

Mitigating the Bias

Since we have that

(1/p) ∑_{i=1}^{p} R_i ≈ 0,    (29)

we use a filtered gradient to mitigate the bias,

∇f̂(Σ) = D ∗ ∇f̃(Σ),    (30)

where ∗ denotes the convolution operation and D ∈ R^{k×k} is the filter.
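A minimal sketch of this filtering step; the k × k averaging filter D, the symmetric boundary handling, and the use of scipy.signal.convolve2d are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

# Minimal sketch of the filtered gradient of Eq. (30): the (biased) gradient
# matrix is smoothed by 2-D convolution with a k x k filter D. The averaging
# filter and the boundary handling are illustrative choices.
def filtered_gradient(grad_tilde, k=3):
    D = np.ones((k, k)) / k**2            # k x k averaging filter
    return convolve2d(grad_tilde, D, mode='same', boundary='symm')
```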

Hyperspectral images used

Figure: Urban dataset. (Left) 2D spatial distribution of the 100th spectral band. (Right) Three spectral signatures at different pixels of the image.

Figure: Pavia Centre dataset. (Left) 2D spatial distribution of the 80th spectral band. (Right) Three spectral signatures at different pixels of the image.
Sensing matrices

For the sensing, three different types of random matrices were used; a minimal sketch for generating them is shown below.

# Normal distribution: P with P_{i,j} ∼ N(0, 1)
# Uniform distribution: P with P_{i,j} ∼ U(0, 1)
# Bernoulli distribution: P with P_{i,j} ∼ B(1, 1/3)
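A minimal sketch (the matrix size and the random seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
l, m = 64, 16                                              # illustrative matrix size

P_normal    = rng.standard_normal((l, m))                  # P_ij ~ N(0, 1)
P_uniform   = rng.uniform(0.0, 1.0, size=(l, m))           # P_ij ~ U(0, 1)
P_bernoulli = rng.binomial(1, 1/3, (l, m)).astype(float)   # P_ij ~ B(1, 1/3)
```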

Determining the number of subsets

Figure: Mean square error of the reconstructed covariance matrix for the Pavia (left) and Urban (right) images when varying the number of subsets.

Based on this graphic, we set the number of partitions to p = 32.

Accuracy of the recovered eigenvectors of the covariance matrix
(Panels: Urban and Pavia; sensing matrices: Binary, Gaussian, Uniform.)

Figure: Mean MSE of the recovered covariance matrix as the compression ratio varies, for the Urban image with 32 subsets.

Angle of reconstructed eigenvectors
Average error angle between the eigenvectors and their reconstructions (1st, 2nd, and 3rd eigenvectors; Binary, Gaussian, and Uniform sensing matrices).

Figure: Mean angle error of the recovered eigenvectors as the compression ratio varies, for the Pavia image with 32 subsets.
Angle of reconstructed eigenvectors
Average error angle between the eigenvectors and their reconstructions (1st, 2nd, and 3rd eigenvectors; Binary, Gaussian, and Uniform sensing matrices).

Figure: Mean angle error of the recovered eigenvectors as the compression ratio varies, for the Urban image with 32 subsets.
Bias and filtered gradient analysis

Figure: Comparison of the fourth eigenvector of the estimated covariance matrix and the bias term, with and without filtering.
Conclusions
# We proposed an algorithm to recover the covariance matrix from a set of compressive measurements using the projected gradient strategy.
# Convergence of this algorithm is proven.
# The theoretical results show that a filtered gradient can reduce the bias.
# Experimental results show that the proposed method outperforms state-of-the-art methods.

Thanks

Questions?

Sensing matrices
How are the matrices built?

(Diagram: coded-aperture snapshots 1 and 2 across the spectral axis λ, with their corresponding sensing matrices P_1, P_2, ..., P_8.)
Partitions

Are the covariance matrices similar to each other?
Eigenvectors comparison
Are the covariance matrices similar to each other?
Compressive projection principal component analysis (CPPCA)
CPPCA estimates both the PCA coefficients and the most
relevant eigenvectors of the covariance matrix from an
orthogonal random projection of the data.

(Diagram: projections P_1 x and P_2 x of the data.)

CPPCA recovers the eigenvectors by the iteration

w^{(i)} = (1/J) ∑_{j=1}^{J} Q^{(j)} Q^{(j)T} w^{(i−1)}.    (31)
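A minimal sketch of this iteration; the initial vector, the unit-norm renormalization, and the fixed iteration count are illustrative assumptions (Q_list would hold the orthonormal bases Q^{(j)}).

```python
import numpy as np

# Minimal sketch of the iteration in Eq. (31): the eigenvector estimate is
# averaged over the projectors Q^(j) Q^(j)T and renormalized to unit norm.
def cppca_eigenvector(Q_list, w0, iters=50):
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        w = sum(Q @ (Q.T @ w) for Q in Q_list) / len(Q_list)   # Eq. (31)
        w /= np.linalg.norm(w)                                  # keep unit norm
    return w
```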
Compressive projection principal component analysis

Drawbacks of CPPCA:

# The matrix P has to be orthonormal.
# CPPCA assumes an eccentric behavior of the eigenvalues.
# It requires P_i P_i^T w to be known (which also depends on the eccentricity).
