
Identifying and Visualising Cluster Structure in High-Dimensional Data

Supervisor: Nicos Pavlidis


Michael Casey, Joel Dyer, Matthews Sejeso,
Yu Tian, Huining Yang, Yang Zhou

University of Oxford

5th April 2019



Outline

Motivation
K-subspace algorithm
An ℓ1-minimization problem
Spectral clustering
Results
Future work



Introduction

Description of Data & Problem


38 people
64 grayscale images I ∈ {0, …, 255}^{192×168} (i.e. 192 × 168 pixels) of each person
Each image of each person taken under different lighting conditions
Unsupervised learning task: assign the correct individual to each image
K-means clusters based on illumination: we need a different approach

Figure 1: Illustration of difference in lighting conditions.


Alternative to k-means

An alternative to Euclidean-based methods

Flatten each image I ∈ R^{192×168} into a vector v ∈ R^{32256} (sketched in code below) according to the transformation

v_k = I_{ij} ,   k = 168(i − 1) + j.   (1)
Governing idea: after this flattening, each individual's faces might lie in a distinct linear subspace of the full space R^{32256}.
If so, we can exploit clustering methods that seek to partition the data into distinct linear subspaces.
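A minimal numpy sketch of this flattening (our own illustration; the random image is a stand-in for a real one):

    import numpy as np

    # Stand-in 192 x 168 grayscale image with values in {0, ..., 255}.
    I = np.random.randint(0, 256, size=(192, 168))

    # Row-major flattening: pixel (i, j) (1-indexed) maps to entry k = 168*(i-1) + j.
    v = I.reshape(-1)                             # v has length 192 * 168 = 32256
    assert v[168 * (3 - 1) + 5 - 1] == I[2, 4]    # check at (i, j) = (3, 5)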



Linear Subspace

Validating the partitioning hypothesis


To verify that approaches based on partitioning the full space R^{32256} into distinct linear subspaces, one per individual, are feasible, we cheat and use our knowledge of the correct labels to obtain the subspaces:
Run PCA on each individual's full set of images to obtain d principal components {q_i^j}_{i=1}^{d} for every person j
Construct the respective subspaces V_j = span{q_1^j, …, q_d^j}
Compute the principal angles between every pair of subspaces to verify distinctness
Principal angles between subspaces U and W:

θ1 = min{ arccos( |⟨u, w⟩| / (‖u‖‖w‖) ) : u ∈ U, w ∈ W },   (2)

θj = min{ arccos( |⟨u, w⟩| / (‖u‖‖w‖) ) : u ∈ U, w ∈ W, ⟨u, ui⟩ = 0, ⟨w, wi⟩ = 0 ∀ i < j },   j = 2, …, d,   (3)

where (ui, wi) attain the minimum defining θi.
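In practice these angles are the arccosines of the singular values of Q1ᵀQ2 for orthonormal bases Q1, Q2. A minimal sketch (our own illustration with random stand-in bases; scipy returns the angles in descending order):

    import numpy as np
    from scipy.linalg import subspace_angles

    rng = np.random.default_rng(0)
    d = 9
    # Stand-ins for two individuals' d-dimensional PCA bases (orthonormal columns).
    Q1, _ = np.linalg.qr(rng.standard_normal((32256, d)))
    Q2, _ = np.linalg.qr(rng.standard_normal((32256, d)))

    # Principal angles theta_1 <= ... <= theta_d between the column spaces.
    theta = np.sort(subspace_angles(Q1, Q2))
    print(np.degrees(theta))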

Linear Subspace: Validation


Histogram of the principal angles over all pairs of subspaces.


Linear Subspace: Validation

The pair of individuals with the smallest first principal angle θ1 (most similar subspaces).


Linear Subspace: Validation

The pair of individuals with the largest first principal angle θ1.


Linear Subspace: Validation

The pair of individuals with the smallest sum of principal angles.


Linear Subspace: Validation

The pair of individuals with the largest sum of principal angles.



K-subspace algorithm

K-subspace - Objective

Find the best k distinct subspaces that minimise the total "projection error" over all data points:

min_{U1,…,Uk} Σ_{i=1}^{N} min_{j∈{1,…,k}} ‖yi − Uj Ujᵀ yi‖₂²   (4)

where
k - the number of clusters (people),
N - the total number of data points (images),
yi - image i (flattened),
Uj = [u_{j1}, …, u_{jd}] - the matrix whose columns form a basis of subspace j,
d - the dimension of each subspace (chosen by us).

K-subspace algorithm

Algorithm
Feed in all the data points (yi)_{i=1,…,N}, and update each basis (Uj)_{j=1,…,k} after each iteration until convergence.

1. Randomly initialise k clusters;
2. Allocate each yi to the "nearest" subspace (smallest projection error);
3. Update the d basis vectors of each subspace by PCA;
4. Return to step 2 until convergence.
Initialisation is vital to the process: depending on the initialisation, accuracy varies between 55% and 65% even for the two most dissimilar individuals. A minimal sketch of the loop follows.
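A sketch of the alternating loop (our own illustration, not the group's exact code; it assumes every cluster keeps at least d points and does not handle empty clusters). Each Uj is fitted as the top-d right singular vectors of its cluster, matching objective (4):

    import numpy as np

    def k_subspace(Y, k, d, n_iter=100, seed=0):
        """Y: (N, n_features) data matrix, k: clusters, d: subspace dimension."""
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=Y.shape[0])      # step 1: random clusters
        for _ in range(n_iter):
            # Step 3: basis of each subspace = top-d right singular vectors
            # of its cluster (PCA without centring, matching objective (4)).
            bases = []
            for j in range(k):
                _, _, Vt = np.linalg.svd(Y[labels == j], full_matrices=False)
                bases.append(Vt[:d].T)                    # columns span subspace j
            # Step 2: reassign y_i to the subspace minimising ||y - U U^T y||_2.
            resid = np.stack([np.linalg.norm(Y - (Y @ U) @ U.T, axis=1)
                              for U in bases], axis=1)
            new_labels = resid.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                                     # step 4: converged
            labels = new_labels
        return labels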

Sparse Subspace Clustering: ℓ1 Minimization

Objective: find the sparse solution vectors ci:

min_{ci} ‖ci‖₁   s.t.   yi = Y ci ,  cii = 0,

where ci = [ci1, ci2, …, ciN]ᵀ and the constraint cii = 0 eliminates the trivial solution.
The problem can be reformulated as a LASSO:

min_{ci} ‖ci‖₁ + (λ/2) ‖yi − Y ci‖₂² ,   cii = 0,

where λ is the regularization parameter. For λ < 1/‖Yᵀyi‖∞ we obtain ci = 0, so λ must be chosen above this threshold.
Similarity matrix W = |C| + |C|ᵀ, where C = [c1, …, cN].
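A minimal sketch of this step (our own illustration using scikit-learn's Lasso, whose objective ‖y − Xw‖²/(2·n_samples) + α‖w‖₁ matches the one above for α = 1/(λ · n_samples)):

    import numpy as np
    from sklearn.linear_model import Lasso

    def ssc_coefficients(Y, lam=100.0):
        """Y: (n_features, N) matrix with one flattened image per column.
        Solves min ||c_i||_1 + (lam/2)||y_i - Y c_i||_2^2 with c_ii = 0
        for every column, then builds the similarity matrix W."""
        n_feat, N = Y.shape
        C = np.zeros((N, N))
        alpha = 1.0 / (lam * n_feat)          # rescale to sklearn's objective
        for i in range(N):
            idx = np.delete(np.arange(N), i)  # drop column i: enforces c_ii = 0
            model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
            model.fit(Y[:, idx], Y[:, i])
            C[idx, i] = model.coef_
        W = np.abs(C) + np.abs(C).T           # symmetric similarity matrix
        return C, W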
Spectral Clustering: Graphs

A graph consists of nodes and edges.
Adjacency matrix A: Aij = 1 if there is an edge connecting node i and node j, and 0 otherwise.
Weighted adjacency matrix W: Wij = ωij, weights which represent how tightly node i and node j are connected.
Can we represent the data as a graph, and define the relationships by weights?
Spectral Clustering

Consider the problem from a "graph cut" point of view:

image / data point → node in the graph
clustering → graph cut / community detection

Then we need to construct edges such that:

data in the same cluster have strong connections
data in different clusters have weak or no connections.


Spectral Clustering - Graph cut

Given our constructed similarity graph G(V, E) with weights W, our goal is to minimise the total weight between clusters:

min_{A1,…,Ak} cut(A1, …, Ak) := (1/2) Σ_{j=1}^{k} Σ_{u∈Aj, v∈Āj} ωuv ,   (5)

where the {Aj} are subsets of V such that Ai ∩ Aj = ∅ for i ≠ j, A1 ∪ … ∪ Ak = V, and Āj = V \ Aj.
To control the size of each set, we modify this to

RatioCut(A1, …, Ak) := (1/2) Σ_{j=1}^{k} (1/|Aj|) Σ_{u∈Aj, v∈Āj} ωuv .   (6)

We relax this to a standard trace minimisation and solve it using the eigenvectors of the corresponding graph Laplacian (details in the appendix); a sketch of the full pipeline follows.
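A minimal sketch of the pipeline (our own illustration: unnormalised Laplacian, eigenvectors for the k smallest eigenvalues, then k-means on the rows, as detailed in the appendix):

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(W, k):
        """W: symmetric (N, N) similarity matrix, k: number of clusters."""
        L = np.diag(W.sum(axis=1)) - W        # unnormalised graph Laplacian L = D - W
        eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
        H = eigvecs[:, :k]                    # relaxed RatioCut solution (Rayleigh-Ritz)
        # Recover a discrete partition: k-means on the rows of H.
        return KMeans(n_clusters=k, n_init=10).fit_predict(H)

Applied to the similarity matrix W = |C| + |C|ᵀ from the ℓ1 step, this returns the final cluster labels.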
Result: ℓ1 minimization

A Reminder - Most Distinct Individuals

Result: ℓ1 minimization
Sparse coefficient matrix: the most distinct pair of individuals

Visualize the sparse matrix C obtained from ℓ1 minimization:

[Two spy plots of the 128 × 128 matrix C, one per solver:]
cvx opt in MATLAB: nz = 620, accuracy 96%
Interior-point scheme: nz = 323, accuracy 98%

Result: ℓ1 minimization

The two errors from the interior-point scheme:

[Images of the two misclassified faces, both at index 21.]

A Reminder - Least Distinct Individuals

Result: ℓ1 minimization
Sparse coefficient matrix: the least distinct pair of individuals

Visualize the sparse matrix C obtained from ℓ1 minimization:

[Two spy plots of the 128 × 128 matrix C, one per solver:]
cvx opt in MATLAB: nz = 862, accuracy 51%
Interior-point scheme: nz = 350, accuracy 62%


Result: More than two individuals / All together


Now visualize the mean image for each of the 8 individuals.


Result: More than two individuals


Visualize the mean image for each of the 8 recovered clusters (83% accuracy).

Summary

Can successfully distinguish the most distinct individuals with near-perfect accuracy.
Can distinguish the most similar individuals only with the interior-point algorithm.
Can successfully recover individuals when clustering more than two.
Most misclassifications in the 8-individual test result from the darkest images being clustered with a single individual.

Future work

Comparison of different methods (e.g. different similarity matrices for spectral clustering)
Further noise filtration: outliers
Robustness of the methods when images have backgrounds
Partial matching: different angles, expressions, etc.
Extension to RGB and thermal images
Occlusions and accessories

References

E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory, and Applications. 2013.
A. Ng, M. Jordan and Y. Weiss. On Spectral Clustering: Analysis and an Algorithm. 2002.
U. von Luxburg. A Tutorial on Spectral Clustering. 2007.
S. Kim, K. Koh, M. Lustig, S. Boyd and D. Gorinevsky. An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares. 2007.



End of Presentation

Thank You.



Appendix

Spectral Clustering - similarity matrix

To model the local relationship between data points directly:

ε-neighbourhood graph – connect all pairs whose distance is smaller than ε.
k-nearest-neighbour graph – connect the k nearest neighbours of each node.

Or, we can construct our own similarity matrix S ∈ R^{N×N}:

fully connected graph – simply use a positive similarity as the weight of the corresponding edge, e.g. s(xi, xj) = exp(−‖xi − xj‖² / (2σ²)).

We prefer the fully connected graph, which leaves us more freedom to describe the data; how best to construct the similarity matrix is the key question (in the main talk we use W = |C| + |C|ᵀ from the ℓ1 minimisation). A sketch of the Gaussian similarity follows.
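A minimal sketch of the fully connected Gaussian similarity (our own illustration; σ is a free bandwidth parameter):

    import numpy as np

    def gaussian_similarity(X, sigma=1.0):
        """X: (N, n_features) data matrix. Returns S with
        S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) and zero diagonal."""
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        S = np.exp(-sq_dists / (2.0 * sigma ** 2))
        np.fill_diagonal(S, 0.0)   # no self-loops
        return S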


Spectral Clustering - Reformulation


We want to connect our objective to the graph Laplacian L = D − W, where D is diagonal with dii = Σj ωij. Then, with some more algebra:

Construct H = [h1, …, hk] ∈ R^{N×k}, where h_{ij} = 1/√|Aj| if node i ∈ Aj and 0 otherwise.
Then hjᵀ L hj = Σ_{u∈Aj, v∈Āj} ωuv / |Aj| = (Hᵀ L H)jj, hence RatioCut(A1, …, Ak) = (1/2) Tr(Hᵀ L H); the constant 1/2 does not affect the minimiser.
Also observe that Hᵀ H = I.

Hence we can formulate our problem as

min_{A1,…,Ak} Tr(Hᵀ L H)   s.t.   Hᵀ H = I, H defined as above.   (7)

Spectral Clustering - Relaxation

But the problem is now a discrete optimisation problem (NP-hard), so for practical reasons we relax it to

min_{H∈R^{N×k}} Tr(Hᵀ L H)   s.t.   Hᵀ H = I.   (8)

This is then a standard trace minimisation problem, and by the Rayleigh-Ritz theorem the solution is given by choosing H to be the matrix whose columns are the eigenvectors corresponding to the k smallest eigenvalues of L.


Spectral Clustering - Recovering the Partition

However, in order to obtain a partition of the graph, we need to transform the solution H of the relaxed problem back into a discrete indicator.

Recall how we constructed H = [h1, …, hk]: h_{ij} = 1/√|Aj| if i ∈ Aj and 0 otherwise.

Hence the simplest way is to partition the nodes/data by the values in each column. However, this turns out to be too crude, and we instead apply the k-means clustering algorithm to the rows of H to obtain the final partition.
