
Identifying and Visualising Cluster Structure in High-Dimensional Data

Supervisor: Nicos Pavlidis


Michael Casey, Joel Dyer, Matthews Sejeso,
Yu Tian, Huining Yang, Yang Zhou

University of Oxford

5th April 2019



Outline

Motivation
K-subspace algorithm
An ℓ1-minimization problem
Spectral clustering
Results
Future work



Introduction

Description of Data & Problem


38 people
64 grayscale images I ∈ {0, …, 255}^{192×168} (i.e. 192 × 168 pixels) of each person
Each image of each person taken under different lighting conditions
Unsupervised learning task: assign the correct individual to each image
K-means clusters based on illumination: we need a different approach

Figure 1: Illustration of difference in lighting conditions.


Alternative to k-means

An alternative to Euclidean-based methods

Flatten each image I ∈ R^{192×168} into a vector v ∈ R^{32256} (sketched in code below) according to the transformation

v_k = I_{ij} ,   k = 168(i − 1) + j.   (1)
Governing idea: after this flattening, each individual's faces might lie in a distinct linear subspace of the full space R^{32256}.
If so, we can exploit clustering methods that seek to partition the data into distinct linear subspaces.
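A minimal numpy sketch of this flattening (our own illustration; the random image is a stand-in for a real one):

    import numpy as np

    # Stand-in 192 x 168 grayscale image with values in {0, ..., 255}.
    I = np.random.randint(0, 256, size=(192, 168))

    # Row-major flattening: pixel (i, j) (1-indexed) maps to entry k = 168*(i-1) + j.
    v = I.reshape(-1)                             # v has length 192 * 168 = 32256
    assert v[168 * (3 - 1) + 5 - 1] == I[2, 4]    # check at (i, j) = (3, 5)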



Linear Subspace

Validating the partitioning hypothesis


To verify that approaches based on partitioning the full space R^{32256} into distinct linear subspaces, one per individual, are feasible, we cheat and use our knowledge of the correct labels to obtain the subspaces:
Run PCA on each individual's full set of images to obtain d principal components {q_i^j}_{i=1}^{d} for every person j
Construct the respective subspaces V_j = span{q_1^j, …, q_d^j}
Compute the principal angles between every pair of subspaces to verify distinctness
Principal angles between subspaces U and W:

θ1 = min{ arccos( |⟨u, w⟩| / (‖u‖‖w‖) ) : u ∈ U, w ∈ W },   (2)

θj = min{ arccos( |⟨u, w⟩| / (‖u‖‖w‖) ) : u ∈ U, w ∈ W, ⟨u, ui⟩ = 0, ⟨w, wi⟩ = 0 ∀ i < j },   j = 2, …, d,   (3)

where (ui, wi) attain the minimum defining θi.
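In practice these angles are the arccosines of the singular values of Q1ᵀQ2 for orthonormal bases Q1, Q2. A minimal sketch (our own illustration with random stand-in bases; scipy returns the angles in descending order):

    import numpy as np
    from scipy.linalg import subspace_angles

    rng = np.random.default_rng(0)
    d = 9
    # Stand-ins for two individuals' d-dimensional PCA bases (orthonormal columns).
    Q1, _ = np.linalg.qr(rng.standard_normal((32256, d)))
    Q2, _ = np.linalg.qr(rng.standard_normal((32256, d)))

    # Principal angles theta_1 <= ... <= theta_d between the column spaces.
    theta = np.sort(subspace_angles(Q1, Q2))
    print(np.degrees(theta))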

Linear Subspace: Validation


Histogram of the principal angles over all pairs of subspaces.


Linear Subspace: Validation

The pair of individuals with the smallest first principal angle θ1 (most similar subspaces).


Linear Subspace: Validation

The pair of individuals with the largest first principal angle θ1.


Linear Subspace: Validation

The pair of individuals with the smallest sum of principal angles.


Linear Subspace: Validation

The pair of individuals with the largest sum of principal angles.



K-subspace algorithm

K-subspace - Objective

Find the best k distinct subspaces that minimise the total "projection error" over all data points:

min_{U1,…,Uk} Σ_{i=1}^{N} min_{j∈{1,…,k}} ‖yi − Uj Ujᵀ yi‖₂²   (4)

where
k - the number of clusters (people),
N - the total number of data points (images),
yi - image i (flattened),
Uj = [u_{j1}, …, u_{jd}] - the matrix whose columns form a basis of subspace j,
d - the dimension of each subspace (chosen by us).

K-subspace algorithm

Algorithm
Feed in all the data points (yi)_{i=1,…,N}, and update each basis (Uj)_{j=1,…,k} after each iteration until convergence.

1. Randomly initialise k clusters;
2. Allocate each yi to the "nearest" subspace (smallest projection error);
3. Update the d basis vectors of each subspace by PCA;
4. Return to step 2 until convergence.
Initialisation is vital to the process: depending on the initialisation, accuracy varies between 55% and 65% even for the two most dissimilar individuals. A minimal sketch of the loop follows.
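A sketch of the alternating loop (our own illustration, not the group's exact code; it assumes every cluster keeps at least d points and does not handle empty clusters). Each Uj is fitted as the top-d right singular vectors of its cluster, matching objective (4):

    import numpy as np

    def k_subspace(Y, k, d, n_iter=100, seed=0):
        """Y: (N, n_features) data matrix, k: clusters, d: subspace dimension."""
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=Y.shape[0])      # step 1: random clusters
        for _ in range(n_iter):
            # Step 3: basis of each subspace = top-d right singular vectors
            # of its cluster (PCA without centring, matching objective (4)).
            bases = []
            for j in range(k):
                _, _, Vt = np.linalg.svd(Y[labels == j], full_matrices=False)
                bases.append(Vt[:d].T)                    # columns span subspace j
            # Step 2: reassign y_i to the subspace minimising ||y - U U^T y||_2.
            resid = np.stack([np.linalg.norm(Y - (Y @ U) @ U.T, axis=1)
                              for U in bases], axis=1)
            new_labels = resid.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                                     # step 4: converged
            labels = new_labels
        return labels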

Sparse Subspace Clustering: ℓ1 Minimization

Objective: find the sparse solution vectors ci:

min_{ci} ‖ci‖₁   s.t.   yi = Y ci ,  cii = 0,

where ci = [ci1, ci2, …, ciN]ᵀ and the constraint cii = 0 eliminates the trivial solution.
The problem can be reformulated as a LASSO:

min_{ci} ‖ci‖₁ + (λ/2) ‖yi − Y ci‖₂² ,   cii = 0,

where λ is the regularization parameter. For λ < 1/‖Yᵀyi‖∞ we obtain ci = 0, so λ must be chosen above this threshold.
Similarity matrix W = |C| + |C|ᵀ, where C = [c1, …, cN].
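A minimal sketch of this step (our own illustration using scikit-learn's Lasso, whose objective ‖y − Xw‖²/(2·n_samples) + α‖w‖₁ matches the one above for α = 1/(λ · n_samples)):

    import numpy as np
    from sklearn.linear_model import Lasso

    def ssc_coefficients(Y, lam=100.0):
        """Y: (n_features, N) matrix with one flattened image per column.
        Solves min ||c_i||_1 + (lam/2)||y_i - Y c_i||_2^2 with c_ii = 0
        for every column, then builds the similarity matrix W."""
        n_feat, N = Y.shape
        C = np.zeros((N, N))
        alpha = 1.0 / (lam * n_feat)          # rescale to sklearn's objective
        for i in range(N):
            idx = np.delete(np.arange(N), i)  # drop column i: enforces c_ii = 0
            model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
            model.fit(Y[:, idx], Y[:, i])
            C[idx, i] = model.coef_
        W = np.abs(C) + np.abs(C).T           # symmetric similarity matrix
        return C, W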
Spectral Clustering: Graphs

A graph consists of nodes and edges.
Adjacency matrix A: Aij = 1 if there is an edge connecting node i and node j, and 0 otherwise.
Weighted adjacency matrix W: Wij = ωij, weights which represent how tightly node i and node j are connected.
Can we represent the data as a graph, and define the relationships by weights?
Spectral Clustering

Consider the problem from a "graph cut" point of view:

image / data point → node in the graph
clustering → graph cut / community detection

Then we need to construct edges such that:

data in the same cluster have strong connections
data in different clusters have weak or no connections.


Spectral Clustering - Graph cut

Given our constructed similarity graph G(V, E) with weights W, our goal is to minimise the total weight between clusters:

min_{A1,…,Ak} cut(A1, …, Ak) := (1/2) Σ_{j=1}^{k} Σ_{u∈Aj, v∈Āj} ωuv ,   (5)

where the {Aj} are subsets of V such that Ai ∩ Aj = ∅ for i ≠ j, A1 ∪ … ∪ Ak = V, and Āj = V \ Aj.
To control the size of each set, we modify this to

RatioCut(A1, …, Ak) := (1/2) Σ_{j=1}^{k} (1/|Aj|) Σ_{u∈Aj, v∈Āj} ωuv .   (6)

We relax this to a standard trace minimisation and solve it using the eigenvectors of the corresponding graph Laplacian (details in the appendix); a sketch of the full pipeline follows.
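A minimal sketch of the pipeline (our own illustration: unnormalised Laplacian, eigenvectors for the k smallest eigenvalues, then k-means on the rows, as detailed in the appendix):

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(W, k):
        """W: symmetric (N, N) similarity matrix, k: number of clusters."""
        L = np.diag(W.sum(axis=1)) - W        # unnormalised graph Laplacian L = D - W
        eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
        H = eigvecs[:, :k]                    # relaxed RatioCut solution (Rayleigh-Ritz)
        # Recover a discrete partition: k-means on the rows of H.
        return KMeans(n_clusters=k, n_init=10).fit_predict(H)

Applied to the similarity matrix W = |C| + |C|ᵀ from the ℓ1 step, this returns the final cluster labels.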
Result: ℓ1 minimization

A Reminder - Most Distinct Individuals

Result: ℓ1 minimization
Sparse coefficient matrix: the most distinct pair of individuals

Visualize the sparse matrix C obtained from ℓ1 minimization:

[Two spy plots of the 128 × 128 matrix C, one per solver:]
cvx opt in MATLAB: nz = 620, accuracy 96%
Interior-point scheme: nz = 323, accuracy 98%

Result: ℓ1 minimization

The two errors from the interior-point scheme:

[Images of the two misclassified faces, both at index 21.]

A Reminder - Least Distinct Individuals

Result: ℓ1 minimization
Sparse coefficient matrix: the least distinct pair of individuals

Visualize the sparse matrix C obtained from ℓ1 minimization:

[Two spy plots of the 128 × 128 matrix C, one per solver:]
cvx opt in MATLAB: nz = 862, accuracy 51%
Interior-point scheme: nz = 350, accuracy 62%


Result: More than two individuals / All together


Now visualize the mean image for each of the 8 individuals.


Result: More than two individuals


Visualize the mean image for each of the 8 recovered clusters (83% accuracy).

Summary

Can successfully distinguish the most distinct individuals with near-perfect accuracy.
Can distinguish the most similar individuals only with the interior-point algorithm.
Can successfully recover individuals when clustering more than two.
Most misclassifications in the 8-individual test result from the darkest images being clustered with a single individual.

Future work

Comparison of different methods (e.g. different similarity matrices for spectral clustering)
Further noise filtration: outliers
Robustness of the methods when images have backgrounds
Partial matching: different angles, expressions, etc.
Extension to RGB and thermal images
Occlusions and accessories

References

E. Elhamifar and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory, and Applications. 2013.
A. Ng, M. Jordan and Y. Weiss. On Spectral Clustering: Analysis and an Algorithm. 2002.
U. von Luxburg. A Tutorial on Spectral Clustering. 2007.
S. Kim, K. Koh, M. Lustig, S. Boyd and D. Gorinevsky. An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares. 2007.



End of Presentation

Thank You.



Appendix

Spectral Clustering - similarity matrix

To model the local relationship between data points directly:

ε-neighbourhood graph – connect all pairs whose distance is smaller than ε.
k-nearest-neighbour graph – connect the k nearest neighbours of each node.

Or, we can construct our own similarity matrix S ∈ R^{N×N}:

fully connected graph – simply use a positive similarity as the weight of the corresponding edge, e.g. s(xi, xj) = exp(−‖xi − xj‖² / (2σ²)).

We prefer the fully connected graph, which leaves us more freedom to describe the data; how best to construct the similarity matrix is the key question (in the main talk we use W = |C| + |C|ᵀ from the ℓ1 minimisation). A sketch of the Gaussian similarity follows.
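A minimal sketch of the fully connected Gaussian similarity (our own illustration; σ is a free bandwidth parameter):

    import numpy as np

    def gaussian_similarity(X, sigma=1.0):
        """X: (N, n_features) data matrix. Returns S with
        S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) and zero diagonal."""
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        S = np.exp(-sq_dists / (2.0 * sigma ** 2))
        np.fill_diagonal(S, 0.0)   # no self-loops
        return S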


Spectral Clustering - Reformulation


We want to connect our objective to the graph Laplacian L = D − W, where D is diagonal with dii = Σj ωij. Then, with some more algebra:

Construct H = [h1, …, hk] ∈ R^{N×k}, where h_{ij} = 1/√|Aj| if node i ∈ Aj and 0 otherwise.
Then hjᵀ L hj = Σ_{u∈Aj, v∈Āj} ωuv / |Aj| = (Hᵀ L H)jj, hence RatioCut(A1, …, Ak) = (1/2) Tr(Hᵀ L H); the constant 1/2 does not affect the minimiser.
Also observe that Hᵀ H = I.

Hence we can formulate our problem as

min_{A1,…,Ak} Tr(Hᵀ L H)   s.t.   Hᵀ H = I, H defined as above.   (7)

Spectral Clustering - Relaxation

But the problem is now a discrete optimisation problem (NP-hard), so for practical reasons we relax it to

min_{H∈R^{N×k}} Tr(Hᵀ L H)   s.t.   Hᵀ H = I.   (8)

This is then a standard trace minimisation problem, and by the Rayleigh-Ritz theorem the solution is given by choosing H to be the matrix whose columns are the eigenvectors corresponding to the k smallest eigenvalues of L.


Spectral Clustering - Recovering the Partition

However, in order to obtain a partition of the graph, we need to transform the solution H of the relaxed problem back into a discrete indicator.

Recall how we constructed H = [h1, …, hk]: h_{ij} = 1/√|Aj| if i ∈ Aj and 0 otherwise.

Hence the simplest way is to partition the nodes/data by the values in each column. However, this turns out to be too crude, and we instead apply the k-means clustering algorithm to the rows of H to obtain the final partition.
