
Dimensionality reduction and data visualization

Gilles Gasso

INSA Rouen - ASI Department


Laboratory LITIS

September 12, 2019




Introduction

Supervised learning (predictive methods)
Develop predictive models using labeled training data
Make sure these models generalize to future data (test data)

Unsupervised learning (descriptive methods)
Data analysis
Study the data's distribution or geometry
Goal: acquire or extract knowledge / patterns from the data
→ dimensionality reduction, visualization, clustering



Data exploration

The data matrix gathers n samples described by d variables. Each row is a sample (point) x_i = (x_{i,1}, · · · , x_{i,d}); each column is a variable j (for instance hg, the hemoglobin level).

X = [ x_{1,1} · · · x_{1,d}
        ⋮     ⋱     ⋮
      x_{n,1} · · · x_{n,d} ] ∈ R^{n×d}

Example (athlete biometric data; the full table is reproduced on the Application slide):

rcc   wcc  hc    hg    ferr  bmi    ssf   pcBfat  lbm    ht     wt    sex
4.82  7.6  43.2  14.4  58    22.37  50    11.64   53.11  163.9  60.1  f
4.32  6.8  40.6  13.7  46    17.54  54.6  12.16   46.12  173    52.5  f
...
4.62  7.3  43.8  14.7  26    21.2   76.8  18.08   61.85  188.7  75.5  f

What are the relations between the variables? How close are the samples?

Dimension reduction: the goal


Let X ∈ R^{N×d} be the data matrix (N samples of dimension d).
Goal: find a projection of X onto Z ∈ R^{N×q} with q < d, i.e. map the N×d matrix X to an N×q matrix Z.

(Figure: 2-D projection of the MNIST handwritten digits, from d = 784 pixels down to q = 2; each point is an image plotted at its projected coordinates and labelled by its digit class.)

What for?

Visualization (q = 2 or 3)
check the data
identify outliers
visualize the data according to their categories (if labelled)

Data representation (q < d)
noise reduction
pre-processing (to reduce the computational cost)
uncover hidden structure in the data (example: manifolds)

Coding/decoding scheme
cod : R^d → R^q, x ↦ z = cod(x)
dec : R^q → R^d, z ↦ x̂ = dec(z)

How to assess the quality of the coding?
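
A natural quality measure, made explicit on the PCA slides below, is the reconstruction error between x and dec(cod(x)). A minimal sketch, with a deliberately naive coding pair (keep the first two coordinates) standing in for a real method:

import numpy as np

def coding_error(X, cod, dec):
    """Mean squared reconstruction error of a coding/decoding pair."""
    X_hat = np.array([dec(cod(x)) for x in X])
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Identity coding loses nothing; keeping only 2 of the 5 coordinates does.
print(coding_error(X, cod=lambda x: x, dec=lambda z: z))                      # 0.0
print(coding_error(X, cod=lambda x: x[:2],
                   dec=lambda z: np.concatenate([z, np.zeros(3)])))           # > 0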




Principle of dimension reduction methods


Project the samples {x_i ∈ R^d}_{i=1}^N onto {z_i ∈ R^q}_{i=1}^N (q < d) such that the data topology is preserved:
preserve the distances between samples
preserve the neighborhoods . . .

(Figure: points x_i ∈ R^3 mapped to z_i ∈ R^2 with distance preservation.)

Methods we will study


Linear: PCA; non-linear: SNE and its t-SNE variant

Principal Component Analysis (PCA)


Model: data = information + noise
X = Z P^T + B

Linear orthogonal projection:
cod : R^d → R^q, x ↦ z = P^T x
dec : R^q → R^d, z ↦ x̂ = P z

Property: the columns of P are orthonormal (P^T P = I_q)

Dimensions: X = [x_1^T ; … ; x_N^T] ∈ R^{N×d},  Z = [z_1^T ; … ; z_N^T] ∈ R^{N×q},  P ∈ R^{d×q}

Objective: minimize the reconstruction error between the x_i and x̂_i = dec(cod(x_i)):

min_{P ∈ R^{d×q}}  Σ_{i=1}^N ‖x_i − P P^T x_i‖²




Another view of PCA

PCA linearly projects {x_i ∈ R^d}_{i=1}^N onto a subspace of dimension q (q < d) such that the variance of the projections {z_i = P^T x_i ∈ R^q}_{i=1}^N remains maximal.

Variance maximization (case q = 1)

max_{p ∈ R^d}  ‖X p‖²   with  ‖p‖² = 1  and  Z = X p

Minimization of error / maximization of variance

J(P) = Σ_{i=1}^N ‖x_i − x̂_i‖² = Σ_{i=1}^N (x_i − P P^T x_i)^T (x_i − P P^T x_i)
     = Σ_{i=1}^N ( x_i^T x_i − 2 x_i^T P P^T x_i + x_i^T P P^T P P^T x_i )
     = Σ_{i=1}^N x_i^T x_i − Σ_{i=1}^N x_i^T P P^T x_i          (since P^T P = I_q)
     = Σ_{i=1}^N x_i^T x_i − Σ_{i=1}^N z_i^T z_i
     = trace( Σ_{i=1}^N x_i x_i^T − Σ_{i=1}^N z_i z_i^T )
     = trace( Σ_{i=1}^N x_i x_i^T − P^T ( Σ_{i=1}^N x_i x_i^T ) P )

J(P) = trace( X^T X ) − trace( P^T X^T X P )

⇒ minimizing J(P) ⇔ maximizing the variance of the projections w.r.t. P
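
A quick numerical check of this identity on centered data, with P any matrix with orthonormal columns (here drawn at random via a QR factorization):

import numpy as np

rng = np.random.default_rng(0)
N, d, q = 200, 6, 2
X = rng.normal(size=(N, d))
X -= X.mean(axis=0)                                  # center the data

P, _ = np.linalg.qr(rng.normal(size=(d, q)))         # random orthonormal columns, P.T @ P = I_q

J = np.sum((X - X @ P @ P.T) ** 2)                   # sum_i ||x_i - P P^T x_i||^2
identity = np.trace(X.T @ X) - np.trace(P.T @ X.T @ X @ P)

print(np.isclose(J, identity))                       # True: minimizing J(P) = maximizing projected variance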



Learning the first projection vector p_1 of P

Data: {x_i ∈ R^d}_{i=1}^N
Assume the x_i are normalized (mean removal and scaling by the standard deviation).
Projections onto p_1: {z_i = p_1^T x_i ∈ R}_{i=1}^N

Computing p_1 ∈ R^d
We seek a unit vector p_1 that maximizes the variance of the {z_i}_{i=1}^N:

max_{p_1 ∈ R^d}  (1/N) Σ_{i=1}^N (p_1^T x_i)²   s.t.  ‖p_1‖² = 1

→ This turns into a constrained optimization problem.



Computing p_1

max_{p_1 ∈ R^d}  p_1^T C p_1   s.t.  ‖p_1‖² = 1

with C = (1/N) Σ_{i=1}^N x_i x_i^T = (1/N) X^T X the correlation matrix.

Solution derivation
The associated Lagrangian: L(p_1, λ_1) = −p_1^T C p_1 + λ_1 (p_1^T p_1 − 1)

Optimality conditions:
∇_{p_1} L = −2 C p_1 + 2 λ_1 p_1 = 0   and   ∇_{λ_1} L = p_1^T p_1 − 1 = 0
⟹ C p_1 = λ_1 p_1   and   p_1^T C p_1 = λ_1

1. (λ_1, p_1) is an (eigenvalue, eigenvector) pair of C
2. p_1^T C p_1 = λ_1 is the quantity we seek to maximize

p_1 is therefore the eigenvector associated with the largest eigenvalue of C.
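
A short NumPy check, on synthetic normalized data, that the leading eigenvector of C attains the largest value of p^T C p among unit vectors:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))    # correlated features
X = (X - X.mean(axis=0)) / X.std(axis=0)                   # normalize

C = X.T @ X / len(X)                      # correlation matrix
eigvals, eigvecs = np.linalg.eigh(C)      # eigh: ascending eigenvalues for a symmetric C
p1, lam1 = eigvecs[:, -1], eigvals[-1]

print(np.isclose(p1 @ C @ p1, lam1))      # p_1^T C p_1 = lambda_1
for _ in range(5):                        # random unit directions never do better
    p = rng.normal(size=4)
    p /= np.linalg.norm(p)
    assert p @ C @ p <= lam1 + 1e-12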



Computing p_2 and beyond

We look for a unit vector p_2, orthogonal to p_1, that maximizes the variance of the projections {p_2^T x_i}_{i=1}^N onto p_2.

Solution
p_2 is the eigenvector associated with λ_2, the 2nd largest eigenvalue of C.

Lemma
The subspace of dimension k that maximizes the variance of the projections necessarily contains the subspace of dimension k − 1.

PCA algorithm

1. Normalize the data: {x_i ∈ R^d}_{i=1}^N → x_ij ← (x_ij − x̄_j)/σ_j, j = 1, …, d
2. Compute the correlation matrix C = (1/N) X^T X
3. Compute the eigenvalue decomposition {p_j ∈ R^d, λ_j ∈ R}_{j=1}^d of C
4. Order the eigenvalues λ_j in decreasing order
5. The projection matrix is P = (p_1, · · · , p_q) ∈ R^{d×q}, where {p_1, · · · , p_q} are the q eigenvectors associated with the q largest eigenvalues.
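
A minimal NumPy sketch of these five steps (the function name pca_fit and the toy data are illustrative, not from the slides):

import numpy as np

def pca_fit(X, q):
    """Steps 1-5 of the PCA algorithm: returns the mean, std and projection matrix P."""
    x_bar, sigma = X.mean(axis=0), X.std(axis=0)
    X_tilde = (X - x_bar) / sigma                    # 1. normalize
    C = X_tilde.T @ X_tilde / len(X_tilde)           # 2. correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # 3. eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]                # 4. decreasing eigenvalues
    P = eigvecs[:, order[:q]]                        # 5. q leading eigenvectors, P in R^{d x q}
    return x_bar, sigma, P

# Example on random data: d = 11 variables reduced to q = 2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
x_bar, sigma, P = pca_fit(X, q=2)
Z = ((X - x_bar) / sigma) @ P                        # projected samples, shape (200, 2)
print(P.shape, Z.shape)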




Application

rcc wcc hc hg ferr bmi ssf pcBfat lbm ht wt sex


4.82 7.6 43.2 14.4 58 22.37 50 11.64 53.11 163.9 60.1 f
4.32 6.8 40.6 13.7 46 17.54 54.6 12.16 46.12 173 52.5 f
5.16 7.2 44.3 14.5 88 18.29 61.9 12.92 48.76 175 56 f
4.66 6.4 40.9 13.9 109 18.37 38.2 8.45 41.93 157.9 45.8 f
4.19 9 39 13.4 69 18.93 43.5 10.16 42.95 158.9 47.8 f
4.53 5 40.7 14 41 17.79 56.8 12.55 38.3 156.9 43.8 f
4.42 6.4 42.8 14.5 63 20.31 58.9 13.46 39.03 149 45.1 f
4.32 4.3 41.6 14 177 26.73 35.2 6.46 91 190.4 96.9 m
4.73 6.7 42.8 14.9 8 19.81 41.8 7.19 70 195.2 75.5 m
4.71 7.2 43.6 14 32 20.39 30.5 5.63 67 186.6 71 m
4.93 7.3 46.2 15.1 41 21.12 34 6.59 67 184.4 71.8 m
5.21 7.5 47.5 16.5 20 21.89 46.7 9.5 70 187.3 76.8 m
5.09 8.9 46.3 15.4 44 29.97 71.1 13.97 88 185.1 102.7 m
5.11 9.6 48.2 16.7 103 27.39 65.9 11.66 83 185.5 94.2 m
4.94 6.3 45.7 15.5 50 23.11 34.3 6.43 74 184.9 79 m
4.86 3.9 44.9 15.4 73 22.83 34.5 6.56 70 181 74.8 m
4.51 4.4 41.6 12.7 44 19.44 65.1 15.07 53.42 179.9 62.9 f
4.62 7.3 43.8 14.7 26 21.2 76.8 18.08 61.85 188.7 75.5 f



Pair plots of the variables

Bivariate representation
→ some variables are linearly dependent
→ use the correlation

Correlation matrix:

        rcc    wcc    hc     hg     ferr   bmi    ssf    pcBfat  lbm    ht     wt
rcc     1      0.15   0.92   0.89   0.25   0.3   -0.4   -0.49    0.55   0.36   0.4
wcc     0.15   1      0.15   0.13   0.13   0.18   0.14   0.11    0.1    0.08   0.16
hc      0.92   0.15   1      0.95   0.26   0.32  -0.45  -0.53    0.58   0.37   0.42
hg      0.89   0.13   0.95   1      0.31   0.38  -0.44  -0.53    0.61   0.35   0.46
ferr    0.25   0.13   0.26   0.31   1      0.3   -0.11  -0.18    0.32   0.12   0.27
bmi     0.3    0.18   0.32   0.38   0.3    1      0.32   0.19    0.71   0.34   0.85
ssf    -0.4    0.14  -0.45  -0.44  -0.11   0.32   1      0.96   -0.21  -0.07   0.15
pcBfat -0.49   0.11  -0.53  -0.53  -0.18   0.19   0.96   1      -0.36  -0.19   0
lbm     0.55   0.1    0.58   0.61   0.32   0.71  -0.21  -0.36    1      0.8    0.93
ht      0.36   0.08   0.37   0.35   0.12   0.34  -0.07  -0.19    0.8    1      0.78
wt      0.4    0.16   0.42   0.46   0.27   0.85   0.15   0       0.93   0.78   1
Perform data projection

Knowing P ∈ R^{d×q}, the mean x̄ and the standard deviation σ:

x (dimension d)  →  normalize with x̄ and σ  →  x̃  →  project with z = P^T x̃  →  z = (p_1^T x̃, · · · , p_q^T x̃) (dimension q < d)

→ we achieve dimensionality reduction
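
A small stand-alone sketch of this projection step; the numeric values of x̄, σ and P below are made up for illustration (in practice they come from the PCA fit on the training data):

import numpy as np

def pca_transform(x_new, x_bar, sigma, P):
    """Normalize a new sample with the stored mean/std, then project: z = P^T x_tilde."""
    x_tilde = (x_new - x_bar) / sigma
    return P.T @ x_tilde                  # vector of length q: (p_1^T x_tilde, ..., p_q^T x_tilde)

# Illustrative values (d = 3, q = 2)
x_bar = np.array([1.0, 2.0, 0.5])
sigma = np.array([0.5, 1.0, 2.0])
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # hypothetical orthonormal projection matrix
print(pca_transform(np.array([1.5, 1.0, 0.5]), x_bar, sigma, P))   # -> [ 1. -1.]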



Data visualization

PCA biplot

(Figure: biplot of the athlete data on the first two principal axes; arrows show the variables (ssf, pcBfat, bmi, wt, wcc, ferr, lbm, ht, rcc, hg, hc) and points show the samples, coloured by sex (f/m).)

Choice of q, the projection space size


(Figure: percentage of variance explained by each axis — 45.4%, 23.3%, 10.5%, 8.1%, 7.2%, 3.9%, 1%, 0.4%, 0.2%, 0% for axes 1 to 10.)

Cross-validation
Detection of an "elbow" on the graph of eigenvalues
Keep a fixed percentage (for instance 95%) of the variance
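
A sketch of the "fixed percentage of the variance" rule, using the per-axis percentages read off the bar chart above; with the eigenvalues λ_j at hand one would instead use explained = 100 * λ / λ.sum():

import numpy as np

# Percentage of variance carried by each axis (values shown on this slide)
explained = np.array([45.4, 23.3, 10.5, 8.1, 7.2, 3.9, 1.0, 0.4, 0.2, 0.0])

cumulative = np.cumsum(explained)
q = int(np.searchsorted(cumulative, 95.0) + 1)   # smallest q reaching 95% of the variance
print(cumulative)    # 45.4, 68.7, 79.2, 87.3, 94.5, 98.4, ...
print(q)             # 6 axes are needed to keep at least 95% of the variance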

Visualizing the MNIST dataset

d = 784 → q = 2

(Figure: two-dimensional projection Z of the MNIST digit images; each point is an image labelled by its digit class. The digit classes overlap considerably.)



Drawbacks of PCA

It is a linear projection method
Implicit assumption: the data are drawn from a Gaussian distribution
PCA only relies on second-order statistics of the data (variance)

Beyond PCA
Non-linear PCA
ISOMAP, LLE, MVU, SNE, t-SNE . . .
Neural networks
auto-encoders
embeddings for custom data: word2vec, doc2vec (text), signal2vec (time series)




SNE (Stochastic Neighbor Embedding) and t-SNE


The idea
Transform pairwise distances dist(x_i, x_j) into probabilities P_X(x_i|x_j) that x_i and x_j are close:
low distance → high probability of being close
Assume the same for the projections: dist(z_i, z_j) → P_Z(z_i|z_j)
Find the projections {z_i} that minimize the distance between the distributions P_X and P_Z




SNE: the maths

For the {x_i}_{i=1}^n, define the probability that x_j and x_i are close:

P_X(x_i|x_j) = exp(−d_ij²) / Σ_{k=1, k≠i}^N exp(−d_ik²),   with  d_ij² = dist(x_i, x_j)² / (2 σ_i²)

For the {z_i}_{i=1}^n (the unknowns), the probability that z_j is a neighbor of z_i:

P_Z(z_i|z_j) = exp(−δ_ij²) / Σ_{k=1, k≠i}^N exp(−δ_ik²),   with  δ_ij = ‖z_i − z_j‖

The parameter σ_i sets the effective number of neighbors K of sample x_i. It is selected such that

log(K) = − Σ_{j=1}^N P_X(x_i|x_j) log P_X(x_i|x_j)

Computing the {z_i}_{i=1}^n
Minimize the Kullback-Leibler divergence between P_X and P_Z:

min_{z_1, …, z_N}  Σ_{i,j=1}^N P_X(x_i|x_j) log [ P_X(x_i|x_j) / P_Z(z_i|z_j) ]

Solved via a numerical method (e.g. gradient descent).
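
A direct, unoptimized transcription of these formulas for a single point x_i, with σ_i fixed by hand instead of tuned so that the entropy matches log(K) (function names are illustrative):

import numpy as np

def p_x_given_i(X, i, sigma_i):
    """P_X(. | x_i): softmax of -dist(x_i, x_j)^2 / (2 sigma_i^2) over j != i."""
    d2 = np.sum((X - X[i]) ** 2, axis=1) / (2 * sigma_i ** 2)
    w = np.exp(-d2)
    w[i] = 0.0                            # a point is not its own neighbor
    return w / w.sum()

def kl_divergence(P, Q, eps=1e-12):
    """Kullback-Leibler divergence between two neighborhood distributions."""
    return np.sum(P * np.log((P + eps) / (Q + eps)))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))              # high-dimensional points
Z = rng.normal(size=(20, 2))              # a candidate embedding (normally optimized)
P = p_x_given_i(X, i=0, sigma_i=1.0)
Q = p_x_given_i(Z, i=0, sigma_i=np.sqrt(0.5))   # sigma = 1/sqrt(2) gives exp(-||z_i - z_j||^2)
print(kl_divergence(P, Q))                # one term of the objective SNE minimizes over the z_i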




Illustration of SNE

(Figure: the original data X and its SNE embedding Z.)




t-SNE variant
SNE: probability that z_j is a neighbor of z_i:
P_Z(z_i|z_j) = exp(−δ_ij²) / Σ_{k=1, k≠i}^N exp(−δ_ik²)

t-SNE: probability that z_j is a neighbor of z_i:
P_Z(z_i|z_j) = (1 + δ_ij²)^{−1} / Σ_{k=1, k≠i}^N (1 + δ_ik²)^{−1}

with δ_ij = ‖z_i − z_j‖

(Figure: the Gaussian kernel exp(−x²) versus the Student kernel 1/(1 + x²) as a function of the distance; the Student kernel has a heavier tail.)

Matching P_X = P_Z at large probabilities ⇒ δ_Z < d_X (attraction)
Matching P_X = P_Z at low probabilities ⇒ δ_Z > d_X (repulsion)
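
A toy comparison of the two neighbor probabilities, following the conditional form written above; the point placed far from z_0 keeps a much larger probability under the Student kernel:

import numpy as np

def p_z_sne(Z, i):
    """SNE: exp(-delta_ij^2) normalized over j != i."""
    w = np.exp(-np.sum((Z - Z[i]) ** 2, axis=1))
    w[i] = 0.0
    return w / w.sum()

def p_z_tsne(Z, i):
    """t-SNE: (1 + delta_ij^2)^-1 normalized over j != i (heavier tail)."""
    w = 1.0 / (1.0 + np.sum((Z - Z[i]) ** 2, axis=1))
    w[i] = 0.0
    return w / w.sum()

Z = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])   # one close and one far neighbor of z_0
print(p_z_sne(Z, 0))     # far point gets a nearly-zero probability
print(p_z_tsne(Z, 0))    # heavy tail keeps a non-negligible probability for the far point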




Illustration of t-SNE
(Figure: t-SNE embedding of the MNIST digits, d = 784 → q = 2; the ten digit classes form clearly separated clusters.)
https://lvdmaaten.github.io/tsne/




Conclusions

PCA is a linear projection method for dimensionality reduction

There are several non-linear projection approaches (t-SNE, UMAP, Isomap . . . )
They involve advanced optimization issues

All these methods are useful for data visualization
Which one to pick in practice? Not so easy to say

Some toolboxes
Matlab: https://lvdmaaten.github.io/drtoolbox/
Python: http://scikit-learn.org/stable/modules/manifold.html#manifold
Graphical tool: http://divvy.ucsd.edu/
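
As an illustration of the scikit-learn route, a minimal sketch comparing PCA and t-SNE on the small 8x8 digits set shipped with scikit-learn (a lightweight stand-in for MNIST):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 1797 images of size 8x8 -> d = 64

Z_pca = PCA(n_components=2).fit_transform(X)
Z_tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, Z, title in zip(axes, [Z_pca, Z_tsne], ["PCA", "t-SNE"]):
    ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(title)
plt.show()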

