A.A. 2022/2023
Tatiana Tommasi
Exercise 1) Probability
P(good movie | Tom Cruise) = 0.01
P(good movie | not Tom Cruise) = 0.1
P(Tom Cruise in a movie) = 0.01
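The slide lists only the givens; presumably the question is to compute P(Tom Cruise | good movie) via Bayes' rule. A minimal check in Python under that assumption:

```python
# A minimal sketch, assuming the question is:
# "Given that a movie is good, what is the probability Tom Cruise is in it?"
p_good_given_tc = 0.01      # P(good movie | Tom Cruise)
p_good_given_not_tc = 0.1   # P(good movie | not Tom Cruise)
p_tc = 0.01                 # P(Tom Cruise in a movie)

# Law of total probability: P(good movie)
p_good = p_good_given_tc * p_tc + p_good_given_not_tc * (1 - p_tc)

# Bayes' rule: P(Tom Cruise | good movie)
p_tc_given_good = p_good_given_tc * p_tc / p_good
print(p_good, p_tc_given_good)  # 0.0991, ~0.00101
```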
Exercise 2) Naive Bayes
variables (a,b,c), label K
P(K=1 | a=1, b=1, c=0) = {P(K=1) * P(a=1 | K=1) * P(b=1 | K=1) * P(c=0 | K=1)} / P(a=1, b=1, c=0)
= {½ * (½ * ¼ * ½)} / ⅛ = (1/32) / (1/8) = ¼
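The same arithmetic as a minimal Python sketch, using exact fractions. The evidence P(a=1, b=1, c=0) = 1/8 is taken directly from the slide, since the K=0 conditionals are not listed here:

```python
from fractions import Fraction as F

# Naive Bayes posterior P(K=1 | a=1, b=1, c=0) with the slide's numbers.
prior_k1 = F(1, 2)                        # P(K=1)
likelihood = F(1, 2) * F(1, 4) * F(1, 2)  # P(a=1|K=1) * P(b=1|K=1) * P(c=0|K=1)
evidence = F(1, 8)                        # P(a=1, b=1, c=0), from the slide

posterior_k1 = prior_k1 * likelihood / evidence
print(posterior_k1)  # 1/4
```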
Exercise 3) Maximum Likelihood Estimator
Exercise 4) Bias, Variance and MSE
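Presumably these exercises use the standard bias-variance decomposition of an estimator θ̂ of a parameter θ; as a reminder:

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
```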
Principal Component Analysis
Slide Credit: Barnabás Póczos & Alex Smola
Principal Component Analysis
● Until now we have often considered cases where our samples are described by a vector of numbers x ∈ R^d
Example: suppose you collect 53 blood and urine measurements (features) from 65 people.
Data Visualization
Principal Component Analysis
Orthogonal projection of the data onto a lower-dimensional linear space that...
● maximizes the variance of the projected data (purple line)
● minimizes the mean squared distance between the data points and their projections (sum of the blue lines)
2d Gaussian Dataset
1st PCA axis
2nd PCA axis
PCA algorithm I (sequential)
Project each (centered) sample x onto a candidate direction w: the projection xᵀw = wᵀx is a scalar.
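In symbols, a sketch of the sequential formulation (standard PCA, assuming the samples x_i are centered and the directions are unit-norm):

```latex
% first principal direction: maximize the variance of the projections
w_1 = \arg\max_{\lVert w \rVert = 1} \frac{1}{N} \sum_{i=1}^{N} \left( w^\top x_i \right)^2

% k-th direction: repeat on the residuals, after removing what
% w_1, \dots, w_{k-1} already explain
w_k = \arg\max_{\lVert w \rVert = 1} \frac{1}{N} \sum_{i=1}^{N}
      \left[ w^\top \left( x_i - \sum_{j=1}^{k-1} w_j w_j^\top x_i \right) \right]^2
```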
PCA algorithm II (sample covariance matrix)
Arrange the centered samples as the columns of a d×N matrix X (xᵢ = column vector, i = 1…N). The principal components are the top eigenvectors of the sample covariance matrix Σ = (1/N) XXᵀ (the 1/N factor changes only the eigenvalues, not the eigenvectors, so it is sometimes dropped).
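A minimal runnable sketch of this route (hypothetical d = 5, N = 100 data; the variable names are mine):

```python
import numpy as np

# A minimal sketch of PCA via the sample covariance matrix.
# Hypothetical data: N = 100 samples of dimension d = 5, one per column.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
X = X - X.mean(axis=1, keepdims=True)   # center the data

cov = X @ X.T / X.shape[1]              # d x d sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrices, ascending order

K = 2
components = eigvecs[:, ::-1][:, :K]    # top-K principal directions (d x K)
projected = components.T @ X            # data in the new coordinates (K x N)
```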
How to solve the characteristic equation?
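For a small covariance matrix the eigenvalues can be found by hand via the characteristic equation; this is exactly the route the 2×2 exercise at the end of the deck takes:

```latex
\Sigma v = \lambda v
\;\Longrightarrow\; (\Sigma - \lambda I)\, v = 0
\;\Longrightarrow\; \det(\Sigma - \lambda I) = 0

% for a 2x2 covariance matrix, this is a quadratic in \lambda:
(\sigma_{11} - \lambda)(\sigma_{22} - \lambda) - \sigma_{12}^2 = 0
```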
PCA algorithm III (SVD of data matrix)
Factor the centered d×N data matrix as X = U S Vᵀ, where:
● U is d×d; its columns are the eigenvectors of XXᵀ
● S is d×N and diagonal; its entries are the singular values
● V is N×N; its columns are the eigenvectors of XᵀX
Keeping only the K significant components gives the truncated factorization X ≈ U_K S_K V_Kᵀ, with U_K of size d×K, S_K of size K×K, and V_Kᵀ of size K×N.
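A minimal sketch of the SVD route in NumPy (same hypothetical data as the covariance sketch above), with a consistency check that the squared singular values divided by N match the covariance eigenvalues:

```python
import numpy as np

# A minimal sketch of PCA via SVD of the centered data matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))           # hypothetical d=5, N=100 data
X = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

K = 2
U_K = U[:, :K]                          # d x K principal directions
projected = U_K.T @ X                   # K x N scores

# Consistency with the covariance route: the covariance eigenvalues
# equal the squared singular values divided by N.
cov_eigvals = np.linalg.eigvalsh(X @ X.T / X.shape[1])[::-1]
assert np.allclose(s**2 / X.shape[1], cov_eigvals)
```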
Implementation Issue and Solution
Simple trick (useful when d ≫ N, so that the N×N matrix XᵀX is much smaller than the d×d matrix XXᵀ):
● X is a d×N matrix of centered training data
● solve for the eigenvectors v of A = XᵀX instead of Σ = XXᵀ
● if v is an eigenvector of A, then Xv is an eigenvector of Σ:
Av = λv
XᵀXv = λv
XXᵀXv = λXv
Σ(Xv) = λ(Xv)
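A quick numerical check of the trick, with hypothetical d = 1000 ≫ N = 20:

```python
import numpy as np

# Verify: if v is an eigenvector of A = X^T X, then Xv is an
# eigenvector of Sigma = X X^T with the same eigenvalue.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))         # hypothetical d = 1000 >> N = 20
X = X - X.mean(axis=1, keepdims=True)

lam, V = np.linalg.eigh(X.T @ X)        # small N x N eigenproblem
v = V[:, -1]                            # eigenvector of the largest eigenvalue

u = X @ v                               # candidate eigenvector of Sigma
assert np.allclose((X @ X.T) @ u, lam[-1] * u)
```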
How to choose the top K?
● One goal can be to pick K such that a certain percentage of the variance of the data is preserved, e.g. 90%
● The total variance of the data equals the sum of the squared elements of S (the singular values)
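A minimal sketch of that criterion (`s` is the vector of singular values from the SVD sketch above; the function name is mine):

```python
import numpy as np

# Smallest K that preserves at least `target` of the total variance,
# where the total variance is the sum of the squared singular values.
def choose_k(s: np.ndarray, target: float = 0.90) -> int:
    ratio = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(ratio, target) + 1)
```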
Application: Face Recognition
Slide Credit: Alexander Ihler, Adriana Kovashka
“Eigen Faces”
Stack the N face images as the columns of a d×N matrix X and keep the top K factors of its SVD: X ≈ U S Vᵀ, with U of size d×K (its columns are the “eigenfaces”), S of size K×K, and Vᵀ of size K×N.
Slide Credit: Derek Hoiem, Adriana Kovashka
Representation and Reconstruction
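A minimal sketch of representation and reconstruction with the top K eigenfaces (the function names are mine; U_K is the d×K eigenface matrix from the SVD sketch above):

```python
import numpy as np

# Represent a face by its K coefficients in the eigenface basis,
# then map the coefficients back to pixel space.
# x: d-vector (one face), mean: d-vector (mean face), U_K: d x K eigenfaces.
def represent(x: np.ndarray, mean: np.ndarray, U_K: np.ndarray) -> np.ndarray:
    return U_K.T @ (x - mean)    # K coefficients

def reconstruct(coeffs: np.ndarray, mean: np.ndarray, U_K: np.ndarray) -> np.ndarray:
    return mean + U_K @ coeffs   # approximate d-dimensional face
```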
Visualize Subspaces
[Figure: eigenfaces spanning a “Happiness” subspace and a “Disgust” subspace]
Shortcomings
Slide Credit: Aarti Singh & Barnabás Póczos
Image Compression
PCA Compression
[Figure panels: from 144D to 60D, from 144D to 16D, from 144D to 6D, from 144D to 1D]
Denoising
Using 15 PCA components
Problematic Data Set for PCA
PCA conclusions
Let’s Exercise!
Your turn...
[Figure label: normalized]
Questions:
- are X1 and X2 orthogonal?
- why?
Your turn again...
Find the principal components of these data: n=10 samples of d=2 dimensions.
Your turn again...
With A the 2×2 sample covariance matrix of the data (its entries appear in the original figure), solve the characteristic equation:
Av = λv
det(A − λI) = 0
(var_x1 − λ) * (var_x2 − λ) − (cov_x1x2)² = 0
λ² − 16.376*λ + 0.122 = 0
(the coefficient of λ is the sum of the two terms on the diagonal of A, var_x1 + var_x2 ≈ 16.376; the constant term 0.122 is its determinant)
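A quick check of the quadratic with NumPy:

```python
import numpy as np

# Roots of the characteristic polynomial from the slide:
# lambda^2 - 16.376*lambda + 0.122 = 0
eigvals = np.sort(np.roots([1.0, -16.376, 0.122]))[::-1]
print(eigvals)  # approximately [16.369, 0.0075]
```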
Your turn again...
For each eigenvalue λ, solve for the corresponding eigenvector (a, b):
(A − λI) * [a, b]ᵀ = [0, 0]ᵀ and a² + b² = 1
Your turn again...
The normalized eigenvectors, written as the columns of a 2×2 matrix:
eigvec1 = [0.6262, 0.7797]ᵀ, eigvec2 = [0.7797, −0.6262]ᵀ
Multiplying the (centered) data by this matrix expresses each sample in the new principal-component coordinates.
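A quick check that these two directions are orthonormal, which also answers the earlier question about orthogonality:

```python
import numpy as np

# Columns are eigvec1 and eigvec2 from the slide.
W = np.array([[0.6262, 0.7797],
              [0.7797, -0.6262]])
print(W.T @ W)  # approximately the 2x2 identity matrix
```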
Bird Example
Keeping in mind that we are studying the sizes (length, wingspan, weight) of North American birds: how do we interpret all this?
● there are apparently only two factors that are important (corresponding to u1 and u2)
● we might think of u1 as giving a generalized notion of “size” that incorporates length, wingspan, and weight
● indeed, all three entries of u1 have the same sign, indicating that birds with larger “size” tend to have larger length, wingspan, and weight
Bird Example
Which of the factors (length, wingspan, weight) is most significant in determining a bird’s “size”?
In other words, does the first principal component u1 point most in the direction of the length axis, the wingspan axis, or the weight axis?
● The third entry of u1, weight, is the largest, so weight is the most significant.
This means a change of one unit in weight tends to affect the size more than a change of one unit in length or wingspan.
Bird Example
● The second principal component u2 is mostly influenced by wingspan and weight, as these entries of u2 have the greatest absolute values.
● However, they also have opposite signs. This indicates that u2 describes a feature of birds corresponding to a relatively small wingspan and large weight, or vice versa. We might call this quality “stoutness.”