
DAAI

Data Analysis and Artificial Intelligence

A.A. 2022/2023
Tatiana Tommasi
1
Exercise 1) Probability
P(good movie | Tom Cruise) = 0.01
P(good movie | not Tom Cruise) = 0.1
P(Tom Cruise in a movie) = 0.01

What is P(Tom Cruise | not good movie) ?

2022/23
2
Exercise 1) Probability
P(good movie | Tom Cruise) = 0.01
P(good movie | not Tom Cruise) = 0.1
P(Tom Cruise in a movie) = 0.01

What is P(Tom Cruise | not good movie) ?
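Applying the law of total probability and Bayes' rule to the three quantities above gives roughly 0.011. A minimal Python sketch of the computation (variable names are just illustrative):

```python
# Given quantities from the exercise
p_tc = 0.01            # P(Tom Cruise in a movie)
p_good_tc = 0.01       # P(good movie | Tom Cruise)
p_good_not_tc = 0.1    # P(good movie | not Tom Cruise)

# Law of total probability for P(not good movie)
p_not_good = (1 - p_good_tc) * p_tc + (1 - p_good_not_tc) * (1 - p_tc)

# Bayes' rule: P(Tom Cruise | not good movie)
p_tc_not_good = (1 - p_good_tc) * p_tc / p_not_good
print(p_tc_not_good)   # ~0.011
```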

2022/23
3
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

Hint: be careful, I'm not asking you to classify the sample (a=1, b=1, c=0).

2022/23
4
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

Hint: be careful, I'm not asking you to classify the sample (a=1, b=1, c=0).

2022/23
5
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

P(a=1 | K=1) = 1/2
P(b=1 | K=1) = 1/4
P(c=0 | K=1) = 1/2
P(K=1) = 1/2
P(a=1, b=1, c=0) = ⅛

P(K=1 | a=1, b=1, c=0) = {P(K=1) * P(a=1 | K=1) * P(b=1 | K=1) * P(c=0 | K=1)} / P(a=1, b=1, c=0)
= {½ * (½ * ¼ * ½)} / ⅛ = ¼
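The same computation as a minimal Python sketch (assuming the class-conditional probabilities have already been read off the training table, as listed above):

```python
# Quantities estimated from the data (as listed above)
p_k1 = 1 / 2        # P(K=1)
p_a1_k1 = 1 / 2     # P(a=1 | K=1)
p_b1_k1 = 1 / 4     # P(b=1 | K=1)
p_c0_k1 = 1 / 2     # P(c=0 | K=1)
p_evidence = 1 / 8  # P(a=1, b=1, c=0)

# Naive Bayes: the likelihood factorizes over the features given the class
posterior = p_k1 * p_a1_k1 * p_b1_k1 * p_c0_k1 / p_evidence
print(posterior)    # 0.25
```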
2022/23
6
Exercise 3) Maximum Likelihood Estimator

2022/23
7
Exercise 3) Maximum Likelihood Estimator

2022/23
8
Exercise 3) Maximum Likelihood Estimator

2022/23
9
Exercise 4) Bias, Variance and MSE

2022/23
10
Exercise 4) Bias, Variance and MSE

2022/23
11
Exercise 4) Bias, Variance and MSE

2022/23
12
Exercise 4) Bias, Variance and MSE

2022/23
13
Exercise 4) Bias, Variance and MSE

2022/23
14
Exercise 4) Bias, Variance and MSE

2022/23
15
Principal Component Analysis

A first small step towards unsupervised learning…

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 16
Principal Component Analysis
● Till now we have often considered cases where our samples are described by a vector of numbers x ∈ Rd

● How can we visualize them?

Example: suppose you collect 53 blood and urine measurements (features) from 65 people.

It is very difficult to see the correlation between features.

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 17
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 18
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 19
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 20
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 21
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
● maximizes the variance of the projected data (purple line)
● minimizes the mean squared distance between the data points and their projections (sum of blue lines)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 22
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
● maximizes the variance of the projected data (purple line)
● minimizes the mean squared distance between the data points and their projections (sum of blue lines)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 23
Principal Component Analysis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 24
Principal Component Analysis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 25
2d Gaussian Dataset

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 26
1st PCA axis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 27
2nd PCA axis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 28
PCA algorithm I (sequential)

For a sample x and a direction w, the projection of x onto w is xTw = wTx

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 29
PCA algorithm I (sequential)

this is a scalar

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 30
PCA algorithm I (sequential)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 31
PCA algorithm II (sample covariance matrix)

Hint: sample covariance matrix for data dimension p

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 32
PCA algorithm II (sample covariance matrix)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 33
PCA algorithm II (sample covariance matrix)

X: dxN data matrix, xi = column vector, i = 1…N
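A minimal numpy sketch of this approach, assuming X is the dxN matrix described above (one sample per column); the normalization by N−1 is one common convention and may differ from the slide's formula:

```python
import numpy as np

def pca_covariance(X, k):
    """PCA via the sample covariance matrix.
    X: (d, N) array, one sample per column; k: number of components to keep."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center each feature
    sigma = (Xc @ Xc.T) / (X.shape[1] - 1)       # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    W = eigvecs[:, order[:k]]                    # top-k principal directions (columns)
    Z = W.T @ Xc                                 # k x N projected coordinates
    return W, Z, eigvals[order]
```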

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 34
How to solve the characteristic equation?

2022/23
35
PCA algorithm III (SVD of data matrix)

X (dxN) = U (dxd) · S (dxN) · VT (NxN)
Keeping only the significant (top K) singular values: U → dxK, S → KxK, VT → KxN

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 36
PCA algorithm III (SVD of data matrix)

X = U S VT, where X is dxN:
● the columns of U are the eigenvectors of XXT
● the diagonal of S holds the singular values
● the columns of V are the eigenvectors of XTX
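A matching numpy sketch of PCA via the SVD (illustrative names; X is the dxN data matrix):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via the SVD of the centered data matrix X (d, N)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # columns of U: eigenvectors of Xc @ Xc.T
    # s: singular values (the eigenvalues of Xc @ Xc.T are s**2)
    # rows of Vt: eigenvectors of Xc.T @ Xc
    W = U[:, :k]        # top-k principal directions
    Z = W.T @ Xc        # k x N projected coordinates
    return W, Z, s[:k]
```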

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 37
Implementation Issue and Solution

● Covariance matrix is huge (d x d for vectors of dimension d)
● But typically the number of examples is N << d

Simple trick:
● X is a dxN matrix of centered training data
● Solve for the eigenvectors v of A = XTX instead of Σ= XXT
● if v is an eigenvector of A, then Xv is an eigenvector of Σ

Av = λv
XTXv = λv
XXTXv = λXv
Σ(Xv)= λ(Xv)
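A small numpy sketch of this trick (the function name is mine), with X the dxN centered data matrix:

```python
import numpy as np

def pca_gram_trick(X, k):
    """When N << d, eigendecompose the N x N matrix A = X.T @ X
    instead of the d x d covariance Sigma = X @ X.T."""
    A = X.T @ X                          # N x N, much cheaper when N << d
    eigvals, V = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1][:k]
    U = X @ V[:, order]                  # X v is an eigenvector of Sigma (see derivation above)
    U /= np.linalg.norm(U, axis=0)       # rescale each eigenvector to unit length
    return U, eigvals[order]
```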

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 38
How to choose the top K?

● One goal can be to pick K such that a certain percentage of the variance of the data is
preserved, e.g. 90%

● The total variance of the data equals the sum of squares of S’s elements (singular
values)

● Take as many of these entries as needed


K = number of elements s.t. (cumsum S.^2 / total variance) > P%
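A small numpy sketch of this rule (assuming s holds the singular values of the centered data matrix):

```python
import numpy as np

def choose_k(s, p=0.90):
    """Smallest K whose leading components preserve at least a fraction p of the variance."""
    var = s ** 2                         # variance carried by each component
    cum = np.cumsum(var) / var.sum()     # cumulative fraction of the total variance
    return int(np.searchsorted(cum, p) + 1)
```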

Behaviour of the variance: a few eigenvectors cover a very large part of the variance, while many eigenvectors cover a very small part.
(Figure 12.4 (a) from Bishop)

2022/23
39
Application: Face Recognition

Image from cnet.com


2022/23
40
The space of all face images

2022/23
Slide Credit: Derek Hoiem and Adriana Kovashka
41
“Eigen Faces”

X: dxN data matrix (one vectorized face image per column)

2022/23
Slide Credit: Alexander Ihler, Adriana Kovashka 42
“Eigen Faces”

X (dxN) ≈ U (dxK) · S (KxK) · V (KxN)
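A minimal, illustrative numpy sketch of the eigenfaces pipeline (the function names are my own; X is assumed to hold one vectorized face image per column):

```python
import numpy as np

def eigenfaces(X, k):
    """X: (d, N) matrix, one vectorized face image per column."""
    mean_face = X.mean(axis=1)                      # the "average" face, shape (d,)
    U, s, Vt = np.linalg.svd(X - mean_face[:, None], full_matrices=False)
    return U[:, :k], mean_face                      # top-k eigenfaces (columns of U)

def represent(face, eigfaces, mean_face):
    """Project a single face (shape (d,)) onto the k eigenfaces."""
    return eigfaces.T @ (face - mean_face)          # k coefficients

def reconstruct(coeffs, eigfaces, mean_face):
    """Approximate the face from its k coefficients."""
    return mean_face + eigfaces @ coeffs
```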

2022/23
Slide Credit: Alexander Ihler, Adriana Kovashka 43
“Eigen Faces”

2022/23
Slide Credit: Derek Hoiem, Adriana Kovashka 44
Representation and Reconstruction

2022/23
Slide Credit: Derek Hoiem, Adriana Kovashka 45
Visualize Subspaces

Happiness
Subspace

Disgust
Subspace

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 46
Shortcomings

Requires carefully controlled data:

● All faces centered in frame
● Same size
● Some sensitivity to angle
● Method is completely knowledge free (sometimes this is good!)
● Doesn’t know that faces are wrapped around 3D objects (heads)
● Makes no effort to preserve class distinctions

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 47
Image Compression

● Divide the original 372x492 image into patches
● Each patch is an instance that contains 12x12 pixels on a grid
● View each patch as a 144-D vector
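A rough numpy sketch of this patch-based setup (names are my own); 372 and 492 are both multiples of 12, so the image splits exactly into 12x12 patches:

```python
import numpy as np

def image_to_patches(img, p=12):
    """Split a (H, W) grayscale image into non-overlapping p x p patches,
    each flattened into a p*p-dimensional vector (one patch per column)."""
    H, W = img.shape
    patches = (img.reshape(H // p, p, W // p, p)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, p * p))
    return patches.T                      # (144, num_patches) for p = 12

def compress(patches, k):
    """Keep only the top-k principal components of the 144-D patch vectors."""
    mean = patches.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    W = U[:, :k]
    codes = W.T @ (patches - mean)        # k numbers per patch instead of 144
    return W, mean, codes

def decompress(W, mean, codes):
    return mean + W @ codes               # approximate 144-D patches
```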

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 48
PCA Compression

from 144D to 60D from 144D to 16D from 144D to 6D from 144D to 1D

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 49
PCA Compression

from 144D to 60D from 144D to 16D from 144D to 6D from 144D to 1D

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 50
Denoising

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 51
Using 15 PCA components

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 52
Problematic Data Set for PCA

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 53
PCA conclusions

2022/23
54
Let’s Exercise!

2022/23
55
Let’s Exercise!

2022/23
56
Let’s Exercise!

2022/23
57
Let’s Exercise!

2022/23
58
Let’s Exercise!

2022/23
59
Let’s Exercise!

2022/23
60
Let’s Exercise!

2022/23
61
Let’s Exercise!

What if I want ||u1|| = 1 ? Use t = 1 / ||u1||

2022/23
62
Let’s Exercise!

2022/23
63
Your turn...

2022/23
64
Your turn...

normalized
2022/23
65
Your turn...

2022/23
66
Your turn...

Questions:
- Are X1 and X2 orthogonal?
- Why?
2022/23
67
Your turn again...

2022/23
68
Your turn again...

2022/23
69
Your turn again...

2022/23
normalized
70
Your turn again...

normalized
2022/23
71
Your turn again...

Questions: are they orthogonal?

2022/23
72
Your turn again...

Find the principal components of these data: n=10 samples of d=2 dimensions

Hint: sample covariance matrix

2022/23
73
Your turn again...

Find the principal components of these data: 10 samples of d=2 dimensions

2022/23
74
Your turn again...

...Find the eigenvectors and eigenvalues of the covariance matrix

Av = λv
det(A - λI) = 0

(varx1 - λ)·(varx2 - λ) - (covx1x2)² = 0

λ² - (varx1 + varx2)·λ + (varx1·varx2 - (covx1x2)²) = 0

λ² - 16.375·λ + 0.122 = 0        (16.375 is the sum of the two terms on the diagonal of the covariance matrix)

λ1 = 16.368    λ2 = 0.00746      (note that λ1 + λ2 = 16.375)
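A quick numeric check of this quadratic with numpy (using the trace and determinant values above):

```python
import numpy as np

trace, det = 16.375, 0.122
eigvals = np.roots([1.0, -trace, det])    # roots of lambda^2 - trace*lambda + det
print(eigvals)                            # approximately [16.368, 0.0075]
```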

2022/23
75
Your turn again...

We consider the first eigenvalue λ1:

(A - λ1·I) · [a, b]T = [0, 0]T    and    a² + b² = 1

which gives a = 0.6262 and b = 0.7797.

Similarly, for the second eigenvalue we get a = 0.7797 and b = -0.6262.

They are orthogonal...
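A tiny numeric check of this, using the eigenvector entries found above:

```python
import numpy as np

u1 = np.array([0.6262, 0.7797])
u2 = np.array([0.7797, -0.6262])
print(np.dot(u1, u2))                              # ~0: the two eigenvectors are orthogonal
print(np.linalg.norm(u1), np.linalg.norm(u2))      # ~1: both have unit length
```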

2022/23
76
Your turn again...

Obtain the coordinates of the data points along the directions of the eigenvectors

➢ Multiply the centered data matrix by the eigenvector matrix

                          eigvec1   eigvec2
(centered data matrix) x [ 0.6262    0.7797 ] = projected coordinates
                         [ 0.7797   -0.6262 ]
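A sketch of this projection step in numpy; the centered data matrix itself is not reproduced in the extracted text, so Xc below is a placeholder to be filled with the actual centered 10x2 data:

```python
import numpy as np

# Eigenvector matrix from the previous step (columns = eigvec1, eigvec2)
V = np.array([[0.6262,  0.7797],
              [0.7797, -0.6262]])

# Placeholder: replace with the actual centered data (10 samples x 2 features)
Xc = np.zeros((10, 2))

scores = Xc @ V      # coordinates of each sample along eigvec1 and eigvec2
```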

2022/23
77
Your turn again...

Obtain the coordinates of the data points along the directions of the eigenvectors

➢ Multiply the centered data matrix by the eigenvector matrix

The variance of the projections along each principal component equals the eigenvalue of that component.

Here the first component alone explains around 99% of the variance: 16.368 / (16.368 + 0.00746) ≈ 0.9995

2022/23
78
Bird Example

2022/23
79
Bird Example

2022/23
80
Bird Example

Keeping in mind that we are studying the sizes (length, wingspan, weight) of North America birds: how do we
interpret all this?

● there are apparently only two factors that are important (corresponding to u1 and u2)

● we might think of u1 as giving a generalized notion of “size” that incorporates length, wingspan, and weight

● indeed, all three entries of u1 have the same sign, indicating that birds with larger “size” tend to have larger
length, wingspan, and weight.

2022/23
81
Bird Example

Which of the factors (length, wingspan, weight) is most significant in determining a bird’s “size”?
In other words, does the first principal component u1 point the most in the direction of the length axis, the wingspan
axis, or the weight axis?

● The third entry, weight, of u1 is the largest, so weight is the most significant.
This means a change in one unit of weight tends to affect the size more so than a change in one unit of
length or wingspan.

● The second entry of u1 is the next largest, which corresponds to wingspan. Thus, wingspan is the next most important factor in determining a bird's size (followed lastly by length).

2022/23
82
Bird Example

What does the second principal component mean?

● It is mostly influenced by wingspan and weight, as these entries in u2 have the greatest absolute values.

● However, they also have opposite signs. This indicates that u2 describes a feature of birds corresponding to
relatively small wingspan and large weight, or vice versa. We might call this quality “stoutness.”

2022/23
83
Bird Example

2022/23
84
