
DAAI

Data Analysis and Artificial Intelligence

A.A. 2022/2023
Tatiana Tommasi
1
Exercise 1) Probability
P(good movie | Tom Cruise) = 0.01
P(good movie | not Tom Cruise) = 0.1
P(Tom Cruise in a movie) = 0.01

What is P(Tom Cruise | not good movie) ?

2022/23
2
Exercise 1) Probability
P(good movie | Tom Cruise) = 0.01
P(good movie | not Tom Cruise) = 0.1
P(Tom Cruise in a movie) = 0.01

What is P(Tom Cruise | not good movie) ?
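Applying the law of total probability and Bayes' rule to the three quantities above gives roughly 0.011. A minimal Python sketch of the computation (variable names are just illustrative):

```python
# Given quantities from the exercise
p_tc = 0.01            # P(Tom Cruise in a movie)
p_good_tc = 0.01       # P(good movie | Tom Cruise)
p_good_not_tc = 0.1    # P(good movie | not Tom Cruise)

# Law of total probability for P(not good movie)
p_not_good = (1 - p_good_tc) * p_tc + (1 - p_good_not_tc) * (1 - p_tc)

# Bayes' rule: P(Tom Cruise | not good movie)
p_tc_not_good = (1 - p_good_tc) * p_tc / p_not_good
print(p_tc_not_good)   # ~0.011
```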

2022/23
3
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

Hint: be careful, I'm not asking you to classify the sample (a=1, b=1, c=0).

2022/23
4
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

Hint: be careful, I'm not asking you to classify the sample (a=1, b=1, c=0).

2022/23
5
Exercise 2) Naive Bayes
variables (a, b, c), label K

According to the naive Bayes classifier, what is

P(K=1 | a=1, b=1, c=0) ?

P(a=1 | K=1) = 1/2
P(b=1 | K=1) = 1/4
P(c=0 | K=1) = 1/2
P(K=1) = 1/2
P(a=1, b=1, c=0) = ⅛

P(K=1 | a=1, b=1, c=0) = {P(K=1) * P(a=1 | K=1) * P(b=1 | K=1) * P(c=0 | K=1)} / P(a=1, b=1, c=0)
= {½ * (½ * ¼ * ½)} / ⅛ = ¼
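The same computation as a minimal Python sketch (assuming the class-conditional probabilities have already been read off the training table, as listed above):

```python
# Quantities estimated from the data (as listed above)
p_k1 = 1 / 2        # P(K=1)
p_a1_k1 = 1 / 2     # P(a=1 | K=1)
p_b1_k1 = 1 / 4     # P(b=1 | K=1)
p_c0_k1 = 1 / 2     # P(c=0 | K=1)
p_evidence = 1 / 8  # P(a=1, b=1, c=0)

# Naive Bayes: the likelihood factorizes over the features given the class
posterior = p_k1 * p_a1_k1 * p_b1_k1 * p_c0_k1 / p_evidence
print(posterior)    # 0.25
```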
2022/23
6
Exercise 3) Maximum Likelihood Estimator

2022/23
7
Exercise 3) Maximum Likelihood Estimator

2022/23
8
Exercise 3) Maximum Likelihood Estimator

2022/23
9
Exercise 4) Bias, Variance and MSE

2022/23
10
Exercise 4) Bias, Variance and MSE

2022/23
11
Exercise 4) Bias, Variance and MSE

2022/23
12
Exercise 4) Bias, Variance and MSE

2022/23
13
Exercise 4) Bias, Variance and MSE

2022/23
14
Exercise 4) Bias, Variance and MSE

2022/23
15
Principal Component Analysis

A first small step towards unsupervised learning…

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 16
Principal Component Analysis
● Till now we have often considered cases where our samples are described by a vector of numbers x ∈ Rd

● How can we visualize them?

Example: suppose you collect 53 blood and urine measurements (features) from 65 people.

It is very difficult to see the correlation between features.

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 17
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 18
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 19
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 20
Data Visualization

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 21
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
● maximizes the variance of the projected data (purple line)
● minimizes the mean squared distance between the data points and their projections (sum of blue lines)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 22
Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...
● maximizes the variance of the projected data (purple line)
● minimizes the mean squared distance between the data points and their projections (sum of blue lines)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 23
Principal Component Analysis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 24
Principal Component Analysis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 25
2d Gaussian Dataset

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 26
1st PCA axis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 27
2nd PCA axis

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 28
PCA algorithm I (sequential)

For a sample x and a direction w, the projection of x onto w is xTw = wTx

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 29
PCA algorithm I (sequential)

this is a scalar

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 30
PCA algorithm I (sequential)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 31
PCA algorithm II (sample covariance matrix)

Hint: sample covariance matrix for data dimension p

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 32
PCA algorithm II (sample covariance matrix)

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 33
PCA algorithm II (sample covariance matrix)

X: dxN data matrix, xi = column vector, i = 1…N
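A minimal numpy sketch of this approach, assuming X is the dxN matrix described above (one sample per column); the normalization by N−1 is one common convention and may differ from the slide's formula:

```python
import numpy as np

def pca_covariance(X, k):
    """PCA via the sample covariance matrix.
    X: (d, N) array, one sample per column; k: number of components to keep."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center each feature
    sigma = (Xc @ Xc.T) / (X.shape[1] - 1)       # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    W = eigvecs[:, order[:k]]                    # top-k principal directions (columns)
    Z = W.T @ Xc                                 # k x N projected coordinates
    return W, Z, eigvals[order]
```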

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 34
How to solve the characteristic equation?

2022/23
35
PCA algorithm III (SVD of data matrix)

X (dxN) = U (dxd) · S (dxN) · VT (NxN)
Keeping only the significant (top K) singular values: U → dxK, S → KxK, VT → KxN

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 36
PCA algorithm III (SVD of data matrix)

X = U S VT, where X is dxN:
● the columns of U are the eigenvectors of XXT
● the diagonal of S holds the singular values
● the columns of V are the eigenvectors of XTX
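A matching numpy sketch of PCA via the SVD (illustrative names; X is the dxN data matrix):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via the SVD of the centered data matrix X (d, N)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # columns of U: eigenvectors of Xc @ Xc.T
    # s: singular values (the eigenvalues of Xc @ Xc.T are s**2)
    # rows of Vt: eigenvectors of Xc.T @ Xc
    W = U[:, :k]        # top-k principal directions
    Z = W.T @ Xc        # k x N projected coordinates
    return W, Z, s[:k]
```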

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 37
Implementation Issue and Solution

● Covariance matrix is huge (d x d for vectors of dimension d)
● But typically the number of examples is N << d

Simple trick:
● X is a dxN matrix of centered training data
● Solve for the eigenvectors v of A = XTX instead of Σ= XXT
● if v is an eigenvector of A, then Xv is an eigenvector of Σ

Av = λv
XTXv = λv
XXTXv = λXv
Σ(Xv)= λ(Xv)
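A small numpy sketch of this trick (the function name is mine), with X the dxN centered data matrix:

```python
import numpy as np

def pca_gram_trick(X, k):
    """When N << d, eigendecompose the N x N matrix A = X.T @ X
    instead of the d x d covariance Sigma = X @ X.T."""
    A = X.T @ X                          # N x N, much cheaper when N << d
    eigvals, V = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1][:k]
    U = X @ V[:, order]                  # X v is an eigenvector of Sigma (see derivation above)
    U /= np.linalg.norm(U, axis=0)       # rescale each eigenvector to unit length
    return U, eigvals[order]
```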

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 38
How to choose the top K?

● One goal can be to pick K such that a certain percentage of the variance of the data is
preserved, e.g. 90%

● The total variance of the data equals the sum of squares of S’s elements (singular
values)

● Take as many of these entries as needed


K = number of elements s.t. (cumsum S.^2 / total variance) > P%
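A small numpy sketch of this rule (assuming s holds the singular values of the centered data matrix):

```python
import numpy as np

def choose_k(s, p=0.90):
    """Smallest K whose leading components preserve at least a fraction p of the variance."""
    var = s ** 2                         # variance carried by each component
    cum = np.cumsum(var) / var.sum()     # cumulative fraction of the total variance
    return int(np.searchsorted(cum, p) + 1)
```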

Behaviour of the variance: a few eigenvectors cover a very large part of the variance, while many eigenvectors cover a very small part.
(Figure 12.4 (a) from Bishop)

2022/23
39
Application: Face Recognition

Image from cnet.com


2022/23
40
The space of all face images

2022/23
Slide Credit: Derek Hoiem and Adriana Kovashka
41
“Eigen Faces”

X: dxN data matrix (one vectorized face image per column)

2022/23
Slide Credit: Alexander Ihler, Adriana Kovashka 42
“Eigen Faces”

X (dxN) ≈ U (dxK) · S (KxK) · V (KxN)
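A minimal, illustrative numpy sketch of the eigenfaces pipeline (the function names are my own; X is assumed to hold one vectorized face image per column):

```python
import numpy as np

def eigenfaces(X, k):
    """X: (d, N) matrix, one vectorized face image per column."""
    mean_face = X.mean(axis=1)                      # the "average" face, shape (d,)
    U, s, Vt = np.linalg.svd(X - mean_face[:, None], full_matrices=False)
    return U[:, :k], mean_face                      # top-k eigenfaces (columns of U)

def represent(face, eigfaces, mean_face):
    """Project a single face (shape (d,)) onto the k eigenfaces."""
    return eigfaces.T @ (face - mean_face)          # k coefficients

def reconstruct(coeffs, eigfaces, mean_face):
    """Approximate the face from its k coefficients."""
    return mean_face + eigfaces @ coeffs
```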

2022/23
Slide Credit: Alexander Ihler, Adriana Kovashka 43
“Eigen Faces”

2022/23
Slide Credit: Derek Hoiem, Adriana Kovashka 44
Representation and Reconstruction

2022/23
Slide Credit: Derek Hoiem, Adriana Kovashka 45
Visualize Subspaces

Happiness
Subspace

Disgust
Subspace

2022/23
Slide Credit: Barnabás Póczos & Alex Smola 46
Shortcomings

Requires carefully controlled data:

● All faces centered in frame
● Same size
● Some sensitivity to angle
● Method is completely knowledge free (sometimes this is good!)
● Doesn’t know that faces are wrapped around 3D objects (heads)
● Makes no effort to preserve class distinctions

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 47
Image Compression

● Divide the original 372x492 image into patches
● Each patch is an instance that contains 12x12 pixels on a grid
● View each patch as a 144-D vector
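A rough numpy sketch of this patch-based setup (names are my own); 372 and 492 are both multiples of 12, so the image splits exactly into 12x12 patches:

```python
import numpy as np

def image_to_patches(img, p=12):
    """Split a (H, W) grayscale image into non-overlapping p x p patches,
    each flattened into a p*p-dimensional vector (one patch per column)."""
    H, W = img.shape
    patches = (img.reshape(H // p, p, W // p, p)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, p * p))
    return patches.T                      # (144, num_patches) for p = 12

def compress(patches, k):
    """Keep only the top-k principal components of the 144-D patch vectors."""
    mean = patches.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    W = U[:, :k]
    codes = W.T @ (patches - mean)        # k numbers per patch instead of 144
    return W, mean, codes

def decompress(W, mean, codes):
    return mean + W @ codes               # approximate 144-D patches
```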

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 48
PCA Compression

from 144D to 60D from 144D to 16D from 144D to 6D from 144D to 1D

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 49
PCA Compression

from 144D to 60D from 144D to 16D from 144D to 6D from 144D to 1D

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 50
Denoising

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 51
Using 15 PCA components

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 52
Problematic Data Set for PCA

2022/23
Slide Credit: Aarti Singh & Barnabás Póczos 53
PCA conclusions

2022/23
54
Let’s Exercise!

2022/23
55
Let’s Exercise!

2022/23
56
Let’s Exercise!

2022/23
57
Let’s Exercise!

2022/23
58
Let’s Exercise!

2022/23
59
Let’s Exercise!

2022/23
60
Let’s Exercise!

2022/23
61
Let’s Exercise!

What if I want ||u1|| = 1 ? Use t = 1 / ||u1||

2022/23
62
Let’s Exercise!

2022/23
63
Your turn...

2022/23
64
Your turn...

normalized
2022/23
65
Your turn...

2022/23
66
Your turn...

Questions:
- Are X1 and X2 orthogonal?
- Why?
2022/23
67
Your turn again...

2022/23
68
Your turn again...

2022/23
69
Your turn again...

2022/23
normalized
70
Your turn again...

normalized
2022/23
71
Your turn again...

Questions: are they orthogonal?

2022/23
72
Your turn again...

Find the principal components of these data: n=10 samples of d=2 dimensions

Hint: sample covariance matrix

2022/23
73
Your turn again...

Find the principal components of these data: 10 samples of d=2 dimensions

2022/23
74
Your turn again...

...Find the eigenvectors and eigenvalues of the covariance matrix

Av = λv
det(A - λI) = 0

(varx1 - λ)·(varx2 - λ) - (covx1x2)² = 0

λ² - (varx1 + varx2)·λ + (varx1·varx2 - (covx1x2)²) = 0

λ² - 16.375·λ + 0.122 = 0        (16.375 is the sum of the two terms on the diagonal of the covariance matrix)

λ1 = 16.368    λ2 = 0.00746      (note that λ1 + λ2 = 16.375)
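A quick numeric check of this quadratic with numpy (using the trace and determinant values above):

```python
import numpy as np

trace, det = 16.375, 0.122
eigvals = np.roots([1.0, -trace, det])    # roots of lambda^2 - trace*lambda + det
print(eigvals)                            # approximately [16.368, 0.0075]
```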

2022/23
75
Your turn again...

We consider the first eigenvalue λ1:

(A - λ1·I) · [a, b]T = [0, 0]T    and    a² + b² = 1

which gives a = 0.6262 and b = 0.7797.

Similarly, for the second eigenvalue we get a = 0.7797 and b = -0.6262.

They are orthogonal...
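A tiny numeric check of this, using the eigenvector entries found above:

```python
import numpy as np

u1 = np.array([0.6262, 0.7797])
u2 = np.array([0.7797, -0.6262])
print(np.dot(u1, u2))                              # ~0: the two eigenvectors are orthogonal
print(np.linalg.norm(u1), np.linalg.norm(u2))      # ~1: both have unit length
```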

2022/23
76
Your turn again...

Obtain the coordinates of the data points along the directions of the eigenvectors

➢ Multiply the centered data matrix by the eigenvector matrix

                          eigvec1   eigvec2
(centered data matrix) x [ 0.6262    0.7797 ] = projected coordinates
                         [ 0.7797   -0.6262 ]
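A sketch of this projection step in numpy; the centered data matrix itself is not reproduced in the extracted text, so Xc below is a placeholder to be filled with the actual centered 10x2 data:

```python
import numpy as np

# Eigenvector matrix from the previous step (columns = eigvec1, eigvec2)
V = np.array([[0.6262,  0.7797],
              [0.7797, -0.6262]])

# Placeholder: replace with the actual centered data (10 samples x 2 features)
Xc = np.zeros((10, 2))

scores = Xc @ V      # coordinates of each sample along eigvec1 and eigvec2
```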

2022/23
77
Your turn again...

Obtain the coordinates of the data points along the directions of the eigenvectors

➢ Multiply the centered data matrix by the eigenvector matrix

The variance of the projections along each principal component equals the eigenvalue of that component.

Here the first component alone explains around 99% of the variance: 16.368 / (16.368 + 0.00746) ≈ 0.9995

2022/23
78
Bird Example

2022/23
79
Bird Example

2022/23
80
Bird Example

Keeping in mind that we are studying the sizes (length, wingspan, weight) of North America birds: how do we
interpret all this?

● there are apparently only two factors that are important (corresponding to u1 and u2)

● we might think of u1 as giving a generalized notion of “size” that incorporates length, wingspan, and weight

● indeed, all three entries of u1 have the same sign, indicating that birds with larger “size” tend to have larger
length, wingspan, and weight.

2022/23
81
Bird Example

Which of the factors (length, wingspan, weight) is most significant in determining a bird’s “size”?
In other words, does the first principal component u1 point the most in the direction of the length axis, the wingspan
axis, or the weight axis?

● The third entry, weight, of u1 is the largest, so weight is the most significant.
This means a change in one unit of weight tends to affect the size more so than a change in one unit of
length or wingspan.

● The second entry of u1 is the next largest, which corresponds to wingspan. Thus, wingspan is the next most important factor in determining a bird's size (followed lastly by length).

2022/23
82
Bird Example

What does the second principal component mean?

● It is mostly influenced by wingspan and weight, as these entries in u2 have the greatest absolute values.

● However, they also have opposite signs. This indicates that u2 describes a feature of birds corresponding to
relatively small wingspan and large weight, or vice versa. We might call this quality “stoutness.”

2022/23
83
Bird Example

2022/23
84
