Georgios Arvanitidis
Nearest Neighbor — 3 October: C10, C12 (Project 1 due before 13:00)
7. Performance evaluation, Bayes, and Naive Bayes — 10 October: C11, C13
Pre-processing
Figure: the data matrix, where each row is a datapoint and each column an attribute.
DTU Compute, Lecture 3, 12 September 2023
Principal component analysis (PCA)
Let x be an observation (a 1 × M row vector) and μ̂ the sample mean. PCA projects the centered observation onto the first K principal directions v_1, . . . , v_K, the columns of V from the SVD of the centered data matrix:

b = (x − μ̂) V,  dimensions: (1 × K) = (1 × M)(M × K)

The reconstruction in the original space is

x̂ = b Vᵀ + μ̂,  dimensions: (1 × M) = (1 × K)(K × M)
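The projection and reconstruction steps above can be sketched in NumPy; the toy data and the choice K = 2 are illustrative, not from the slides:

```python
import numpy as np

# Toy data matrix: N observations (rows), M attributes (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Center the data by subtracting the attribute means.
mu = X.mean(axis=0)
Xt = X - mu

# SVD of the centered data matrix: Xt = U S V^T.
U, S, Vt = np.linalg.svd(Xt, full_matrices=False)
V = Vt.T  # columns of V are the principal directions

# Project onto the first K principal directions: B = Xt V_K  (N x K).
K = 2
B = Xt @ V[:, :K]

# Reconstruct in the original space: Xhat = B V_K^T + mu  (N x M).
Xhat = B @ V[:, :K].T + mu

# With K < M the reconstruction is not perfect.
err = np.linalg.norm(X - Xhat)
```

With K = M the reconstruction is exact, since V is orthogonal.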
If we take a randomly selected point, project it onto the 1-dimensional subspace spanned by the vector v_1, and reconstruct it from the subspace, we cannot reconstruct it perfectly.
Dissimilarity measures
• General Minkowski distance (p-distance):

d_p(x, y) = ( Σ_{j=1}^M |x_j − y_j|^p )^{1/p}

• One-norm (p = 1):

d_1(x, y) = Σ_{j=1}^M |x_j − y_j|

(Handwritten example: color(x) = red if d_1(x, y) ≤ 1, else color(x) = blue.)

• Two-norm / Euclidean distance (p = 2):

d_2(x, y) = sqrt( Σ_{j=1}^M (x_j − y_j)² )

• Max-norm distance (p = ∞):

d_∞(x, y) = max_j |x_j − y_j|
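The four distances above can be computed with one small function; a minimal sketch, with illustrative example vectors:

```python
import numpy as np

def minkowski(x, y, p):
    """General p-distance: d_p(x, y) = (sum_j |x_j - y_j|^p)^(1/p)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if np.isinf(p):
        # Max-norm distance (p = infinity).
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = [0.0, 3.0]
y = [4.0, 0.0]
d1 = minkowski(x, y, 1)         # one-norm: 4 + 3 = 7
d2 = minkowski(x, y, 2)         # Euclidean: sqrt(16 + 9) = 5
dinf = minkowski(x, y, np.inf)  # max-norm: max(4, 3) = 4
```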
Similarity measures
For binary vectors x and y with K attributes, let f_11 be the number of attributes k with x_k = 1 and y_k = 1, and f_00 the number with x_k = 0 and y_k = 0.

Simple Matching Coefficient (SMC):

SMC(x, y) = (f_00 + f_11) / K

Jaccard Coefficient:

J(x, y) = f_11 / (K − f_00)

Cosine similarity:

cos(x, y) = xᵀy / (‖x‖ ‖y‖)

Extended Jaccard coefficient:

EJ(x, y) = xᵀy / (‖x‖² + ‖y‖² − xᵀy)
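The three binary similarity measures can be sketched directly from their definitions; the example vectors are illustrative:

```python
import numpy as np

def smc(x, y):
    """Simple Matching Coefficient: (f00 + f11) / K."""
    x, y = np.asarray(x), np.asarray(y)
    return np.mean(x == y)

def jaccard(x, y):
    """Jaccard coefficient: f11 / (K - f00)."""
    x, y = np.asarray(x), np.asarray(y)
    f11 = np.sum((x == 1) & (y == 1))
    f00 = np.sum((x == 0) & (y == 0))
    return f11 / (len(x) - f00)

def cosine(x, y):
    """Cosine similarity: x^T y / (||x|| ||y||)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1, 1, 0, 1, 0])
y = np.array([1, 0, 0, 1, 0])
# Here f11 = 2, f00 = 2, K = 5: SMC = 4/5, J = 2/(5-2) = 2/3.
```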
Quiz 1, Similarity measures. For two binary observations o1 and o2, which option gives the correct values?
B. SMC(o1, o2) = 3/5, J(o1, o2) = 3/4, cos(o1, o2) = 2/3
C. SMC(o1, o2) = …/5, J(o1, o2) = …/3, cos(o1, o2) = …/3
D. SMC(o1, o2) = …/5, J(o1, o2) = …/3, cos(o1, o2) = …/3
E. Don't know.
Invariance
Scale invariance:

d(x, y) = d(αx, y)

Translation invariance:

d(x, y) = d(α + x, y)

General invariances
Transformations
Example: two people have ages 30 and 40 and incomes 50,000 and 49,000. Does the distance between them represent a "reasonable" distance? (The income attribute dominates.)
Standardization: Ensure a single attribute will not dominate:

x̃_ik = (x_ik − μ̂_k) / σ̂_k

s(x, y) = (1/2)(s_Edu(x, y) + s_Age(x, y))
• Empirical variance

v̂ar[x] = σ̂² = (1/(N − 1)) Σ_{i=1}^N (x_i − μ̂)²

• Empirical covariance

ĉov[x, y] = (1/(N − 1)) Σ_{i=1}^N (x_i − μ̂_x)(y_i − μ̂_y)

Correlation

ĉor[x, y] = ĉov[x, y] / (σ̂_x σ̂_y)
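These empirical quantities map directly onto NumPy routines; a minimal sketch with illustrative data, using the 1/(N − 1) normalization (`ddof=1`) throughout:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 5.0, 6.0])

# Empirical variance with the 1/(N-1) normalization.
var_x = np.var(x, ddof=1)

# Empirical covariance; np.cov returns the 2x2 covariance matrix.
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation: cov / (sigma_x * sigma_y); np.corrcoef gives the same.
cor_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Standardization: subtract the mean, divide by the standard deviation.
x_tilde = (x - x.mean()) / np.std(x, ddof=1)
```

After standardization the attribute has mean 0 and standard deviation 1, so no single attribute dominates the distance.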
Given N observations of an attribute x1 , x2 , . . . , xN œ R.
Quantiles describe the points that divide the underlying distribution into
intervals that are equally probable:
• The one 2-quantile (median) divides the distribution into two intervals.
• The three 4-quantiles (quartiles) divide the distribution into four intervals.
• The 99 100-quantiles (percentiles) divide the distribution into 100 intervals.
The median is the same as the 2nd quartile or the 50th percentile.
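NumPy computes these directly; a small sketch with illustrative data:

```python
import numpy as np

x = np.array([3.0, 1.0, 7.0, 5.0, 9.0])

median = np.quantile(x, 0.5)                    # the one 2-quantile
quartiles = np.quantile(x, [0.25, 0.5, 0.75])   # the three 4-quantiles
p50 = np.percentile(x, 50)                      # the 50th percentile

# The median equals the 2nd quartile and the 50th percentile.
```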
Probabilities
Binary propositions
A binary proposition is a statement which is either true or false (we might
not know, but someone with complete knowledge would)
A : In 49 BCE, Caesar crossed the Rubicon
B : Acceleration sensor 39 measures more than 0.85
C : Patient 901 has high cholesterol
Propositions can be combined with and, or and not:
AB ≡ True if A and B are both true (A ∧ B)
A + B ≡ True if either A or B is true (A ∨ B)
Ā ≡ True if A is false (¬A)
We define two special propositions, T and F, which are always true and always false respectively.
Quiz 2, Probabilities
Assume we define the following 4 boolean variables: R1, R2, R3 (a student hands in report 1, 2 or 3) and F (the student passes 02450). Which option expresses the statement "if a student hands in reports 1, 2 and 3, the chance of passing 02450 is greater than 90%"?
A. P(R1 R2 R3 |F) > 0.9
(Handwritten: the statement itself is P(F|R1 R2 R3) > 0.9.)
P(A|T) = P(A)

The product rule, conditioned on C:

P(AB|C) = P(B|AC) P(A|C) = P(A|BC) P(B|C)
• If the DNA samples are from different persons, the test will incorrectly give a positive match one time out of a million (10⁻⁶).

With P(G) = 1/8000 and P(G) + P(Ḡ) = 1, Bayes' theorem gives

P(G|D) = P(D|G) P(G) / ( P(D|G) P(G) + P(D|Ḡ) P(Ḡ) )
       = (1 × 1/8000) / ( 1 × 1/8000 + 10⁻⁶ × (1 − 1/8000) )
       = 1 − 1/126 ≈ 99%
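The DNA computation above can be checked numerically:

```python
# Bayes' theorem for the DNA example:
# prior P(G) = 1/8000, P(D|G) = 1 (a guilty person always matches),
# P(D|not G) = 1e-6 (false-positive rate for different persons).
p_g = 1 / 8000
p_d_given_g = 1.0
p_d_given_not_g = 1e-6

p_g_given_d = (p_d_given_g * p_g) / (
    p_d_given_g * p_g + p_d_given_not_g * (1 - p_g)
)
# p_g_given_d is approximately 1 - 1/126, i.e. about 99%.
```

Despite the one-in-a-million false-positive rate, the posterior is not 1: the prior of 1/8000 matters.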
P (A + B) = P (A) + P (B) ≠ P (AB)
• In general, for n mutually exclusive events

P(A1 + A2 + · · · + An) = Σ_{i=1}^n P(Ai)

• A set of events is exhaustive if one has to be true: A1 + · · · + An = T. Then:

Σ_{i=1}^n P(Ai) = P(A1 + A2 + · · · + An) = 1
Stochastic variables
• Often, we will measure numerical quantities (number of children, age of a
patient, etc.)
• Suppose a quantity X (number of children) takes a value x = 3. We can write this as the binary event X_3, and in general X_x for the event that X takes the value x.
Quiz 3. We classify the copyist y ∈ {1, 2, 3} based on the two typographic attributes upperm and mr/is, such that upperm corresponds to x̃2 = 0, 1 and mr/is corresponds to x̃10 = 0, 1. Suppose the probabilities of each configuration of x̃2 and x̃10 conditional on the copyist y are as given in Table 1, and the prior probability of each copyist p(y) is given in the problem text. What is p(y = 1|x̃2 = 1, x̃10 = 0)?
C. p(y = 1|x̃2 = 1, x̃10 = 0) = …
D. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.298
E. Don't know.
The problem is solved by a simple application of Bayes' theorem:

p(y = 1|x̃2 = 1, x̃10 = 0) = p(x̃2 = 1, x̃10 = 0|y = 1) p(y = 1) / Σ_{k=1}^3 p(x̃2 = 1, x̃10 = 0|y = k) p(y = k)

The values of p(y) are given in the problem text and the values of p(x̃2 = 1, x̃10 = 0|y) in Table 1. Inserting the values we see option D is correct.
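The mechanics of this posterior computation can be sketched in a few lines. The numbers below are hypothetical stand-ins, not the values from Table 1 or the problem text; only the structure of the calculation matches the formula:

```python
# Hypothetical stand-ins for Table 1 and the priors (NOT the actual
# problem values) to illustrate the Bayes computation.
p_x_given_y = [0.2, 0.5, 0.3]   # p(x2=1, x10=0 | y=k) for k = 1, 2, 3
p_y = [0.3, 0.4, 0.3]           # prior p(y=k)

# Numerators of Bayes' theorem for each class k.
joint = [px * py for px, py in zip(p_x_given_y, p_y)]

# Posterior for y = 1: normalize by the sum over all classes.
posterior_y1 = joint[0] / sum(joint)
```

The denominator sums the same product over every class, so the posteriors over k always sum to one.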
Independence
Expectations
Expectation: E[f] = Σ_{i=1}^N f(x_i) p(x_i). (2)

Mean: E[x] = Σ_{i=1}^N x_i p(x_i),  Variance: Var[x] = Σ_{i=1}^N (x_i − E[x])² p(x_i). (3)
Example: Uniform probability

p(x_i) = 1/N

E[f] = (1/N) Σ_{i=1}^N f(x_i)

E[x] = ( Σ_{i=1}^N x_i ) / N

Var[x] = (1/N) Σ_{i=1}^N (x_i − E[x])²
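Under the uniform distribution the expectations reduce to plain averages; a small numerical check with illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.full(len(x), 1 / len(x))    # uniform probability p(x_i) = 1/N

mean = np.sum(x * p)               # E[x] = sum_i x_i p(x_i)
var = np.sum((x - mean) ** 2 * p)  # Var[x] = sum_i (x_i - E[x])^2 p(x_i)

# These equal the sample mean and the 1/N-normalized sample variance.
```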
Bernoulli distribution: p(b|θ) = θ^b (1 − θ)^{1−b}, so that

p(b = 0|θ) = 1 − θ,  p(b = 1|θ) = θ
p(b1, · · · , bN|θ) = Π_{i=1}^N p(b_i|θ) = Π_{i=1}^N θ^{b_i} (1 − θ)^{1−b_i} = θ^{Σ_i b_i} (1 − θ)^{N − Σ_i b_i}

= θ^m (1 − θ)^{N−m},  m = b1 + b2 + · · · + bN

(Handwritten example: b1 = 1, b2 = 0, b3 = 1, b4 = 1 gives m = 3.)
The likelihood of the observed sequence as a function of θ is

f(θ) = p(b1, . . . , bN|θ) = θ^m (1 − θ)^{N−m}

Maximizing f(θ) gives the estimate θ̂ = m/N, which is the prediction probability p(b* = 1|θ̂) = θ̂ for a new point b*. (Handwritten annotation: comparing θ = 0.75 and θ = 0.5.)
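This can be verified numerically; the observation sequence below is illustrative (chosen to match the handwritten m = 3, N = 4 example):

```python
import numpy as np

# Observed binary outcomes; theta is the probability of observing a 1.
b = np.array([1, 0, 1, 1])
m, N = b.sum(), len(b)

def likelihood(theta):
    """p(b_1, ..., b_N | theta) = theta^m (1 - theta)^(N - m)."""
    return theta ** m * (1 - theta) ** (N - m)

# Maximum-likelihood estimate: theta_hat = m / N = 0.75 here,
# and the likelihood at theta_hat beats any other value of theta.
theta_hat = m / N
```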