Georgios Arvanitidis
Nearest Neighbor — 3 October: C10, C12 (Project 1 due before 13:00)
7. Performance evaluation, Bayes, and Naive Bayes — 10 October: C11, C13
Pre-processing
Figure: the data matrix, where each row is a datapoint and each column an attribute.
DTU Compute, Lecture 3, 12 September 2023
Principal component analysis (PCA)
Let x be an observation (a 1 × M row vector) and μ̂ the sample mean. PCA projects the centered observation onto the first K principal directions v_1, . . . , v_K, the columns of V from the SVD of the centered data matrix:

b = (x − μ̂) V,  dimensions: (1 × K) = (1 × M)(M × K)

The reconstruction in the original space is

x̂ = b Vᵀ + μ̂,  dimensions: (1 × M) = (1 × K)(K × M)
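The projection and reconstruction steps above can be sketched in NumPy; the toy data and the choice K = 2 are illustrative, not from the slides:

```python
import numpy as np

# Toy data matrix: N observations (rows), M attributes (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Center the data by subtracting the attribute means.
mu = X.mean(axis=0)
Xt = X - mu

# SVD of the centered data matrix: Xt = U S V^T.
U, S, Vt = np.linalg.svd(Xt, full_matrices=False)
V = Vt.T  # columns of V are the principal directions

# Project onto the first K principal directions: B = Xt V_K  (N x K).
K = 2
B = Xt @ V[:, :K]

# Reconstruct in the original space: Xhat = B V_K^T + mu  (N x M).
Xhat = B @ V[:, :K].T + mu

# With K < M the reconstruction is not perfect.
err = np.linalg.norm(X - Xhat)
```

With K = M the reconstruction is exact, since V is orthogonal.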
If we take a randomly selected point, project it onto the 1-dimensional subspace spanned by the vector v_1, and reconstruct it from the subspace, we cannot reconstruct it perfectly.
Dissimilarity measures
• General Minkowski distance (p-distance):

d_p(x, y) = ( Σ_{j=1}^M |x_j − y_j|^p )^{1/p}

• One-norm (p = 1):

d_1(x, y) = Σ_{j=1}^M |x_j − y_j|

(Handwritten example: color(x) = red if d_1(x, y) ≤ 1, else color(x) = blue.)

• Two-norm / Euclidean distance (p = 2):

d_2(x, y) = sqrt( Σ_{j=1}^M (x_j − y_j)² )

• Max-norm distance (p = ∞):

d_∞(x, y) = max_j |x_j − y_j|
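The four distances above can be computed with one small function; a minimal sketch, with illustrative example vectors:

```python
import numpy as np

def minkowski(x, y, p):
    """General p-distance: d_p(x, y) = (sum_j |x_j - y_j|^p)^(1/p)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if np.isinf(p):
        # Max-norm distance (p = infinity).
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = [0.0, 3.0]
y = [4.0, 0.0]
d1 = minkowski(x, y, 1)         # one-norm: 4 + 3 = 7
d2 = minkowski(x, y, 2)         # Euclidean: sqrt(16 + 9) = 5
dinf = minkowski(x, y, np.inf)  # max-norm: max(4, 3) = 4
```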
Similarity measures
For binary vectors x and y with K attributes, let f_11 be the number of attributes k with x_k = 1 and y_k = 1, and f_00 the number with x_k = 0 and y_k = 0.

Simple Matching Coefficient (SMC):

SMC(x, y) = (f_00 + f_11) / K

Jaccard Coefficient:

J(x, y) = f_11 / (K − f_00)

Cosine similarity:

cos(x, y) = xᵀy / (‖x‖ ‖y‖)

Extended Jaccard coefficient:

EJ(x, y) = xᵀy / (‖x‖² + ‖y‖² − xᵀy)
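The three binary similarity measures can be sketched directly from their definitions; the example vectors are illustrative:

```python
import numpy as np

def smc(x, y):
    """Simple Matching Coefficient: (f00 + f11) / K."""
    x, y = np.asarray(x), np.asarray(y)
    return np.mean(x == y)

def jaccard(x, y):
    """Jaccard coefficient: f11 / (K - f00)."""
    x, y = np.asarray(x), np.asarray(y)
    f11 = np.sum((x == 1) & (y == 1))
    f00 = np.sum((x == 0) & (y == 0))
    return f11 / (len(x) - f00)

def cosine(x, y):
    """Cosine similarity: x^T y / (||x|| ||y||)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1, 1, 0, 1, 0])
y = np.array([1, 0, 0, 1, 0])
# Here f11 = 2, f00 = 2, K = 5: SMC = 4/5, J = 2/(5-2) = 2/3.
```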
Quiz 1, Similarity measures. For two binary observations o1 and o2, which option gives the correct values?
B. SMC(o1, o2) = 3/5, J(o1, o2) = 3/4, cos(o1, o2) = 2/3
C. SMC(o1, o2) = …/5, J(o1, o2) = …/3, cos(o1, o2) = …/3
D. SMC(o1, o2) = …/5, J(o1, o2) = …/3, cos(o1, o2) = …/3
E. Don't know.
Invariance
Scale invariance:

d(x, y) = d(αx, y)

Translation invariance:

d(x, y) = d(α + x, y)

General invariances
Transformations
Example: two people have ages 30 and 40 and incomes 50,000 and 49,000. Does the distance between them represent a "reasonable" distance? (The income attribute dominates.)
Standardization: Ensure a single attribute will not dominate:

x̃_ik = (x_ik − μ̂_k) / σ̂_k

s(x, y) = (1/2)(s_Edu(x, y) + s_Age(x, y))
• Empirical variance

v̂ar[x] = σ̂² = (1/(N − 1)) Σ_{i=1}^N (x_i − μ̂)²

• Empirical covariance

ĉov[x, y] = (1/(N − 1)) Σ_{i=1}^N (x_i − μ̂_x)(y_i − μ̂_y)

Correlation

ĉor[x, y] = ĉov[x, y] / (σ̂_x σ̂_y)
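These empirical quantities map directly onto NumPy routines; a minimal sketch with illustrative data, using the 1/(N − 1) normalization (`ddof=1`) throughout:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 5.0, 6.0])

# Empirical variance with the 1/(N-1) normalization.
var_x = np.var(x, ddof=1)

# Empirical covariance; np.cov returns the 2x2 covariance matrix.
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation: cov / (sigma_x * sigma_y); np.corrcoef gives the same.
cor_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Standardization: subtract the mean, divide by the standard deviation.
x_tilde = (x - x.mean()) / np.std(x, ddof=1)
```

After standardization the attribute has mean 0 and standard deviation 1, so no single attribute dominates the distance.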
Given N observations of an attribute x1 , x2 , . . . , xN œ R.
Quantiles describe the points that divide the underlying distribution into
intervals that are equally probable:
• The one 2-quantile (median) divides the distribution into two intervals.
• The three 4-quantiles (quartiles) divide the distribution into four intervals.
• The 99 100-quantiles (percentiles) divide the distribution into 100 intervals.
The median is the same as the 2nd quartile or the 50th percentile.
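NumPy computes these directly; a small sketch with illustrative data:

```python
import numpy as np

x = np.array([3.0, 1.0, 7.0, 5.0, 9.0])

median = np.quantile(x, 0.5)                    # the one 2-quantile
quartiles = np.quantile(x, [0.25, 0.5, 0.75])   # the three 4-quantiles
p50 = np.percentile(x, 50)                      # the 50th percentile

# The median equals the 2nd quartile and the 50th percentile.
```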
Probabilities
Binary propositions
A binary proposition is a statement which is either true or false (we might
not know, but someone with complete knowledge would)
A : In 49 BCE, Caesar crossed the Rubicon
B : Acceleration sensor 39 measures more than 0.85
C : Patient 901 has high cholesterol
Propositions can be combined with and, or and not:
AB ≡ True if A and B are both true (A ∧ B)
A + B ≡ True if either A or B is true (A ∨ B)
Ā ≡ True if A is false (¬A)
We define two special propositions, T and F, which are always true and always false respectively.
Quiz 2, Probabilities
Assume we define the following 4 boolean variables: R1, R2, R3 (a student hands in report 1, 2 or 3) and F (the student passes 02450). Which option expresses the statement "if a student hands in reports 1, 2 and 3, the chance of passing 02450 is greater than 90%"?
A. P(R1 R2 R3 |F) > 0.9
(Handwritten: the statement itself is P(F|R1 R2 R3) > 0.9.)
P(A|T) = P(A)

The product rule, conditioned on C:

P(AB|C) = P(B|AC) P(A|C) = P(A|BC) P(B|C)
• If the DNA samples are from different persons, the test will incorrectly give a positive match one time out of a million (10⁻⁶).

With P(G) = 1/8000 and P(G) + P(Ḡ) = 1, Bayes' theorem gives

P(G|D) = P(D|G) P(G) / ( P(D|G) P(G) + P(D|Ḡ) P(Ḡ) )
       = (1 × 1/8000) / ( 1 × 1/8000 + 10⁻⁶ × (1 − 1/8000) )
       = 1 − 1/126 ≈ 99%
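The DNA computation above can be checked numerically:

```python
# Bayes' theorem for the DNA example:
# prior P(G) = 1/8000, P(D|G) = 1 (a guilty person always matches),
# P(D|not G) = 1e-6 (false-positive rate for different persons).
p_g = 1 / 8000
p_d_given_g = 1.0
p_d_given_not_g = 1e-6

p_g_given_d = (p_d_given_g * p_g) / (
    p_d_given_g * p_g + p_d_given_not_g * (1 - p_g)
)
# p_g_given_d is approximately 1 - 1/126, i.e. about 99%.
```

Despite the one-in-a-million false-positive rate, the posterior is not 1: the prior of 1/8000 matters.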
P (A + B) = P (A) + P (B) ≠ P (AB)
• In general, for n mutually exclusive events

P(A1 + A2 + · · · + An) = Σ_{i=1}^n P(Ai)

• A set of events is exhaustive if one has to be true: A1 + · · · + An = T. Then:

Σ_{i=1}^n P(Ai) = P(A1 + A2 + · · · + An) = 1
Stochastic variables
• Often, we will measure numerical quantities (number of children, age of a
patient, etc.)
• Suppose a quantity X (number of children) takes a value x = 3. We can write this as the binary event X_3, and in general X_x for the event that X takes the value x.
Quiz 3. We classify the copyist y ∈ {1, 2, 3} based on the two typographic attributes upperm and mr/is, such that upperm corresponds to x̃2 = 0, 1 and mr/is corresponds to x̃10 = 0, 1. Suppose the probabilities of each configuration of x̃2 and x̃10 conditional on the copyist y are as given in Table 1, and the prior probability of each copyist p(y) is given in the problem text. What is p(y = 1|x̃2 = 1, x̃10 = 0)?
C. p(y = 1|x̃2 = 1, x̃10 = 0) = …
D. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.298
E. Don't know.
The problem is solved by a simple application of Bayes' theorem:

p(y = 1|x̃2 = 1, x̃10 = 0) = p(x̃2 = 1, x̃10 = 0|y = 1) p(y = 1) / Σ_{k=1}^3 p(x̃2 = 1, x̃10 = 0|y = k) p(y = k)

The values of p(y) are given in the problem text and the values of p(x̃2 = 1, x̃10 = 0|y) in Table 1. Inserting the values we see option D is correct.
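The mechanics of this posterior computation can be sketched in a few lines. The numbers below are hypothetical stand-ins, not the values from Table 1 or the problem text; only the structure of the calculation matches the formula:

```python
# Hypothetical stand-ins for Table 1 and the priors (NOT the actual
# problem values) to illustrate the Bayes computation.
p_x_given_y = [0.2, 0.5, 0.3]   # p(x2=1, x10=0 | y=k) for k = 1, 2, 3
p_y = [0.3, 0.4, 0.3]           # prior p(y=k)

# Numerators of Bayes' theorem for each class k.
joint = [px * py for px, py in zip(p_x_given_y, p_y)]

# Posterior for y = 1: normalize by the sum over all classes.
posterior_y1 = joint[0] / sum(joint)
```

The denominator sums the same product over every class, so the posteriors over k always sum to one.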
Independence
Expectations
Expectation: E[f] = Σ_{i=1}^N f(x_i) p(x_i). (2)

Mean: E[x] = Σ_{i=1}^N x_i p(x_i),  Variance: Var[x] = Σ_{i=1}^N (x_i − E[x])² p(x_i). (3)
Example: Uniform probability

p(x_i) = 1/N

E[f] = (1/N) Σ_{i=1}^N f(x_i)

E[x] = ( Σ_{i=1}^N x_i ) / N

Var[x] = (1/N) Σ_{i=1}^N (x_i − E[x])²
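Under the uniform distribution the expectations reduce to plain averages; a small numerical check with illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.full(len(x), 1 / len(x))    # uniform probability p(x_i) = 1/N

mean = np.sum(x * p)               # E[x] = sum_i x_i p(x_i)
var = np.sum((x - mean) ** 2 * p)  # Var[x] = sum_i (x_i - E[x])^2 p(x_i)

# These equal the sample mean and the 1/N-normalized sample variance.
```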
Bernoulli distribution: p(b|θ) = θ^b (1 − θ)^{1−b}, so that

p(b = 0|θ) = 1 − θ,  p(b = 1|θ) = θ
p(b1, · · · , bN|θ) = Π_{i=1}^N p(b_i|θ) = Π_{i=1}^N θ^{b_i} (1 − θ)^{1−b_i} = θ^{Σ_i b_i} (1 − θ)^{N − Σ_i b_i}

= θ^m (1 − θ)^{N−m},  m = b1 + b2 + · · · + bN

(Handwritten example: b1 = 1, b2 = 0, b3 = 1, b4 = 1 gives m = 3.)
The likelihood of the observed sequence as a function of θ is

f(θ) = p(b1, . . . , bN|θ) = θ^m (1 − θ)^{N−m}

Maximizing f(θ) gives the estimate θ̂ = m/N, which is the prediction probability p(b* = 1|θ̂) = θ̂ for a new point b*. (Handwritten annotation: comparing θ = 0.75 and θ = 0.5.)
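This can be verified numerically; the observation sequence below is illustrative (chosen to match the handwritten m = 3, N = 4 example):

```python
import numpy as np

# Observed binary outcomes; theta is the probability of observing a 1.
b = np.array([1, 0, 1, 1])
m, N = b.sum(), len(b)

def likelihood(theta):
    """p(b_1, ..., b_N | theta) = theta^m (1 - theta)^(N - m)."""
    return theta ** m * (1 - theta) ** (N - m)

# Maximum-likelihood estimate: theta_hat = m / N = 0.75 here,
# and the likelihood at theta_hat beats any other value of theta.
theta_hat = m / N
```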