Professional Documents
Culture Documents
1 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
2 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
3 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
• Distance / Dissimilarity
• Quantify the difference of two objects
• The value is usually in the interval [0, ∞]
• Lower values mean that the objects are more similar
• Similarity
• Quantify the alikeness of two objects
• The value is usually in the interval [0, 1]
• Lower values mean that the objects are less similar
4 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Motivation
5 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
B C
(0, 1) (1, 1)
(0, 0)
A
6 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
7 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
8 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
9 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
10 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
11 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Euclidean Distance
B C
(0, 1) (1, 1)
(0, 0)
A
12 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
p
d(p, q) = (p.x − q.x)2 + (p.y − q.y)2
B C
(0, 1) (1, 1)
(0, 0)
A
p
d(A, B) = (A.x − B.x)2 + (A.y − B.y)2 (ED1)
13 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
p
d(p, q) = (p.x − q.x)2 + (p.y − q.y)2
B C
(0, 1) (1, 1)
(0, 0)
A
p
d(A, B) = (A.x − B.x)2 + (A.y − B.y)2 (ED1)
p
d(A, B) = (0 − 0)2 + (0 − 1)2 (ED2)
14 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
p
d(p, q) = (p.x − q.x)2 + (p.y − q.y)2
B C
(0, 1) (1, 1)
(0, 0)
A
p
d(A, B) = (A.x − B.x)2 + (A.y − B.y)2 (ED1)
p
d(A, B) = (0 − 0)2 + (0 − 1)2 (ED2)
p
d(A, B) = (0)2 + (1)2 = 1 (ED3)
15 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
p
d(p, q) = (p.x − q.x)2 + (p.y − q.y)2
B C
(0, 1) (1, 1)
(0, 0)
A
p
d(A, C) = (A.x − C.x)2 + (A.y − C.y)2 (ED4)
p
d(A, C) = (0 − 1)2 + (0 − 1)2 (ED5)
p √
d(A, C) = (−1)2 + (−1)2 = 2 (ED6)
16 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Manhattan Distance
• Absolute distance of the coordinates
• Also known as Taxicab Distance, City Block Distance...
B C
(0, 1) (1, 1)
(0, 0)
A
18 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
B C
(0, 1) (1, 1)
(0, 0)
A
19 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
B C
(0, 1) (1, 1)
(0, 0)
A
B C
(0, 1) (1, 1)
(0, 0)
A
22 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
23 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Chebyshev Distance
• Maximum difference in any coordinate
• Also know as Chessboard Distance
B C
(0, 1) (1, 1)
(0, 0)
A
25 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
B C
(0, 1) (1, 1)
(0, 0)
A
26 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
B C
(0, 1) (1, 1)
(0, 0)
A
Minkowski Distance
• Generalization for Euclidean/Manhattan/Chebyshev Distances
n
!1/e
X
e
d(p, q) = |pi − qi | (4)
i=1
• Manhattan (e=1)
n
!1/1 n
X X
1
d(p, q) = |pi − qi | = |pi − qi | (5)
i=1 i=1
• Euclidean (e=2)
!1/2 v
n
X
u n
uX
d(p, q) = |pi − qi |2 =t |pi − qi |2 (6)
i=1 i=1
28 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
29 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Levenshtein Distance
30 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
31 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
32 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
33 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
34 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
35 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
d(A, B)
sim(A, B) = 1 −
max(length(A), length(B)
37 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Jaccard Similarity
• Similarity between two finite sets
|A ∩ B|
• sim(A, B) =
|A ∪ B|
38 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
40 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
41 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Similarity Queries
• Range
• Given an element e and a minimum similarity score v return all
elements e0 such that sim(e, e0) < v
42 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Range Query
• Range
• Given an element e and a minimum similarity score v return all
elements e0 such that sim(e, e0) < v
c
e
v
b
d
a
43 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
k-Nn Query
• k-Nn
• Given an element e return the k most similar elements
c
e
k=2
b
d
a
44 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
45 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Precision @ Recall
• Wide-used evaluation technique
• Input:
• A query object q
• A set of objects O
• Let q = São Francisco do Sul be the query object.
• The other objects:
Object
São Francisco do Sul
São Franc S
San Francisco
Sao Francisco Sul
São Bento do Sul
São José
São Chico do Sul
São Caetano do Sul
São Paulo do Oeste
S Chico do Sul 46 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Object Relevant
São Francisco do Sul X
São Franc S X
San Francisco
Sao Francisco Sul X
São Bento do Sul
São José
São Chico do Sul X
São Caetano do Sul
São Paulo do Oeste
S Chico do Sul X
47 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
48 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
49 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
52 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
0.6
0.5
0.4
0.3
0.2 Levenshtein
0.1 Jaccard
0
0 0.2 0.4 0.6 0.8 1
Recall
53 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Topics
Introduction
Distance Measures
Similarity Measures
Similarity Queries
Evaluation
Final Remarks
54 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Final Remarks
55 / 56
Introduction Distance Measures Similarity Measures Similarity Queries Evaluation Final Remarks
Exercises
56 / 56