Knowledge Discovery III.4: Kernels / SVM (2014/15)
Prof. Dr. Rudi Studer, Dr. Achim Rettinger*, Dipl.-Inform. Lei Zhang
{rudi.studer, achim.rettinger, l.zhang}@kit.edu
2 Institut AIFB
Chapter 3.3
Recap: Linear Classification (Perceptron)
Recap: Basis expansion
Linear model: f(x_i, w) = w_0 + Σ_{j=1..M} w_j · x_{i,j}

Example basis expansions φ:
φ(x) = (x_1, x_2, …, x_{M−1})
φ(x) = (x_1², x_1·x_2)
φ(x) = (log(x_1), √x_1)
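A minimal sketch (NumPy; the function names are ours, not from the lecture) of how such feature maps plug into the linear model:

```python
import numpy as np

def phi_identity(x):
    # phi(x) = (x_1, ..., x_{M-1}): (a subset of) the raw features
    return x[:-1]

def phi_poly(x):
    # phi(x) = (x_1^2, x_1 * x_2): quadratic interaction features
    return np.array([x[0] ** 2, x[0] * x[1]])

def phi_log_sqrt(x):
    # phi(x) = (log(x_1), sqrt(x_1)): nonlinear transforms of one feature
    return np.array([np.log(x[0]), np.sqrt(x[0])])

def linear_model(x, w, w0, phi):
    # f(x, w) = w0 + sum_j w_j * phi_j(x): linear in w, nonlinear in x
    return w0 + np.dot(w, phi(x))

x = np.array([4.0, 3.0])
print(linear_model(x, np.array([1.0, 0.5]), 2.0, phi_poly))  # 2 + 1.0*16 + 0.5*12 = 24.0
```

The model stays linear in the parameters w, so any linear training method still applies after the expansion.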
Chapter 3.3.a
Kernel Functions
Nonlinearity through Feature Maps
Reminder: “Perceptron Training” Algorithm
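The classic algorithm the slide recalls can be sketched as follows (a minimal NumPy version; variable names are ours):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Primal perceptron: on each mistake, add y_i * x_i to the weights."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # misclassified (or on the boundary)
                w += y[i] * X[i]
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # converged: all points classified correctly
            break
    return w, b

# Linearly separable toy data, labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # [ 1.  1. -1. -1.] -- matches y
```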
Dual version of the "Perceptron Training" algorithm

Observations:
§ Information on pairwise inner products is sufficient for training and classification.
§ Feature mappings enter the computation only through inner products.
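These observations can be made concrete: the dual perceptron keeps one mistake counter α_i per training point and touches the data only through the Gram matrix of pairwise inner products (a sketch, our naming):

```python
import numpy as np

def dual_perceptron_train(X, y, epochs=100):
    """Dual perceptron: alpha_i counts the mistakes made on example i.
    The data enter only via the Gram matrix G[i, j] = <x_i, x_j>."""
    n = X.shape[0]
    G = X @ X.T                       # all pairwise inner products
    alpha, b = np.zeros(n), 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # f(x_i) = sum_j alpha_j y_j <x_j, x_i> + b
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
alpha, b = dual_perceptron_train(X, y)
w = (alpha * y) @ X                   # recover primal weights: w = sum_i alpha_i y_i x_i
print(np.sign(X @ w + b))             # [ 1.  1. -1. -1.] -- matches y
```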
Kernel Functions
Example: Polynomial Kernel of Degree 2
Example: Polynomial Kernel of Degree 2
"Kernel Trick"
Feature Mapping
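The point of the slide can be checked numerically: for x, z ∈ R², the kernel k(x, z) = (xᵀz)² equals the inner product of the explicit degree-2 feature maps φ(x) = (x₁², x₂², √2·x₁x₂) (a standard identity; the code is our illustration):

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel on R^2
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k_poly2(x, z):
    # "Kernel trick": evaluate (x . z)^2 directly, without building phi
    return (x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(x) @ phi(z))   # 16.0
print(k_poly2(x, z))     # 16.0 -- same value, computed in the input space
```

Evaluating k directly costs O(M), while the explicit degree-2 map has O(M²) features; this gap is exactly what the kernel trick exploits.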
Mathematical Properties of Kernels
Core insights:
Mathematical Properties of Kernels (cont)
Kernel Function Design
Case II: Design a similarity function directly on the input data and check whether it constitutes a valid kernel function.
Example: Gaussian Kernel
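The Gaussian (RBF) kernel is k(x, z) = exp(−‖x − z‖² / (2σ²)); a minimal sketch:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2));
    # corresponds to an infinite-dimensional feature space
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])
print(gaussian_kernel(x, x))            # 1.0 -- identical points
print(gaussian_kernel(x, z))            # exp(-1) ~ 0.368
# sigma controls the width: larger sigma -> points look more similar
print(gaussian_kernel(x, z, sigma=10.0))
```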
Example: Gaussian Kernel
Examples:
String Kernels
[e.g. Lodhi et al., "Text Classification Using String Kernels", 2002]
Tree Kernels
[e.g. Collins & Duffy, "Convolution Kernels for Natural Language", 2001]
Graph Kernels
[e.g. Gärtner et al., "On Graph Kernels: Hardness Results and Efficient Alternatives", 2003]
Many more…
[Gärtner, "A Survey of Kernels for Structured Data", 2003]
Summary: Kernel Methods (remember this!)
Linear Model:
§ Classification = side of the hyperplane
§ Training = estimation of "good" parameters w and b

Dual Representation:
§ During learning and classification, it is sufficient to access only dot products of data points.
Summary: Kernel Methods (remember this!)
"Kernel Trick":
1. Rewrite the learning algorithm such that any reference to the input data happens only from within inner products.
2. Replace any such inner product by the kernel function of your choice.
3. Work with the (linear) algorithm as usual.

Non-linearity enters learning through kernels while the training algorithm may remain linear.
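The three steps can be sketched on the dual perceptron: the algorithm references the data only through kernel evaluations, so any kernel can be plugged in where inner products were used (illustrative code, our naming):

```python
import numpy as np

def kernel_perceptron_train(X, y, kernel, epochs=100):
    """Dual perceptron where the data enter only via kernel evaluations."""
    n = X.shape[0]
    G = np.array([[kernel(a, b) for b in X] for a in X])  # kernel "Gram" matrix
    alpha, bias = np.zeros(n), 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * G[:, i]) + bias) <= 0:
                alpha[i] += 1
                bias += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, bias

linear = lambda a, b: a @ b                         # plain inner product: the linear algorithm
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))    # swapped-in Gaussian kernel

# XOR-like data: not linearly separable, but separable via the RBF kernel
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha, bias = kernel_perceptron_train(X, y, rbf)
G = np.array([[rbf(a, b) for b in X] for a in X])
print(np.sign(G @ (alpha * y) + bias))  # [ 1.  1. -1. -1.] -- matches y
```

With `linear` in place of `rbf`, the very same code is the ordinary (linear) dual perceptron.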
Chapter 3.3.b
Support Vector Machines
Margin Maximization: Intuition
Margin Maximization
Support Vector Machines
SVM Optimization Problem
Idea: Fix the norm of the weight vector and maximize the functional margin.
Equivalently, if we fix the functional margin to 1, the geometric margin equals 1/||w||.
Thus: maximizing the (geometric) margin = minimizing the norm of the weight vector while keeping the functional margin above a fixed value.
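Written out, this gives the standard hard-margin primal problem (standard notation, N training points):

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{s.t.}\quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b)\ \ge\ 1,\qquad i = 1,\dots,N
```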
SVM Optimization Problem (dual)
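The standard dual problem, obtained with Lagrange multipliers α_i; the data enter only through inner products, so a kernel k can be substituted:

```latex
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N}\alpha_i
\;-\;\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,y_i\,y_j\,k(\mathbf{x}_i,\mathbf{x}_j)
\quad\text{s.t.}\quad \alpha_i \ge 0,\qquad \sum_{i=1}^{N}\alpha_i\,y_i = 0
```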
Dealing with Noise: Soft Margins
Dealing with Noise: Soft Margins
Dealing with Noise: Soft Margins (dual)
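For reference, the standard soft-margin formulation: slack variables ξ_i with trade-off parameter C in the primal; in the dual this only turns α_i ≥ 0 into a box constraint:

```latex
\text{Primal:}\quad
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
\\[1ex]
\text{Dual:}\quad
\max_{\boldsymbol{\alpha}}\ \sum_{i}\alpha_i
- \frac{1}{2}\sum_{i,j}\alpha_i\,\alpha_j\,y_i\,y_j\,k(\mathbf{x}_i,\mathbf{x}_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\qquad \sum_{i}\alpha_i\,y_i = 0
```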
Summary: SVMs (remember this!)
Big picture: Kernels and SVMs
1. Think about whether you could work with linear techniques if you map your data into a richer (higher-dimensional) space. [Kernel Functions]
2. Implement the learning algorithm in such a way that reference to the mapped data is made only within pairwise dot products. [SVMs]
3. Pairwise dot products can be efficiently computed by means of a corresponding kernel function on the original data items.
4. Make the SVM training maximize the margin of the training data in the implicit feature space to maximize generalization performance. [SVMs]
There is more behind Kernels and SVMs
[Diagram: Data → Kernel Function → Learning Algorithm. Tasks: Classification (SVM), Ranking, Regression, Clustering.]
Example: Graph kernel
[Figure: RDF graph fragment: topic110 skos:prefLabel "Machine Learning"; person100 foaf:topic_interest topic110; foaf:knows and foaf:gender edges at person100.]
The Learning Tasks (I)
[Figure: the RDF graph from before, extended by person200 with foaf:name "Jane Doe" and foaf:gender "female"; a "?" marks the value to be predicted for person200.]
The Learning Tasks (II)
[Figure: the same RDF graph; a "?" now marks a potential foaf:topic_interest edge.]
… link prediction, …
The Learning Tasks (III)
[Figure: the same RDF graph fragment.]
… clustering, …
The Kernel Trick
[Diagram: any RDF graph → kernel → any kernel machine (SVM / SVR / kernel k-means).]
Intersection Graph
[Diagram: input RDF data graph → instance extraction for entities e1 and e2 → instance graphs G(e1), G(e2) → intersection.]
Instance graph
[Figure: RDF graph fragment: topic110 skos:prefLabel "Machine Learning"; person100 foaf:topic_interest topic110; foaf:knows and foaf:gender edges at person100.]
Instance graph - Example
[Figure: instance graph of person200: person200 foaf:name "Jane Doe"; person200 foaf:gender "female".]
Intersection graph
[Figure: intersection graph, i.e. the triples shared by both instance graphs: person200 foaf:name "Jane Doe"; person200 foaf:gender "female".]
Feature count
Any edge subset E′ ⊆ E with vertex set
V′ = {v | ∃ u, p : (u, p, v) ∈ E′ ∨ (v, p, u) ∈ E′}
qualifies as a candidate feature set:
§ Edges
§ Walks/paths up to an arbitrary length l
§ Connected edge-induced subgraphs
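As a sketch (RDF triples as Python tuples; the toy triples are our reconstruction of the person200 example, not data from the lecture), the intersection graph and a simple edge-feature count reduce to set operations:

```python
# RDF-style instance graphs as sets of (subject, predicate, object) triples
g_e1 = {
    ("person200", "foaf:name", "Jane Doe"),
    ("person200", "foaf:gender", "female"),
    ("person200", "foaf:topic_interest", "topic110"),   # hypothetical extra triple
}
g_e2 = {
    ("person200", "foaf:name", "Jane Doe"),
    ("person200", "foaf:gender", "female"),
    ("person100", "foaf:knows", "person200"),           # hypothetical extra triple
}

# Intersection graph E': the triples the two instance graphs share
intersection = g_e1 & g_e2

# V' = {v | exists u, p : (u, p, v) in E' or (v, p, u) in E'}
vertices = {u for (u, _, v) in intersection} | {v for (u, _, v) in intersection}

# Simplest feature count: number of shared edges
print(len(intersection))  # 2
print(sorted(vertices))   # ['Jane Doe', 'female', 'person200']
```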
Excursus: String Kernels
[Figure 2, panels: (a) Extracted Entities, (b) Entity Similarity, (c) Entity Clustering. Caption: Fig. 2: (a) Extracted entities e1 (tec0001), e2 (gov_q_ggdebt), and e3 from Fig. 1. (b) Structural similarities as intersection graphs Ge1 ∩ Ge2 and Ge1 ∩ Ge3. (c) Literal similarities for e1 vs. e2 and e1 vs. e3. Note: exact matching literals are omitted.]

Note: [16] introduced further kernels; however, we found path kernels to be simple and to perform well in our experiments, cf. Sect. 6.

Example. Extracted entities from the sources in Fig. 1 are given in Fig. 2-a. In Fig. 2-b, we compare the structure of tec0001 (short: e1) and gov_q_ggdebt (short: e2). For this, we compute an intersection Ge1 ∩ Ge2. This yields a set of 4 paths, each of length 0. The unnormalized kernel value is λ⁰ · 4.
Strings are also a relational representation: a linear graph (a sequence of nodes).

For literal similarities, one can use different kernels on, e.g., strings or numbers [18]. For space reasons, we restrict the presentation to the string subsequence kernel [15]. However, a numerical kernel is outlined in our extended report [19].
Definition 7 (String Subsequence Kernel κ_l). Let Σ denote a vocabulary for strings, with each string s a finite sequence of characters in Σ. Let s[i:j] denote a substring s_i, …, s_j of s. Further, let u be a subsequence of s if indices i = (i_1, …, i_|u|) exist with 1 ≤ i_1 < … < i_|u| ≤ |s| such that u = s[i]. The length l(i) of subsequence u is i_|u| − i_1 + 1. Then, a kernel function κ_l is defined as the sum over all common, weighted subsequences for strings s, t:
κ_l(s, t) = Σ_u Σ_{i: u=s[i]} Σ_{j: u=t[j]} λ^{l(i)} λ^{l(j)}, with λ as decay factor.
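Definition 7 can be checked with a tiny brute-force implementation (our sketch; exponential in string length and for illustration only, since Lodhi et al. [15] give an efficient dynamic program):

```python
from itertools import combinations
from collections import defaultdict

def subsequence_weights(s, lam):
    """For each subsequence u of s, sum lam ** l(i) over all index tuples i
    with u = s[i], where l(i) = i_last - i_first + 1 (as in Definition 7)."""
    weights = defaultdict(float)
    for k in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), k):   # strictly increasing indices
            u = "".join(s[i] for i in idx)
            weights[u] += lam ** (idx[-1] - idx[0] + 1)
    return weights

def ssk(s, t, lam=0.5):
    """Brute-force string subsequence kernel kappa_l(s, t)."""
    ws, wt = subsequence_weights(s, lam), subsequence_weights(t, lam)
    return sum(ws[u] * wt[u] for u in ws.keys() & wt.keys())

lam = 0.5
print(ssk("MI", "MIO EUR", lam))  # 0.5625: u = "M", "I", and "MI" all contribute
ws, wt = subsequence_weights("MI", lam), subsequence_weights("MIO EUR", lam)
print(ws["MI"] * wt["MI"])        # 0.0625 = lam**(2+2), the contribution of u = "MI" alone
```

Note the full sum also counts the single-character subsequences "M" and "I"; the worked example's value λ^(2+2) is the contribution of u = "MI" by itself.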
Example. For instance, the strings "MI" and "MIO EUR" share a common subsequence "MI" with i = (1, 2). Thus, the unnormalized kernel is λ^(2+2) = λ⁴.
As κ_l is only defined for two strings, we sample over every possible string pair for the two entities and aggregate the kernel for each pair. Finally, we aggregate the kernels κ_s and κ_l, resulting in a single kernel [18]:
κ(e′, e″) = κ_s(G_e′, G_e″) ⊕ κ_l(s, t)