
Knowledge Discovery WS 14/15

Relational Learning III


Prof. Dr. Rudi Studer, Dr. Achim Rettinger*, Dipl.-Inform. Lei Zhang, M.Sc. Aditya Mogadala, M.Sc. Steffen Thoma
{rudi.studer, achim.rettinger, l.zhang, aditya.mogadala, steffen.thoma}@kit.edu

INSTITUT FÜR ANGEWANDTE INFORMATIK UND FORMALE BESCHREIBUNGSVERFAHREN (AIFB)

KIT – University of the State of Baden-Württemberg and
National Laboratory of the Helmholtz Association – www.kit.edu
Knowledge Discovery Lecture WS14/15
22.10.2014  Introduction                    Basics, Overview
29.10.2014  Design of KD-experiments
05.11.2014  Linear Classifiers
12.11.2014  Data Warehousing & OLAP
19.11.2014  Non-Linear Classifiers (ANNs)   Supervised Techniques,
26.11.2014  Kernels, SVM                    Vector+Label Representation
03.12.2014  cancelled
10.12.2014  Decision Trees
17.12.2014  IBL & Clustering                Unsupervised Techniques
07.01.2015  Relational Learning I           Semi-supervised Techniques,
14.01.2015  Relational Learning II          Relational Representation
21.01.2015  Relational Learning III
28.01.2015  Textmining
04.02.2015  Guest lecture                   Meta-Topics
11.02.2015  Challenge, Exam Q&A

RECAP: Feature Vector – Data Representation

How to model Users in a social network?


Given: User(Gender, Occupation, Age)

Entities: User
Attributes: Gender, Occupation, Age
Data matrix:

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & & \vdots \\ x_{M,1} & \cdots & x_{M,N} \end{pmatrix}$$
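As a concrete illustration of how such a data matrix might be built, here is a minimal sketch (invented toy data, numpy assumed; not part of the slides): each user becomes one row, and the categorical attributes are one-hot encoded.

```python
# Minimal sketch (toy data): turning User(Gender, Occupation, Age)
# records into an M x N data matrix by one-hot encoding the
# categorical attributes. All names and values are invented.
import numpy as np

users = [
    ("female", "student", 23),
    ("male", "engineer", 31),
    ("male", "student", 27),
]

genders = sorted({g for g, _, _ in users})
occupations = sorted({o for _, o, _ in users})

def encode(gender, occupation, age):
    row = [1.0 if gender == v else 0.0 for v in genders]
    row += [1.0 if occupation == v else 0.0 for v in occupations]
    row.append(float(age))
    return row

X = np.array([encode(*u) for u in users])  # shape: (M users, N features)
print(X)
```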

RECAP: Feature Vector - Method

Decision Trees
Perceptron
  Linear Classifiers
Neural Networks
  Support Vector Machines
  Naive Bayes Classifier
AdaBoost

Note: after proper preprocessing, or with modified learning
methods, standard classifiers can also handle more complex
data representations (see previous lecture).

RECAP: Feature Vector - Task

Social Network: Predict single attributes of a person

  Limits:
  Multi-task learning (related but different situations)
  Time-series data (stocks, text,...)
  Single output with many states (tags, movies,...)
  Multiple outputs (movie ratings, knows-relation,...)
  Relational domains (most real world problems)

RECAP: Single Relational Representation -
Data

How to model Users and what they like?


Given: User(Occupation); likes(User, Movie)

Entities: User, Movie
Attributes: Occupation
Relation: likes(User, Movie)
Data matrix (relation and attribute columns concatenated):

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} & x_{1,N_2+1} & \cdots & x_{1,N_2+M} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{N_1,1} & \cdots & x_{N_1,N_2} & x_{N_1,N_2+1} & \cdots & x_{N_1,N_2+M} \end{pmatrix}$$
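A minimal sketch of one plausible encoding (my assumption; the slide does not fix the column layout): the likes relation and the user attributes are simply concatenated column-wise, one row per user.

```python
# Minimal sketch (assumed encoding, toy data): the first N2 columns
# hold the binary likes(User, Movie) relation, the remaining M
# columns hold the user's own attributes.
import numpy as np

likes = np.array([[1, 0, 1],    # user 1 likes movies 1 and 3
                  [0, 1, 0]])   # user 2 likes movie 2
occupation = np.array([[1, 0],  # one-hot occupation per user
                       [0, 1]])

X = np.hstack([likes, occupation])  # shape: (N1, N2 + M)
print(X)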

RECAP: Single Relational Representation -
Methods

hierarchical Bayes
  multi-label prediction
mixed models
hierarchical linear models
collaborative filtering
canonical correlation analysis
  multivariate regression
structured output prediction
principal component analysis
matrix factorization (see the sketch below)
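Since matrix factorization is the workhorse behind collaborative filtering, here is a minimal sketch of the idea (toy data, plain gradient descent; the rank k, learning rate, and step count are arbitrary illustrative choices, not from the lecture):

```python
# Minimal sketch: rank-k factorization of a likes matrix, X ≈ U @ V.T,
# fit by gradient descent on the squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [1., 0., 0.]])
k, lr, steps = 2, 0.05, 2000
U = rng.normal(scale=0.1, size=(X.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(X.shape[1], k))  # movie factors

for _ in range(steps):
    E = X - U @ V.T       # reconstruction error
    U += lr * E @ V       # gradient step for user factors
    V += lr * E.T @ U     # gradient step for movie factors

print(np.round(U @ V.T, 2))  # approximate reconstruction of X
```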

RECAP: Single Relational Representation -
Task

  Link prediction (web, citations)


Recommender systems
  Bio-informatics (protein interactions, protein structure
prediction)
  Computer Vision (image restoration)
  Networks (sensor networks, power grids)
  Textmining (Named entity recognition, word co-occurrence)

RECAP: Multi-Relational Representation - Data

How to model Users and what they like?


Given: User(Occupation); likes(User, Movie); Movie(Genre)

Entities: User, Movie
Attributes: Occupation (User), Genre (Movie)
Relation: likes(User, Movie)
Data matrices:

User attributes (Occupation):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_1} \\ \vdots & & \vdots \\ x_{M_1,1} & \cdots & x_{M_1,N_1} \end{pmatrix}$$

likes(User, Movie):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} \\ \vdots & & \vdots \\ x_{N_1,1} & \cdots & x_{N_1,N_2} \end{pmatrix}$$

Movie attributes (Genre):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} \\ \vdots & & \vdots \\ x_{M_2,1} & \cdots & x_{M_2,N_2} \end{pmatrix}$$
RECAP: Multi-Relational Representation -
Algorithm

  Multi-relational Matrix/Tensor Decomposition

Probabilistic Graphical Models


Directed RGM: Based on Bayesian Networks (e.g. Infinite Hidden
Relational Models)
Undirected RGM: Based on Markov Networks (e.g. Markov Logic
Networks)

Multi-Relational Representation - Application

Relation prediction and instance clustering for:

  Graph structures (web, networks)
  Relational data(bases)
  Complex knowledge representations (ontologies)

Example: Social Networks

  Who knows who?
  Who lives where?
  Which persons are similar?

Chapter 5-3: Probabilistic Graphical Models

The Beauty of PGMs

  Most fundamental KD approach

Theoretically well founded

Based on probability theory and graph theory

  Clean framework with a clear decoupling of representation,
inference and learning

  Recommended Reading: http://pgm.stanford.edu,
https://www.coursera.org/course/pgm

The 3 most essential dimensions
defining a KD problem:

  Data representation
Method (learning algorithm)
  Task (application)

Example:
Data Representation      Method                Task
Feature vector + Label   Perceptron            Classification
Graph                    Matrix Factorization  Recommendation
Feature vector           K-Means               Clustering

Graphical Models: 3 Dimensions

Representation
aka „classifier representation“, „ML model“
  An abstract representation formalism to encode a model about the
world
  Learning
aka „training“
Building a concrete world model by learning from real-world
observations
Calculate a „joint probability distribution“
Inference
aka „prediction“, „task“
Machinery to answer questions, given one specific situation in the
concrete world model
Calculate the „posterior distribution“
Example: Features of persons

Representation
  Network of random variables
One node (random variable) for gender, one for age, one for
occupation
Edges according to dependencies between nodes
  Learning
Estimate the parameters of the probability distribution of each
random variable
Observe many real-world persons and their gender, age and
occupation
  „Count“ the probability of their gender, age and occupation
Inference
Given a concrete person's age and occupation, calculate the
probability of its gender
From simple to complex probabilistic models

  Feature vector representation:

[Figure: User entity with attributes Gender, Occupation, Age]

  Simple (directed) graphical model:

[Figure: nodes Age and Occupation with arcs into node Gender]

$$P(\mathrm{gender} \mid \mathrm{age}, \mathrm{occupation})$$

Graphical Models – (In)Dependence

  A marriage between graph theory and probability theory


  Directed probabilistic graphical models are graphs in which
nodes represent random variables.
  Arcs, or the lack of arcs, represent conditional
independence assumptions.
  Compact representation of joint probability distributions.

  Independence: $P(x, y) = P(x)\,P(y)$
[Figure: Gender and Occupation as unconnected nodes]

  Dependence: $P(x, y) = P(x \mid y)\,P(y)$
[Figure: Gender and Occupation connected by an arc]

Graphical Models - Observations

  When we observe a variable – know its value from the data –
we color the node corresponding to that variable grey.
  Observing a variable allows us to condition on it,
e.g. $p(y \mid x_1, x_2)$.
  Given an observation of any variable, we can generate pdfs
for the other variables.
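For reference, conditioning on observed variables is just the usual ratio of joint to marginal (standard definition, not specific to this slide):

$$p(y \mid x_1, x_2) = \frac{p(y, x_1, x_2)}{p(x_1, x_2)}$$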

[Figure: the same network with Gender labelled y and Age, Occupation
labelled x1, x2; observed nodes are shaded grey]

Model parameters as nodes

  If we model the parameters θ as random variables, we
can include them in the graphical model.

[Figure: parameter node θ with arcs to the observed nodes x1 and x2]

Background: Bayes Rule

  Bayes' rule: posterior = likelihood × prior / marginal likelihood
(evidence):

$$P(\theta \mid \mathrm{OBS}, \alpha) = \frac{P(\mathrm{OBS} \mid \theta)\,P(\theta \mid \alpha)}{P(\mathrm{OBS} \mid \alpha)}$$

  Goal of Bayesian inference: given α and OBS, estimate θ;
α is a hyperparameter of θ.
  E.g. done with maximum-likelihood estimation.
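A quick worked example with invented numbers for a binary θ: with likelihood $P(\mathrm{OBS} \mid \theta) = 0.8$, prior $P(\theta) = 0.3$, and $P(\mathrm{OBS} \mid \neg\theta) = 0.1$:

$$P(\theta \mid \mathrm{OBS}) = \frac{0.8 \cdot 0.3}{0.8 \cdot 0.3 + 0.1 \cdot 0.7} = \frac{0.24}{0.31} \approx 0.77$$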

Complex Dependencies: Bayesian Networks

  A model to express complex joint probabilities by
decomposing them into a product of CPDs
(conditional probability distributions).
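Concretely, this is the standard Bayesian network factorization (a standard result, not shown on the slide), with each variable conditioned on its parents $\mathrm{pa}(x_i)$:

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big)$$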

Problem

If the number of features (parents) becomes large, the
probability tables become infeasible.

Simplifying assumption: all features are independent given the
class (see next slide); the calculation of the posterior then
reduces to

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i} P(x_i \mid y)$$
Naïve Bayes Classifier

  Assumption: the observation variables x_i are each
independent given the class y.
  A distribution is optimized using maximum likelihood for
each variable separately.
  “A Graphical Model for feature vector representations”.

[Figure: class node y with arcs to the observation nodes x1, x2, ...]
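A minimal sketch of such a classifier on invented toy data (the counting is exactly per-class maximum-likelihood estimation; no smoothing, so unseen attribute values would zero out a class):

```python
# Minimal sketch (toy data): a categorical naive Bayes classifier,
# fit by maximum likelihood, i.e. per-class counting.
from collections import Counter, defaultdict

data = [  # (age_band, occupation) -> gender
    (("young", "student"), "f"),
    (("young", "student"), "m"),
    (("old", "engineer"), "m"),
    (("old", "engineer"), "f"),
    (("young", "engineer"), "m"),
]

prior = Counter(y for _, y in data)
cond = defaultdict(Counter)  # cond[(i, y)][value] = count
for x, y in data:
    for i, v in enumerate(x):
        cond[(i, y)][v] += 1

def posterior(x):
    scores = {}
    for y, n_y in prior.items():
        p = n_y / len(data)                  # P(y)
        for i, v in enumerate(x):
            p *= cond[(i, y)][v] / n_y       # ML estimate of P(x_i | y)
        scores[y] = p
    z = sum(scores.values())
    return {y: p / z for y, p in scores.items()}

print(posterior(("young", "engineer")))
```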

From Bayesian Nets to Relational PGMs

  Now we know how to model a simple network of variables.

  How can we use this to model a complex domain with
entity classes and relations?

  We will call these models “Relational Probabilistic Graphical
Models” (in the book: “Probabilistic Models for Object-Relational Domains”).

Relational Schema and its Instantiation

Relational Skeleton and Probabilistic
Dependencies

CPD and dependency graph

Probabilistic modeling of an instance graph

Plate representation

Making use of the Schema
(this is just a figure not a plate representation)

Introducing Latent Classes (now this is a relational model)

Dirichlet Distribution

Samples from the Dirichlet distribution lie in the (m-1)-dimensional
probability simplex.
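This property is easy to check numerically; a minimal sketch (the concentration parameter alpha is an arbitrary illustrative choice):

```python
# Minimal sketch: Dirichlet samples are non-negative and each
# sample's components sum to one, i.e. they lie on the simplex.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 2.0, 2.0])   # m = 3 -> samples on the 2-simplex
samples = rng.dirichlet(alpha, size=5)
print(samples)
print(samples.sum(axis=1))          # each row sums to 1.0
```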

Dirichlet Process

Non-parametric latent PGM (this one is called „IHRM“)

http://www.aifb.kit.edu/images/e/eb/Xu_socialnetmining_SNMwNRM.pdf

Parameters of the IHRM

Parameter Estimation:
Expectation Maximization (cf. k-means)

  First, initialize the parameters θ to some random values.
  Compute the best values for Z given these parameter
values, by maximizing the log-likelihood over all possible
values of Z.
  Then, use the just-computed values of Z to compute a
better estimate for the parameters θ. Parameters
associated with a particular value of Z use only those
data points whose associated latent variable has that
value.
  Iterate steps 2 and 3 until convergence.
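A minimal sketch of this loop in its hard-assignment form on invented 1-D data (which, with fixed-variance Gaussian components, is exactly k-means with two clusters):

```python
# Minimal sketch (toy 1-D data): hard EM with two clusters.
# E-step: pick the best Z (nearest mean = maximum likelihood under a
# fixed-variance Gaussian). M-step: re-estimate each theta from the
# points currently assigned to it.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
theta = rng.choice(x, size=2, replace=False)  # step 1: random init

for _ in range(10):                           # step 4: iterate
    z = np.argmin(np.abs(x[:, None] - theta[None, :]), axis=1)  # step 2
    theta = np.array([x[z == k].mean() for k in range(2)])      # step 3

print(theta, z)
```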

Parameter Estimation for IHRM

  Compute the joint posterior distribution using blocked
Gibbs sampling.
  At each iteration, first update the hidden variables, then
update the parameters.
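The IHRM's actual sampler runs over its cluster assignments and cluster parameters; as a much-simplified stand-in, here is one blocked Gibbs sweep for a two-component 1-D Gaussian mixture (unit variance, equal mixing weights, flat prior on the means; all of this is my illustrative simplification, not the IHRM model):

```python
# Minimal sketch (not the actual IHRM sampler): blocked Gibbs for a
# two-component 1-D Gaussian mixture. Each sweep first resamples all
# hidden assignments z, then resamples the component means.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
mu = rng.normal(size=2)

for _ in range(200):
    # update hidden variables: P(z_i = k | x_i, mu) ∝ N(x_i; mu_k, 1)
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(2, p=pi) for pi in p])
    # update parameters: with a flat prior and unit variance,
    # mu_k | x, z ~ N(mean of assigned points, 1/n_k)
    for k in range(2):
        n = (z == k).sum()
        if n > 0:
            mu[k] = rng.normal(x[z == k].mean(), 1.0 / np.sqrt(n))

print(np.sort(mu))
```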

Features of IHRM

  Goal is to group entities into clusters (multi-relational
clustering).
  This makes it possible to predict missing relationships
between clusters.
  The IHRM discovers the number of clusters and the
cluster assignments.
  Cluster assignments are influenced by relations, attributes
and the other cluster assignments.
  Cross-attribute and cross-entity dependencies are learned
(collaborative filtering).
  Easy to apply without any extensive structural learning.

IHRM as an example of a Relational PGM

  IHRM is one example of a Multi-Relational ML algorithm.

  It is a Directed Relational Graphical Model based on
Bayesian Networks.
  Like all relational graphical models, it builds a probabilistic
model of the complete relational domain.
  Unknown parameters of the probability distributions of the
model can be estimated from data.
  Given such a model and parameters, unknown relation
and entity instances can be inferred.

Famous (non-relational) Probabilistic Graphical Models

The figure below might aid in understanding the relationship
between hidden Markov models and Bayesian networks.

  GM: Graphical Model
  UGM: Undirected GM
  DGM: Directed GM
  BN: Bayesian Net
  DBN: Dynamic BN
  HMM: Hidden Markov Model
  KF: Kalman Filter
  NN: Neural Network

[Figure: tree of the relationships between these model classes]





Review: Statistical Relational Learning

Data representations
•  Graphs, Matrices, Entity-Relationship Models, RDF

Learning algorithms
•  Hierarchy of suitable algorithms ranging from simple feature-vector
based to multi-relational / logical representations

Applications
•  Social, biological, and computer networks; domains with complex
dependencies between heterogeneous variables which violate the
i.i.d. assumption.

Knowledge Discovery Lecture WS14/15
22.10.2014  Introduction                    Basics, Overview
29.10.2014  Design of KD-experiments
05.11.2014  Linear Classifiers
12.11.2014  Data Warehousing & OLAP
19.11.2014  Non-Linear Classifiers (ANNs)   Supervised Techniques,
26.11.2014  Kernels, SVM                    Vector+Label Representation
03.12.2014  cancelled
10.12.2014  Decision Trees
17.12.2014  IBL & Clustering                Unsupervised Techniques
07.01.2015  Relational Learning I           Semi-supervised Techniques,
14.01.2015  Relational Learning II          Relational Representation
21.01.2015  Relational Learning III
28.01.2015  Textmining
04.02.2015  Guest lecture                   Meta-Topics
11.02.2015  Challenge, Exam Q&A

