
Knowledge Discovery WS 14/15

Relational Learning III


Prof. Dr. Rudi Studer, Dr. Achim Rettinger*, Dipl.-Inform. Lei Zhang, M.Sc. Aditya Mogadala, M.Sc. Steffen Thoma
{rudi.studer, achim.rettinger, l.zhang, aditya.mogadala, steffen.thoma}@kit.edu

INSTITUT FÜR ANGEWANDTE INFORMATIK UND FORMALE BESCHREIBUNGSVERFAHREN (AIFB)

KIT – University of the State of Baden-Württemberg and
National Laboratory of the Helmholtz Association – www.kit.edu
Knowledge Discovery Lecture WS14/15
22.10.2014  Introduction                    Basics, Overview
29.10.2014  Design of KD-experiments
05.11.2014  Linear Classifiers
12.11.2014  Data Warehousing & OLAP
19.11.2014  Non-Linear Classifiers (ANNs)   Supervised Techniques,
26.11.2014  Kernels, SVM                    Vector+Label Representation
03.12.2014  cancelled
10.12.2014  Decision Trees
17.12.2014  IBL & Clustering                Unsupervised Techniques
07.01.2015  Relational Learning I           Semi-supervised Techniques,
14.01.2015  Relational Learning II          Relational Representation
21.01.2015  Relational Learning III
28.01.2015  Textmining
04.02.2015  Guest lecture                   Meta-Topics
11.02.2015  Challenge, Exam Q&A

RECAP: Feature Vector – Data Representation

How to model Users in a social network?


Given: User(Gender, Occupation, Age)

Entities: User
Attributes: Gender, Occupation, Age
Data matrix:

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & & \vdots \\ x_{M,1} & \cdots & x_{M,N} \end{pmatrix}$$
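As a concrete illustration of how such a data matrix might be built, here is a minimal sketch (invented toy data, numpy assumed; not part of the slides): each user becomes one row, and the categorical attributes are one-hot encoded.

```python
# Minimal sketch (toy data): turning User(Gender, Occupation, Age)
# records into an M x N data matrix by one-hot encoding the
# categorical attributes. All names and values are invented.
import numpy as np

users = [
    ("female", "student", 23),
    ("male", "engineer", 31),
    ("male", "student", 27),
]

genders = sorted({g for g, _, _ in users})
occupations = sorted({o for _, o, _ in users})

def encode(gender, occupation, age):
    row = [1.0 if gender == v else 0.0 for v in genders]
    row += [1.0 if occupation == v else 0.0 for v in occupations]
    row.append(float(age))
    return row

X = np.array([encode(*u) for u in users])  # shape: (M users, N features)
print(X)
```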

RECAP: Feature Vector - Method

Decision Trees
Perceptron
  Linear Classifiers
Neural Networks
  Support Vector Machines
  Naive Bayes Classifier
AdaBoost

Note: after proper preprocessing, or with modified learning
methods, standard classifiers can also handle more complex
data representations (see previous lecture).

RECAP: Feature Vector - Task

Social Network: Predict single attributes of a person

  Limits:
  Multi-task learning (related but different situations)
  Time-series data (stocks, text,...)
  Single output with many states (tags, movies,...)
  Multiple outputs (movie ratings, knows-relation,...)
  Relational domains (most real world problems)

RECAP: Single Relational Representation -
Data

How to model Users and what they like?


Given: User(Occupation); likes(User, Movie)

Entities: User, Movie
Attributes: Occupation
Relation: likes(User, Movie)
Data matrix (relation and attribute columns concatenated):

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} & x_{1,N_2+1} & \cdots & x_{1,N_2+M} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{N_1,1} & \cdots & x_{N_1,N_2} & x_{N_1,N_2+1} & \cdots & x_{N_1,N_2+M} \end{pmatrix}$$
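A minimal sketch of one plausible encoding (my assumption; the slide does not fix the column layout): the likes relation and the user attributes are simply concatenated column-wise, one row per user.

```python
# Minimal sketch (assumed encoding, toy data): the first N2 columns
# hold the binary likes(User, Movie) relation, the remaining M
# columns hold the user's own attributes.
import numpy as np

likes = np.array([[1, 0, 1],    # user 1 likes movies 1 and 3
                  [0, 1, 0]])   # user 2 likes movie 2
occupation = np.array([[1, 0],  # one-hot occupation per user
                       [0, 1]])

X = np.hstack([likes, occupation])  # shape: (N1, N2 + M)
print(X)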

RECAP: Single Relational Representation -
Methods

hierarchical Bayes
  multi-label prediction
mixed models
hierarchical linear models
collaborative filtering
canonical correlation analysis
  multivariate regression
structured output prediction
principal component analysis
matrix factorization (see the sketch below)
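Since matrix factorization is the workhorse behind collaborative filtering, here is a minimal sketch of the idea (toy data, plain gradient descent; the rank k, learning rate, and step count are arbitrary illustrative choices, not from the lecture):

```python
# Minimal sketch: rank-k factorization of a likes matrix, X ≈ U @ V.T,
# fit by gradient descent on the squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [1., 0., 0.]])
k, lr, steps = 2, 0.05, 2000
U = rng.normal(scale=0.1, size=(X.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(X.shape[1], k))  # movie factors

for _ in range(steps):
    E = X - U @ V.T       # reconstruction error
    U += lr * E @ V       # gradient step for user factors
    V += lr * E.T @ U     # gradient step for movie factors

print(np.round(U @ V.T, 2))  # approximate reconstruction of X
```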

RECAP: Single Relational Representation -
Task

  Link prediction (web, citations)


Recommender systems
  Bio-informatics (protein interactions, protein structure
prediction)
  Computer Vision (image restoration)
  Networks (sensor networks, power grids)
  Textmining (Named entity recognition, word co-occurrence)

RECAP: Multi-Relational Representation - Data

How to model Users and what they like?


Given: User(Occupation); likes(User, Movie); Movie(Genre)

Entities: User, Movie
Attributes: Occupation (User), Genre (Movie)
Relation: likes(User, Movie)
Data matrices:

User attributes (Occupation):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_1} \\ \vdots & & \vdots \\ x_{M_1,1} & \cdots & x_{M_1,N_1} \end{pmatrix}$$

likes(User, Movie):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} \\ \vdots & & \vdots \\ x_{N_1,1} & \cdots & x_{N_1,N_2} \end{pmatrix}$$

Movie attributes (Genre):
$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,N_2} \\ \vdots & & \vdots \\ x_{M_2,1} & \cdots & x_{M_2,N_2} \end{pmatrix}$$
RECAP: Multi-Relational Representation -
Algorithm

  Multi-relational Matrix/Tensor Decomposition

Probabilistic Graphical Models


Directed RGM: Based on Bayesian Networks (e.g. Infinite Hidden
Relational Models)
Undirected RGM: Based on Markov Networks (e.g. Markov Logic
Networks)

Multi-Relational Representation - Application

Relation prediction and instance clustering for:

  Graph structures (web, networks)
  Relational data(bases)
  Complex knowledge representations (ontologies)

Example: Social Networks

  Who knows who?
  Who lives where?
  Which persons are similar?

Chapter 5-3: Probabilistic Graphical Models

The Beauty of PGMs

  Most fundamental KD approach

Theoretically well founded

Based on probability theory and graph theory

  Clean framework with a clear decoupling of representation,
inference and learning

  Recommended Reading: http://pgm.stanford.edu,
https://www.coursera.org/course/pgm

The 3 most essential dimensions
defining a KD problem:

  Data representation
Method (learning algorithm)
  Task (application)

Example:
Data Representation      Method                Task
Feature vector + Label   Perceptron            Classification
Graph                    Matrix Factorization  Recommendation
Feature vector           K-Means               Clustering

Graphical Models: 3 Dimensions

Representation
aka „classifier representation“, „ML model“
  An abstract representation formalism to encode a model about the
world
  Learning
aka „training“
Building a concrete world model by learning from real-world
observations
Calculate a „joint probability distribution“
Inference
aka „prediction“, „task“
Machinery to answer questions, given one specific situation in the
concrete world model
Calculate the „posterior distribution“
Example: Features of persons

Representation
  Network of random variables
One node (random variable) for gender, one for age, one for
occupation
Edges according to dependencies between nodes
  Learning
Estimate the parameters of the probability distribution of each
random variable
Observe many real-world persons and their gender, age and
occupation
  „Count“ the probability of their gender, age and occupation
Inference
Given a concrete person's age and occupation, calculate the
probability of its gender
From simple to complex probabilistic models

  Feature vector representation:

[Figure: User entity with attributes Gender, Occupation, Age]

  Simple (directed) graphical model:

[Figure: nodes Age and Occupation with arcs into node Gender]

$$P(\mathrm{gender} \mid \mathrm{age}, \mathrm{occupation})$$

Graphical Models – (In)Dependence

  A marriage between graph theory and probability theory


  Directed probabilistic graphical models are graphs in which
nodes represent random variables.
  Arcs, or the lack of arcs, represent conditional
independence assumptions.
  Compact representation of joint probability distributions.

  Independence: $P(x, y) = P(x)\,P(y)$
[Figure: Gender and Occupation as unconnected nodes]

  Dependence: $P(x, y) = P(x \mid y)\,P(y)$
[Figure: Gender and Occupation connected by an arc]

Graphical Models - Observations

  When we observe a variable – know its value from the data –
we color the node corresponding to that variable grey.
  Observing a variable allows us to condition on it,
e.g. $p(y \mid x_1, x_2)$.
  Given an observation of any variable, we can generate pdfs
for the other variables.
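For reference, conditioning on observed variables is just the usual ratio of joint to marginal (standard definition, not specific to this slide):

$$p(y \mid x_1, x_2) = \frac{p(y, x_1, x_2)}{p(x_1, x_2)}$$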

[Figure: the same network with Gender labelled y and Age, Occupation
labelled x1, x2; observed nodes are shaded grey]

Model parameters as nodes

  If we model the parameters θ as random variables, we
can include them in the graphical model.

[Figure: parameter node θ with arcs to the observed nodes x1 and x2]

Background: Bayes Rule

  Bayes' rule: posterior = likelihood × prior / marginal likelihood
(evidence):

$$P(\theta \mid \mathrm{OBS}, \alpha) = \frac{P(\mathrm{OBS} \mid \theta)\,P(\theta \mid \alpha)}{P(\mathrm{OBS} \mid \alpha)}$$

  Goal of Bayesian inference: given α and OBS, estimate θ;
α is a hyperparameter of θ.
  E.g. done with maximum-likelihood estimation.
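A quick worked example with invented numbers for a binary θ: with likelihood $P(\mathrm{OBS} \mid \theta) = 0.8$, prior $P(\theta) = 0.3$, and $P(\mathrm{OBS} \mid \neg\theta) = 0.1$:

$$P(\theta \mid \mathrm{OBS}) = \frac{0.8 \cdot 0.3}{0.8 \cdot 0.3 + 0.1 \cdot 0.7} = \frac{0.24}{0.31} \approx 0.77$$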

Complex Dependencies: Bayesian Networks

  A model to express complex joint probabilities by
decomposing them into a product of CPDs
(conditional probability distributions).
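Concretely, this is the standard Bayesian network factorization (a standard result, not shown on the slide), with each variable conditioned on its parents $\mathrm{pa}(x_i)$:

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big)$$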

Problem

If the number of features (parents) becomes large, the
probability tables become infeasible.

Simplifying assumption: all features are independent given the
class (see next slide); the calculation of the posterior then
reduces to

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i} P(x_i \mid y)$$
Naïve Bayes Classifier

  Assumption: the observation variables x_i are each
independent given the class y.
  A distribution is optimized using maximum likelihood for
each variable separately.
  “A Graphical Model for feature vector representations”.

[Figure: class node y with arcs to the observation nodes x1, x2, ...]
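A minimal sketch of such a classifier on invented toy data (the counting is exactly per-class maximum-likelihood estimation; no smoothing, so unseen attribute values would zero out a class):

```python
# Minimal sketch (toy data): a categorical naive Bayes classifier,
# fit by maximum likelihood, i.e. per-class counting.
from collections import Counter, defaultdict

data = [  # (age_band, occupation) -> gender
    (("young", "student"), "f"),
    (("young", "student"), "m"),
    (("old", "engineer"), "m"),
    (("old", "engineer"), "f"),
    (("young", "engineer"), "m"),
]

prior = Counter(y for _, y in data)
cond = defaultdict(Counter)  # cond[(i, y)][value] = count
for x, y in data:
    for i, v in enumerate(x):
        cond[(i, y)][v] += 1

def posterior(x):
    scores = {}
    for y, n_y in prior.items():
        p = n_y / len(data)                  # P(y)
        for i, v in enumerate(x):
            p *= cond[(i, y)][v] / n_y       # ML estimate of P(x_i | y)
        scores[y] = p
    z = sum(scores.values())
    return {y: p / z for y, p in scores.items()}

print(posterior(("young", "engineer")))
```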

From Bayesian Nets to Relational PGMs

  Now we know how to model a simple network of variables.

  How can we use this to model a complex domain with
entity classes and relations?

  We will call these models “Relational Probabilistic Graphical
Models” (in the book: “Probabilistic Models for Object-Relational Domains”).

Relational Schema and its Instantiation

Relational Skeleton and Probabilistic
Dependencies

CPD and dependency graph

Probabilistic modeling of an instance graph

Plate representation

Making use of the Schema
(this is just a figure not a plate representation)

Introducing Latent Classes (now this is a relational model)

Dirichlet Distribution

Samples from the Dirichlet distribution lie in the (m-1)-dimensional
probability simplex.
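This property is easy to check numerically; a minimal sketch (the concentration parameter alpha is an arbitrary illustrative choice):

```python
# Minimal sketch: Dirichlet samples are non-negative and each
# sample's components sum to one, i.e. they lie on the simplex.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 2.0, 2.0])   # m = 3 -> samples on the 2-simplex
samples = rng.dirichlet(alpha, size=5)
print(samples)
print(samples.sum(axis=1))          # each row sums to 1.0
```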

Dirichlet Process

Non-parametric latent PGM (this one is called „IHRM“)

http://www.aifb.kit.edu/images/e/eb/Xu_socialnetmining_SNMwNRM.pdf

Parameters of the IHRM

Parameter Estimation:
Expectation Maximization (cf. k-means)

  First, initialize the parameters θ to some random values.
  Compute the best values for Z given these parameter
values, by maximizing the log-likelihood over all possible
values of Z.
  Then, use the just-computed values of Z to compute a
better estimate for the parameters θ. Parameters
associated with a particular value of Z use only those
data points whose associated latent variable has that
value.
  Iterate steps 2 and 3 until convergence.
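A minimal sketch of this loop in its hard-assignment form on invented 1-D data (which, with fixed-variance Gaussian components, is exactly k-means with two clusters):

```python
# Minimal sketch (toy 1-D data): hard EM with two clusters.
# E-step: pick the best Z (nearest mean = maximum likelihood under a
# fixed-variance Gaussian). M-step: re-estimate each theta from the
# points currently assigned to it.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
theta = rng.choice(x, size=2, replace=False)  # step 1: random init

for _ in range(10):                           # step 4: iterate
    z = np.argmin(np.abs(x[:, None] - theta[None, :]), axis=1)  # step 2
    theta = np.array([x[z == k].mean() for k in range(2)])      # step 3

print(theta, z)
```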

Parameter Estimation for IHRM

  Compute the joint posterior distribution using blocked
Gibbs sampling.
  At each iteration, first update the hidden variables, then
update the parameters.
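The IHRM's actual sampler runs over its cluster assignments and cluster parameters; as a much-simplified stand-in, here is one blocked Gibbs sweep for a two-component 1-D Gaussian mixture (unit variance, equal mixing weights, flat prior on the means; all of this is my illustrative simplification, not the IHRM model):

```python
# Minimal sketch (not the actual IHRM sampler): blocked Gibbs for a
# two-component 1-D Gaussian mixture. Each sweep first resamples all
# hidden assignments z, then resamples the component means.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
mu = rng.normal(size=2)

for _ in range(200):
    # update hidden variables: P(z_i = k | x_i, mu) ∝ N(x_i; mu_k, 1)
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(2, p=pi) for pi in p])
    # update parameters: with a flat prior and unit variance,
    # mu_k | x, z ~ N(mean of assigned points, 1/n_k)
    for k in range(2):
        n = (z == k).sum()
        if n > 0:
            mu[k] = rng.normal(x[z == k].mean(), 1.0 / np.sqrt(n))

print(np.sort(mu))
```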

Features of IHRM

  Goal is to group entities into clusters (multi-relational
clustering).
  This makes it possible to predict missing relationships
between clusters.
  The IHRM discovers the number of clusters and the
cluster assignments.
  Cluster assignments are influenced by relations, attributes
and the other cluster assignments.
  Cross-attribute and cross-entity dependencies are learned
(collaborative filtering).
  Easy to apply without any extensive structural learning.

IHRM as an example of a Relational PGM

  IHRM is one example of a Multi-Relational ML algorithm.

  It is a Directed Relational Graphical Model based on
Bayesian Networks.
  Like all relational graphical models, it builds a probabilistic
model of the complete relational domain.
  Unknown parameters of the probability distributions of the
model can be estimated from data.
  Given such a model and parameters, unknown relation
and entity instances can be inferred.

Famous (non-relational) Probabilistic Graphical Models

The figure below might aid in understanding the relationship
between hidden Markov models and Bayesian networks.

  GM: Graphical Model
  UGM: Undirected GM
  DGM: Directed GM
  BN: Bayesian Net
  DBN: Dynamic BN
  HMM: Hidden Markov Model
  KF: Kalman Filter
  NN: Neural Network

[Figure: tree of the relationships between these model classes]





Review: Statistical Relational Learning

Data representations
•  Graphs, Matrices, Entity-Relationship Models, RDF

Learning algorithms
•  Hierarchy of suitable algorithms ranging from simple feature-vector
based to multi-relational / logical representations

Applications
•  Social, biological, and computer networks; domains with complex
dependencies between heterogeneous variables which violate the
i.i.d. assumption.

Knowledge Discovery Lecture WS14/15
22.10.2014  Introduction                    Basics, Overview
29.10.2014  Design of KD-experiments
05.11.2014  Linear Classifiers
12.11.2014  Data Warehousing & OLAP
19.11.2014  Non-Linear Classifiers (ANNs)   Supervised Techniques,
26.11.2014  Kernels, SVM                    Vector+Label Representation
03.12.2014  cancelled
10.12.2014  Decision Trees
17.12.2014  IBL & Clustering                Unsupervised Techniques
07.01.2015  Relational Learning I           Semi-supervised Techniques,
14.01.2015  Relational Learning II          Relational Representation
21.01.2015  Relational Learning III
28.01.2015  Textmining
04.02.2015  Guest lecture                   Meta-Topics
11.02.2015  Challenge, Exam Q&A

