
Machine Learning Techniques for the Semantic Web
Paul Dix
http://pauldix.net
paul@pauldix.net
Machine Learning
Semantic Web
What is Semantic Web?
Ontology
RDF
Machine Learning is
about Data
actually...
Making Predictions
Based on Data
FOAF
Simple Example
Marco Neumann
<http://www.marconeumann.org/foaf.rdf>
<http://xmlns.com/foaf/0.1/knows>
<http://community.linkeddata.org/dataspace/person/kidehen2/about.rdf> .
<http://www.marconeumann.org/foaf.rdf>
<http://xmlns.com/foaf/0.1/knows>
<http://www.johnbreslin.com/foaf/foaf.rdf> .
<http://www.marconeumann.org/foaf.rdf>
<http://xmlns.com/foaf/0.1/knows>
<http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf> .
<http://www.marconeumann.org/foaf.rdf>
<http://xmlns.com/foaf/0.1/knows>
<http://danbri.org/foaf.rdf> .
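Triples like the ones above can be read into a simple "who knows whom" index. The following is a minimal sketch, assuming every term is an IRI in angle brackets (as in this example); the helper names `parse_triples` and `knows_index` are hypothetical, and a real project would more likely use an RDF library.

```python
import re
from collections import defaultdict

# Matches one N-Triples statement whose three terms are all IRIs.
TRIPLE_RE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def parse_triples(text):
    """Return (subject, predicate, object) tuples for IRI-only triples."""
    return TRIPLE_RE.findall(text)


def knows_index(triples):
    """Map each subject to the set of objects it is linked to by foaf:knows."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == FOAF_KNOWS:
            index[subject].add(obj)
    return index


# Two of the four triples from the slide, one statement per line.
SAMPLE = """
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://www.johnbreslin.com/foaf/foaf.rdf> .
<http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://danbri.org/foaf.rdf> .
"""

index = knows_index(parse_triples(SAMPLE))
marco = "http://www.marconeumann.org/foaf.rdf"
print(len(index[marco]))  # number of people Marco's file says he knows
```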
Marco only knows 4
people?
Two Degrees Out
4 - <http://www.w3.org/People/Connolly/home-smart.rdf>
4 - <http://jibbering.com/foaf.rdf>
2 - <http://sw.deri.org/~haller/foaf.rdf>
2 - <http://sw.deri.org/~knud/knudfoaf.rdf>
2 - <http://www-cdr.stanford.edu/~petrie/foaf.rdf>
Three Degrees
9 - <http://sw.deri.org/~knud/knudfoaf.rdf>
8 - <http://www.w3.org/People/Connolly/home-smart.rdf>
7 - <http://jibbering.com/foaf.rdf>
6 - <http://www.aaronsw.com/about.xrdf>
5 - <http://sw.deri.org/~aharth/foaf.rdf>
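The slides do not say how the numbers next to each URL were computed. One plausible reading, and it is only an assumption, is that each number counts how many of Marco's direct contacts link to that person. A sketch of that count for two degrees, using a hypothetical toy graph rather than the real FOAF data:

```python
from collections import Counter


def second_degree_counts(knows, start):
    """For each person two hops from `start`, count how many of
    start's direct contacts link to them."""
    direct = knows.get(start, set())
    counts = Counter()
    for friend in direct:
        for friend_of_friend in knows.get(friend, set()):
            # Skip the start node and people already known directly.
            if friend_of_friend != start and friend_of_friend not in direct:
                counts[friend_of_friend] += 1
    return counts


# Hypothetical toy graph; the names are placeholders, not real FOAF files.
knows = {
    "marco": {"a", "b", "c"},
    "a": {"x", "y", "b"},
    "b": {"x"},
    "c": {"x", "y", "marco"},
}

ranked = second_degree_counts(knows, "marco").most_common()
for person, count in ranked:
    print(count, "-", person)
```

Extending this to three degrees means repeating the expansion one more hop, which is still graph traversal rather than learning.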
but that’s not really
machine learning
Short
Machine Learning is

• How you formulate the problem


• How you represent the data
• Graphical Models
• Vector Space Models
Back to FOAF
Convert RDF triples to vector space
We Want to Find
Groups of People
To make predictions on
their interests...
(subject) (predicate) (object)
Paul knows Jeff
Paul knows Joe
Paul knows Marco
Jeff knows Joe
Vector Space
Representation
        Jeff  Joe  Marco  Paul
Jeff     0     1     0     1
Joe      1     0     0     1
Marco    0     0     0     1
Paul     1     1     1     0
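The matrix can be built directly from the four triples. This sketch assumes "knows" is treated as symmetric, which is what the row counts in the slide imply; the function name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(triples, symmetric=True):
    """Build a people-by-people 0/1 matrix from (subject, 'knows', object) triples."""
    people = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    position = {name: i for i, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for subject, predicate, obj in triples:
        if predicate != "knows":
            continue
        matrix[position[subject]][position[obj]] = 1
        if symmetric:
            # Assumption: if Paul knows Jeff, Jeff also knows Paul.
            matrix[position[obj]][position[subject]] = 1
    return people, matrix


triples = [
    ("Paul", "knows", "Jeff"),
    ("Paul", "knows", "Joe"),
    ("Paul", "knows", "Marco"),
    ("Jeff", "knows", "Joe"),
]

people, matrix = adjacency_matrix(triples)
for name, row in zip(people, matrix):
    print(name, row)
```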
Latent Factors Analysis

• Used in Latent Semantic Indexing (LSI)


• Good for finding synonyms
• Good for finding “genres”
Latent Factors Methods

• Principal Component Analysis (PCA)


• Singular Value Decomposition (SVD)
• Restricted Boltzmann Machines (RBM)
Considerations for
Semantic Web Data

• Large Data Sets


• Sparse Data Sets
Netflix Prize Research

• Movie review data set has similar problems
• Generalized Hebbian Algorithm for Dimensionality Reduction in NLP (Gorrell ’06)
Reduce Dimensions

• 1m x 1m matrix with 1m people


• Reduce to 1m x 100
100 Latent Factors
Represent different groups of people based on who
they know.
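To make the reduction concrete, here is a small sketch that extracts the top latent factors of the four-person matrix from the earlier example. It uses plain power iteration with deflation, which is a stand-in chosen for brevity and not the Generalized Hebbian Algorithm cited above; at the scale of a million people a sparse, incremental method would be needed. All function names are hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_factors(matrix, k, iterations=200, seed=0):
    """Approximate the top-k right singular vectors and singular values
    of `matrix` by power iteration on A^T A with deflation."""
    rng = random.Random(seed)
    transposed = [list(col) for col in zip(*matrix)]
    vectors, values = [], []
    for _ in range(k):
        v = [rng.random() for _ in transposed]  # one entry per column
        for _ in range(iterations):
            # Remove components along factors that were already found.
            for u in vectors:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            w = matvec(transposed, matvec(matrix, v))
            length = norm(w)
            if length == 0:
                break
            v = [x / length for x in w]
        vectors.append(v)
        values.append(norm(matvec(matrix, v)))
    return vectors, values


def project(matrix, vectors):
    """Coordinates of each row in factor space, one value per factor."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vectors] for row in matrix]


# Rows and columns: Jeff, Joe, Marco, Paul (symmetric "knows").
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

vectors, values = top_factors(A, k=2)
reduced = project(A, vectors)  # 4 people x 2 factors instead of 4 x 4
```

The same idea turns a 1m x 1m matrix into 1m x 100: each person is described by 100 numbers instead of a million mostly empty cells.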
What the Data Might
Look Like
        Factor 1  Factor 2
Paul     0.678     0.311
Joe      0.455     0.432
Jeff     0.476     0.398
Marco    0.203     0.789


Find Similar People
k Nearest Neighbors
Pick a Similarity Metric

• Euclidean Distance
• Jaccard index
• Cosine Similarity
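The three metrics can be sketched in a few lines each. The inputs below reuse Jeff and Joe from the "knows" example, assuming the symmetric reading of that matrix; note that Euclidean distance is a distance (smaller means more alike), while the other two are similarities (larger means more alike).

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Rows for Jeff and Joe; columns are Jeff, Joe, Marco, Paul.
jeff_row = [0, 1, 0, 1]
joe_row = [1, 0, 0, 1]

distance = euclidean(jeff_row, joe_row)
overlap = jaccard({"Joe", "Paul"}, {"Jeff", "Paul"})
angle_similarity = cosine(jeff_row, joe_row)
```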
Joe’s Similarity to Paul
distance(Paul, Joe) = ((Paul(f1) - Joe(f1))^2 + (Paul(f2) - Joe(f2))^2)^(1/2)
(Euclidean distance: smaller means more similar)
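Plugging in the illustrative factor values from the "What the Data Might Look Like" table gives a worked k-nearest-neighbors example. This is a sketch using Euclidean distance via `math.dist` (Python 3.8+); the function name `nearest_neighbors` is hypothetical.

```python
import math

# Illustrative factor values from the table in the slides.
FACTORS = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}


def nearest_neighbors(name, factors, k):
    """Return the k people closest to `name` by Euclidean distance."""
    target = factors[name]
    distances = [
        (math.dist(target, vector), other)
        for other, vector in factors.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]


paul_to_joe = math.dist(FACTORS["Paul"], FACTORS["Joe"])
neighbors = nearest_neighbors("Paul", FACTORS, k=2)
print(round(paul_to_joe, 4), neighbors)
```

With these numbers Jeff is slightly closer to Paul than Joe is, and Marco is the farthest of the three.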
Once We’ve Calculated
Similarities
• Fill In Missing Interests
• Target Ads, Content, Products
• ???
• Profit!
Generalizing RDF
Triples to Vector Space
• Subjects are Rows
• Objects are Columns
• Predicates are values
            Object 1    Object 2
Subject 1   Predicate
Subject 2
Predicates Should be
Mutually Exclusive

• Paul likes Ruby


• Paul hates PHP
• Paul loves PHP
Assign Values to
Predicates
• 1 = Hates
• 2 = Dislikes
• 3 = Neutral
• 4 = Likes
• 5 = Loves
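With that scale, triples become cells in a sparse subject-by-object matrix. The sketch below stores the matrix as nested dictionaries and rejects contradictory statements such as "Paul hates PHP" plus "Paul loves PHP", which is one way (an assumption, not the only option) to enforce mutual exclusivity. The names are hypothetical.

```python
# The 1-5 scale from the slide.
PREDICATE_VALUES = {"hates": 1, "dislikes": 2, "neutral": 3, "likes": 4, "loves": 5}


def rating_matrix(triples):
    """Return {subject: {object: value}}; raise if a cell gets two different values."""
    matrix = {}
    for subject, predicate, obj in triples:
        value = PREDICATE_VALUES[predicate]
        row = matrix.setdefault(subject, {})
        if obj in row and row[obj] != value:
            raise ValueError(f"conflicting predicates for {subject} and {obj}")
        row[obj] = value
    return matrix


matrix = rating_matrix([
    ("Paul", "likes", "Ruby"),
    ("Paul", "hates", "PHP"),
])
print(matrix)
```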
More Applications
Supervised Learning

• Classifiers
• Ontology Mapping
• Assigning Instances to Concepts
Ontology Mapping

• Examples from Ontology A


• Examples from Ontology B
Train Classifiers

• One Classifier for each Concept in A


• One Classifier for each Concept in B
Classify Instances

• Use A Classifiers to predict which concepts B instances map to
• Use B Classifiers to predict which concepts A instances map to
Use Classified Instances

• Predict Concept Mappings


• Which concepts in A match which concepts in B
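The train, classify, and vote steps can be sketched end to end. Everything here is a toy assumption: the two ontologies and their instance descriptions are invented for illustration, and the per-concept "classifier" is just a vocabulary overlap score standing in for whatever real classifier would be trained.

```python
from collections import Counter


def train(examples):
    """A deliberately crude per-concept classifier: the vocabulary of its examples."""
    return {token for example in examples for token in example.lower().split()}


def score(vocabulary, instance):
    """Fraction of the instance's tokens seen in the concept's vocabulary."""
    tokens = instance.lower().split()
    return sum(t in vocabulary for t in tokens) / len(tokens) if tokens else 0.0


def map_concepts(source, target):
    """Classify each source instance with the target's classifiers, then
    map every source concept to the target concept with the most votes."""
    classifiers = {concept: train(examples) for concept, examples in target.items()}
    mapping = {}
    for concept, instances in source.items():
        votes = Counter()
        for instance in instances:
            best = max(classifiers, key=lambda c: score(classifiers[c], instance))
            votes[best] += 1
        if votes:
            mapping[concept] = votes.most_common(1)[0][0]
    return mapping


# Hypothetical toy ontologies: concept name -> example instance descriptions.
ONTOLOGY_A = {
    "Person": ["paul dix engineer", "marco neumann organizer"],
    "Language": ["ruby programming language", "php scripting language"],
}
ONTOLOGY_B = {
    "Human": ["jeff engineer", "joe organizer"],
    "ProgrammingLanguage": ["python programming language", "java language"],
}

a_to_b = map_concepts(ONTOLOGY_A, ONTOLOGY_B)  # B classifiers label A instances
b_to_a = map_concepts(ONTOLOGY_B, ONTOLOGY_A)  # A classifiers label B instances
print(a_to_b, b_to_a)
```

A mapping is more trustworthy when both directions agree, as they do for this toy data.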
Limitations

• One Classifier per Concept


• Large Ontologies Could be a Problem
• Ontologies should be at least somewhat similar
Unsupervised Learning

• Clustering
• Hierarchical Clustering
• Learning Ontologies from Text
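As a small illustration of hierarchical clustering, the sketch below runs single-linkage agglomerative clustering over the illustrative two-factor values shown earlier in the deck. Single linkage is just one of several possible linkage choices, and the function name is hypothetical.

```python
import math

# Illustrative factor values from the earlier table.
POINTS = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}


def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Cluster distance is the distance between their closest members."""
    clusters = [[name] for name in points]

    def distance(c1, c2):
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


clusters = single_linkage(POINTS, n_clusters=2)
print(clusters)
```

Recording the order of merges, rather than stopping at a fixed count, yields the full hierarchy.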
Machine Learning as
Triage

• Automatically tag or recommend examples the algorithm is certain about
• Send uncertain examples to a human for review
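Triage can be as simple as a confidence threshold. In this sketch the predictions, labels, and the 0.9 cutoff are all hypothetical; in practice the threshold would be tuned against how costly a wrong automatic tag is.

```python
def triage(predictions, threshold=0.9):
    """Split (item, label, confidence) predictions into an automatic
    list and a human-review list based on a confidence threshold."""
    automatic, review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            automatic.append((item, label))
        else:
            review.append((item, label))
    return automatic, review


# Hypothetical classifier output: (item, predicted label, confidence).
predictions = [
    ("doc-1", "Person", 0.97),
    ("doc-2", "Language", 0.55),
    ("doc-3", "Person", 0.91),
]

automatic, review = triage(predictions)
print("tag automatically:", automatic)
print("send to a human:", review)
```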
Thank You
Paul Dix
paul@pauldix.net
http://pauldix.net