
Better Together – An Ensemble Learner for Combining the Results of Ready-made Entity Linking Systems

Renato Stoffalette João1, Pavlos Fafalios2, Stefan Dietze1,3

joao@L3S.de

1 L3S Research Center / Leibniz University of Hannover, Hannover, Germany
2 Institute of Computer Science, FORTH-ICS, Heraklion, Greece
3 GESIS – Leibniz Institute for the Social Sciences, Köln, Germany
Introduction

Entity linking (EL): the task of determining the identity of entity mentions in text and linking them to entries in a knowledge base (KB), e.g. Wikipedia

Example: "Jordan played for the Wizards"

Candidate entities for "Jordan": Michael Jordan, Jordan (country), Wilhelm Jordan
Candidate entities for "Wizards": Washington Wizards, Wizards (film), Wizards (fantasy)
Introduction

Entity linking is a task of relevance for a wide variety of applications:

Information retrieval

Document classification

Topic modeling

High precision and recall are required for EL to have a positive impact
Introduction

EL systems differ along multiple dimensions and are evaluated over different data sets

The GERBIL framework is used to compare EL systems over a large number of data sets; their performance is affected by the characteristics of the data sets (e.g. the number of entities per document, the average document length, or the salient entity types)
Motivation

Selecting an EL system and configuration on a per-mention basis, rather than for a particular corpus, can significantly increase EL performance

Typical challenges include ambiguity, mentions of long-tail entities, or mentions recognised in short documents with very limited context information

Effective features can be derived from the corpus, the mention, or the surface form to be linked, in order to predict the best-performing EL system on a per-mention basis using supervised models

[Example figure: the mentions "Sergio", "Ennio", "Eli" and "Ecstasy" in a short text, with ambiguous candidates such as Eli (Bible) vs. Eli Wallach, and Ecstasy (drug)]
In a Nutshell

Meta Entity Linking (MetaEL): the outputs of multiple EL systems are combined

A diverse set of features provides suitable signals for predicting the best answer (EL system) for a given entity mention

Supervised classifiers exploit the aforementioned feature set for predicting correct entity links
Approach

[Pipeline figure: a corpus D is processed by n EL systems EL1 ... ELn, producing annotation sets A1 ... An, which MetaEL combines into a unified annotation set Au]

annotation ∈ Ai = <doc, surface_form, position, entity>
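The annotation tuple above maps directly onto a small data structure; a minimal sketch (the names and values are illustrative, not taken from the paper's implementation):

```python
from collections import namedtuple

# An annotation, as defined above: a surface form found at a position
# in a document, linked to a KB entity.
Annotation = namedtuple("Annotation", ["doc", "surface_form", "position", "entity"])

# Each EL system ELi produces an annotation set Ai over the corpus D.
a1 = Annotation(doc="d1", surface_form="Jordan", position=0, entity="Michael_Jordan")
a2 = Annotation(doc="d1", surface_form="Jordan", position=0, entity="Jordan_(country)")

# Two systems may disagree on the same mention; MetaEL must pick one.
assert a1[:3] == a2[:3] and a1.entity != a2.entity
```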


Approach

MetaEL+ is proposed in two variations:

MetaEL+ LOOSE (focusing on high recall)

If more than one EL system provides a link for the entity mention m, predict the best system using a multi-label classification model C and assign its link to m

If only one EL system has recognised/disambiguated the entity mention m, assign to m the entity provided by this EL system

MetaEL+ STRICT (focusing on high precision)

As in LOOSE, but if only one EL system provides a link, predict whether the link is correct using a system-specific binary classifier
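The two variations differ only in how they treat mentions covered by a single system; a sketch of the decision logic, with `predict_best_system` and `link_is_correct` standing in for the trained classifiers (both names are hypothetical):

```python
def meta_el(mention, candidate_links, predict_best_system, link_is_correct, strict=False):
    """Pick an entity link for `mention` from {system_name: entity} candidates.

    LOOSE (strict=False): a single candidate is accepted as-is.
    STRICT (strict=True): a single candidate is kept only if the
    system-specific binary classifier judges the link correct.
    """
    if not candidate_links:
        return None
    if len(candidate_links) == 1:
        (system, entity), = candidate_links.items()
        if strict and not link_is_correct(system, mention, entity):
            return None  # high precision: drop unverified single links
        return entity
    # Several systems responded: ask the multi-label model for the best one.
    best = predict_best_system(mention, candidate_links)
    return candidate_links.get(best)

# Toy stand-ins for the trained models (assumptions, not the paper's models).
always_tagme = lambda m, c: "TagMe"
never_correct = lambda s, m, e: False

links = {"TagMe": "Michael_Jordan", "Babelfy": "Jordan_(country)"}
assert meta_el("Jordan", links, always_tagme, never_correct) == "Michael_Jordan"
assert meta_el("Jordan", {"TagMe": "Michael_Jordan"}, always_tagme, never_correct, strict=True) is None
```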
Supervised Classification
Summary of the considered features
Classifiers

Multi-label Classification

Problem transformation methods: transform the multi-label classification problem into single-label classification

Binary Relevance + Random Forest

Label Powerset

Classifier Chains

Algorithm adaptation methods: extend specific learning algorithms to handle multi-label data directly

Binary Classification

SVM

Naive Bayes

Logistic Regression

Decision Tree (J48)

Random Forest

Sequential Minimal Optimisation (SMO)

Multilayer Perceptron
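Binary Relevance, the transformation method used here, turns one n-label problem into n independent binary problems (one per EL system), each handled by a base classifier such as Random Forest. A minimal illustration of the transformation itself in plain Python (not the actual training pipeline):

```python
def binary_relevance_split(instances):
    """instances: list of (features, set_of_correct_systems).
    Returns one binary training set per label (EL system)."""
    labels = sorted({l for _, ls in instances for l in ls})
    per_label = {}
    for label in labels:
        # Each label gets its own yes/no dataset; a base classifier
        # (Random Forest in MetaEL+) is then trained on each.
        per_label[label] = [(x, label in ls) for x, ls in instances]
    return per_label

data = [([0.1, 3], {"TagMe"}), ([0.7, 1], {"TagMe", "Babelfy"}), ([0.2, 9], set())]
split = binary_relevance_split(data)
assert set(split) == {"Babelfy", "TagMe"}
assert [y for _, y in split["TagMe"]] == [True, True, False]
```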
Training and Labelling

Manual labelling (e.g. from domain experts)

Crowd-sourcing

Existing ground truth datasets

Multi-label classification: label the training instances by simply considering the systems that managed to correctly disambiguate the mention

Binary classification: use only the annotations produced by the corresponding EL tool and label the training instance as either true or false
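Both labelling strategies can be automated against a ground truth; a sketch (feature extraction is elided, function names are illustrative):

```python
def multilabel_instance(mention, system_links, gold):
    """Label = the set of systems whose link matches the ground truth."""
    return {s for s, e in system_links.items() if gold.get(mention) == e}

def binary_instance(mention, system, system_links, gold):
    """For one system's classifier: did this system link the mention correctly?"""
    return system_links.get(system) == gold.get(mention)

gold = {"Jordan": "Michael_Jordan"}
links = {"TagMe": "Michael_Jordan", "Babelfy": "Jordan_(country)"}
assert multilabel_instance("Jordan", links, gold) == {"TagMe"}
assert binary_instance("Jordan", "Babelfy", links, gold) is False
```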
Evaluation

Ground Truth Datasets

CONLL [Hoffart et al., 2011]

IITB [Kulkarni et al., 2009]

NEEL 2016 [Cano et al., 2016]

[Table: dataset statistics]

[Table: performance of the used EL systems on the CONLL-TestB ground truth]


Baselines

Each individual EL system, i.e., Ambiverse, Babelfy, and TagMe

Random: randomly selects one of the systems

Best System: selects the link of the system with the highest overall performance on the used ground truth data

Majority+Random: selects the link provided by the majority; if all EL systems provide different links, selects one at random

Majority+Best: selects the link provided by the majority; if all EL systems provide different links, selects the link of the system with the highest overall performance

Weighted Voting: combination through a weighted voting scheme [Ruiz and Poibeau, 2015], filtering out annotations with a score lower than the maximum precision

Weighted Voting All: combination through the weighted voting scheme [Ruiz and Poibeau, 2015] applied to all annotations
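The majority-based baselines reduce to a vote over the per-system links; a sketch of Majority+Best (the fallback order via a performance ranking is an assumption about how ties are broken):

```python
from collections import Counter

def majority_best(candidate_links, system_ranking):
    """candidate_links: {system: entity}; system_ranking: systems ordered by
    overall performance, best first. Returns the majority entity, falling
    back to the best-ranked system's link when no majority exists."""
    counts = Counter(candidate_links.values())
    entity, votes = counts.most_common(1)[0]
    if votes > 1:
        return entity
    # All systems disagree: trust the strongest system overall.
    for system in system_ranking:
        if system in candidate_links:
            return candidate_links[system]
    return None

links = {"Ambiverse": "A", "Babelfy": "B", "TagMe": "B"}
assert majority_best(links, ["TagMe", "Ambiverse", "Babelfy"]) == "B"
links2 = {"Ambiverse": "A", "Babelfy": "B", "TagMe": "C"}
assert majority_best(links2, ["TagMe", "Ambiverse", "Babelfy"]) == "C"
```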
MetaEL+ Methods

MetaEL+ LOOSE: multi-label Binary Relevance classifier with Random Forest as the base classifier

MetaEL+ STRICT: the same multi-label classifier, with the addition of a binary classifier for predicting links for mentions recognized by only one EL system
Evaluation Metrics

Entity Linking

Precision

Recall

F1 score

Single-label Classification

Accuracy

Per-class Precision, Recall and F1 score

Multi-label Classification

Hamming Loss: fraction of the wrong labels to the total number of labels

Jaccard Index: number of correctly predicted labels divided by the union of predicted and true labels

Exact Match: percentage of samples that have all their labels classified correctly

Per-class Precision, Recall and F1 score
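The three multi-label metrics can be computed per sample from the predicted and true label sets; a self-contained sketch:

```python
def multilabel_scores(y_true, y_pred, n_labels):
    """y_true, y_pred: lists of label sets, one per sample."""
    n = len(y_true)
    # Hamming loss: fraction of wrong labels over all sample-label pairs.
    hamming = sum(len(t ^ p) for t, p in zip(y_true, y_pred)) / (n * n_labels)
    # Jaccard index: |intersection| / |union|, averaged over samples
    # (an empty-vs-empty pair counts as a perfect match).
    jaccard = sum(len(t & p) / len(t | p) if t | p else 1.0
                  for t, p in zip(y_true, y_pred)) / n
    # Exact match: share of samples with all labels predicted correctly.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return hamming, jaccard, exact

true = [{"TagMe"}, {"TagMe", "Babelfy"}]
pred = [{"TagMe"}, {"TagMe"}]
h, j, e = multilabel_scores(true, pred, n_labels=3)
assert (j, e) == (0.75, 0.5)
```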
Annotation and Agreement Statistics
Entity Linking Performance
Prediction Performance
Binary classification

Multi-label classification
Feature Analysis

Effectiveness of different combinations of features (using MetaEL+ LOOSE on the CONLL dataset)
Synopsis and Limitations


Our MetaEL+ approach that considers a diverse set of features outperforms the
individual systems and six baselines


A STRICT MetaEL method can further improve precision without significantly
affecting recall


The proposed multi-label classifier achieves a prediction accuracy of > 90%


All three categories of features contribute to achieving the highest performance
Synopsis and Limitations


The proposed binary classifiers achieve relatively low accuracy, leaving room for improvement of the STRICT MetaEL method


Need for corpus-specific training data


Design corpus-independent features and use more diverse EL systems
Conclusions


EL performance may be optimised by combining the results of distinct EL systems


Novel approach: Meta Entity Linking (MetaEL)


Outputs of multiple EL systems are unified through an ensemble learning approach


Model the problem as a supervised classification task and provide a rich set of features


Our multi-label classifier outperforms, in F1 score, both the best-performing individual EL system (by around 10 percentage points) and the best baseline methods (by around 5 percentage points) on the CONLL data set


Almost all the proposed features contribute to achieving high EL performance
Future Work


Evaluate the performance gain of MetaEL using different numbers and combinations of EL tools, including more recent models that use neural networks


Study distantly supervised approaches where weakly labelled data are generated based on heuristics


Investigate the applicability of more advanced models for the binary classification task
Thank you

Questions
