
A Low-cost, High-coverage Legal Named Entity Recognizer, Classifier and Linker

Yu Chen, 03722670

Concepts
ECHR: European Court of Human Rights

Summary
(What) A legal Named Entity Recognizer, Classifier and Linker is developed to identify
relevant parts of legal texts and connect them to a structured knowledge representation,
the LKIF ontology.
(How) The Named Entity Recognizer, Classifier and Linker is trained on mentions of
entities in Wikipedia (in place of manually annotated examples) and relies on a mapping
from the LKIF ontology to the YAGO ontology and, through it, to Wikipedia entities.
(Performance) The proposed approach achieves an F-measure of around 80% at different
levels of granularity on two test sets (one from Wikipedia and one from a sample of
legal judgments), so it has the potential to be applied to other legal sub-domains
represented by different ontologies.
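The reported scores are not reproduced here, but the F-measure itself is straightforward to compute from true-positive, false-positive and false-negative counts. A minimal sketch (the counts below are hypothetical, chosen only so the result lands near the paper's 80% figure):

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one level of granularity:
# 80 correct entity links, 20 spurious, 20 missed.
score = f_measure(tp=80, fp=20, fn=20)
print(round(score, 2))  # 0.8
```

Note that with multiple entity classes, micro- and macro-averaged F-measures can differ sharply when the classes are imbalanced, which matters for the imbalance criticism below.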

Three positive aspects


Less effort is needed to build the training texts, since the author uses mentions of
entities in Wikipedia in place of manually annotated training samples.
The author identifies a plausible main source of classification errors: the bigger
classes (populated with more mentions in Wikipedia text) impose a very distinct
conceptualization.
The developed tools and resources are open source, freely available to anyone, and can
be reproduced for any legal sub-domain of interest.

Three criticisms
(Data preprocessing) The author did not pre-process the training texts to balance the
classes for the learners.
(Testing set not representative) The approach was tested only on holdout texts from
Wikipedia and a small sample of ECHR judgments, which is not representative enough to
show that it can be ported to other legal sub-domains; a more representative test
dataset is needed to evaluate its performance.
(Circularity) The author trained the Entity Recognizer, Classifier and Linker on texts
that were not strictly manually annotated, and then used the resulting model to
pre-annotate the legal-domain articles of Wikipedia: the model is trained on loosely
annotated data and applied to produce yet more loose annotations.
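The class-balancing step called for in the first criticism can be as simple as random oversampling of minority classes before training. A minimal sketch (the mention/label pairs are invented for illustration, not taken from the paper's data):

```python
import random

def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    matches the size of the largest class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        balanced = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in balanced)
    return out

# Toy mention/label pairs: "Organisation" dominates "Document".
data = oversample(
    ["ECHR", "UN", "ICC", "Art. 6"],
    ["Organisation", "Organisation", "Organisation", "Document"],
)
print(sorted(label for _, label in data).count("Document"))  # 3
```

Duplicating examples is the crudest option; class weights in the learner or undersampling the dominant classes are common alternatives, and any of them should be validated against a held-out set, which connects this criticism to the third question below.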

Three questions to the author


Can mentions of entities in Wikipedia reliably replace strictly manually annotated
examples?
Can your approach serve as a standard tool for legal experts to create strictly
manually annotated training sets?
How would you verify that class balancing improves the performance of your
approach?
