

Book of Abstracts

Silver sponsors

Bronze sponsors

Table of Contents
Silver sponsors................................................................................2
Bronze sponsors..............................................................................3
Oral Presentations............................................................8
Language Variation & Change......................................................8
Can we spot meaning shifts in diachronic representations?..........8
Dutch dialect use on Twitter...........................................................9
A probabilistic agent-based simulation for community level
language change in different geographical scenarios..................10
Testing the Processing Hypothesis of Dutch verb cluster order
variation using a probabilistic language model............................11
Parsing & Grammar Engineering................................................12
Parsing with Palm Trees...............................................................12
Minimalist Grammar Transition-Based Parsing..........................13
Evaluating computational construction grammars.......................14
Constraints without Complaints: Pictograph-to-Text Translation
with DELPH-IN and LOGON......................................................15
Entity & Term Extraction............................................................16
Medical Entity Recognition in Dutch forum data in the absence of
labeled training data.....................................................................16
Evaluation of named entity recognition in user-submitted police
reports.........................................................................17
Multi-word units in lexical processing and word learning: a
computational investigation.........................................................18
Multiword Expression Extraction through Independence and
Coherence.....................................................................19
Adaptation & Evaluation............................................................20
How well can the crowd perceive the multimodality and make
relevance judgments in case of video-to-video search?...............20
Syntactic Simplification for Improved Text-to-Pictograph
Translation...................................................................21
Towards Domain Adaptation for Dutch Social Media Text Through
Normalization................................................................22
The CLIN27 Shared Task: Translating Historical Text................23

A Universal Dependencies Treebank for Dutch...........................24
Learning Morpho-Syntactic Attributes Representation for Cross-
Lingual Dependency Parsing.......................................................25
Annotating parses through subtree mappings..............................26
Building a Parallel Meaning Bank for English, Dutch, German
and Italian.....................................................................................27
Deep Learning............................................................................28
Computing abstract meaning representations using deep learning
Learning Representations of Phrases with Siamese LSTM
Generative Adversarial Networks for Dialogue Generation........30
Seq2Seq learning for Dialogue Representation...........................31
Language Acquisition & Psycholinguistics.................................32
Introducing NT2Lex: A Machine-readable CEFR-graded Lexical
Resource for Dutch as a Foreign Language.................................32
Useful contexts and easy words: effects of distributional factors
on lexical category acquisition.....................................................33
Modeling language interference effects in bilingual word reading
What do Neural Networks need in order to generalize?..............35
Distributional Semantics.............................................................36
Dutch Poetry Generation using Encoder-Decoder Networks......37
Improving Word2vec: Unshifting the shifted PMI matrix...........38
Unsupervised Word Sense Disambiguation with Sense Extended
Word Embeddings........................................................................39
Similarity Learning for Coreference Resolution..........................40
(To what Extent) can Neural Machine Translation deal with
Literary Texts?..............................................................................42
Mapping the PERFECT via Translation Mining..........................43
Translationese and Posteditese: Can users and computers detect
the difference?..............................................................................44
Open source tools and recognition models for using Kaldi on
Creating personalized text-to-speech voices using deep learning

Emergence of language structures from exposure to visually
grounded speech signal................................................................47
A Standard for NLP Pipelines......................................................48
PICCL: A Workflow for preparing corpora for exploration and
Querying Large Treebanks: Benchmarking GrETEL Indexing
Authorship & Topic Modelling...................................................51
Efficiency of Arabic Grammatical Representation in Authorship
Profiling Dutch authors on Twitter: discovering political
preference and income level.........................................................52
Bayesian Nonparametric Topic Models for Short Text Data.......53
Poster Presentations......................................................54
Natural proof system for natural language...................................55
Treebank querying with GrETEL 3: bigger, faster, stronger.......56
Domain Adaptation for LSTM Language Models.......................57
Towards more efficiently generated topic models. Preventing the
extraction of generic clusters through non-discriminative term
Universal Reordering via Linguistic Typology............................59
Court verdict: unreadable.............................................................60
Generalisation in Automatic Metaphor Detection.......................61
Testing language technology tools with persons with an
intellectual disability: a collaboration between researchers,
developers and target users...........................................................62
Twitter usage as a channel for scientific knowledge sharing.......63
Spelling correction for clinical text with word and character n-
gram embeddings..........................................................................64
Wikification for Implicit MT evaluation, First results for Dutch 65
Bilingual Lexicon Induction by Learning to Combine Word-Level
and Character-Level Representations...........................................66
Classifier optimization for cyberbully detection: finding the
needle in the haystack...................................................................67
Structured Learning for Temporal Relation Extraction from
Clinical Records...........................................................................68
Simulating language change in an artificial-intelligence
computational model....................................................................69
Improving interpolation factors in a Bayesian skipgram language
Annotation of Content Types in Historical Texts and
Contemporary News.....................................................................71
Clinical Case Reports dataset for machine reading.....................72
Estimating Post-editing Time Using Machine Translation Errors
Corpus Upload and Metadata Analysis Extensions for GrETEL 74
A data-to-text system for soccer reports.......................................75
Monday mornings are my fave :) #not Exploring the Automatic
Recognition of Irony in English Tweets.......................................76
Extracting Contrastive Linguistic Information from Statistical
Machine Translation Phrase-Tables.............................................77
Identifying Mood of Songs using Musical and Linguistic
Project STON: speeding up Dutch subtitling with speech and
language technology.....................................................................79
Visual Analytics for Parameter Tuning of Semantic Similarity
Comparative study of DNN-HMM and end-to-end DNN
architectures for ASR...................................................................81

Oral Presentations
Language Variation & Change

Can we spot meaning shifts in diachronic
representations?
Hessel Haagsma and Malvina Nissim University of Groningen
The meanings of words are known to change over time, and the rise of new
figurative uses is a fundamental driver behind this change. Detecting
meaning shifts is beneficial to any NLP task which relies on lexical
meaning (resources), as well as for research in lexical semantics. In
practice, the aim of this line of research is to be able to automatically detect
shifts in word meaning in a completely unsupervised fashion from raw text,
relying on distributional semantic representations.
Following the method described in Del Tredici et al. (2016), we train
distributional semantic models consecutively on yearly slices of two
diachronic newswire corpora of Dutch, containing over 800 million tokens.
By comparing the vector representations of a selection of possibly
interesting words across time, we observe the evolution of their meaning
from 1994 to 2016.
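The comparison step can be sketched as follows (a minimal plain-Python illustration with toy two-dimensional vectors standing in for the trained embeddings; `meaning_shift_series` and its inputs are illustrative names, not the authors' code, and the sketch assumes the yearly models live in a shared vector space, as consecutive training provides):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def meaning_shift_series(word, yearly_models):
    """Compare a word's vector across consecutive yearly models.

    `yearly_models` maps a year to a {word: vector} dict; in practice
    each dict would be a distributional model trained on one time slice.
    Returns a list of ((year1, year2), cosine) pairs; a drop in
    self-similarity is taken as a candidate meaning shift.
    """
    years = sorted(yearly_models)
    series = []
    for y1, y2 in zip(years, years[1:]):
        sim = cosine(yearly_models[y1][word], yearly_models[y2][word])
        series.append(((y1, y2), sim))
    return series
```

A word whose vector stays put across slices yields similarities near 1.0; a sharp dip between two years flags that interval for dictionary cross-referencing.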
We assume that a substantial change in embeddings over time correlates
with a change in meaning, and that this is particularly true for the rise of
new figurative meanings. Evaluation is performed mainly by cross-
referencing dictionaries of Dutch from different years within the time span
covered by our corpus to identify the approximate time of insertion (and
thus the emergence) of new meanings.
Results are discussed with respect to a number of potentially confounding
factors such as data sparsity, frequency variation over time, and intrinsic
polysemy of the selected words. Both positive and negative results will be
presented, comparing them across the two datasets that we use, as well as
across languages.
Keywords: distributional semantics, diachronic meaning change,
metaphor, lexical semantics

Dutch dialect use on Twitter
Hans Van Halteren and Roeland van Hout Radboud University
As a substantial part of the Twitter bandwidth is used for personal contacts,
in which the language use resembles spoken language, and words are often
spelled similarly to spoken language, we should also expect to find dialectal
forms. Our impression in earlier work was that this is very much the case
for authors from Limburg, but much less so in the rest of the country. In this
paper, we want to explore to what degree dialect is being used, and how
this could be used in dialectological research.
Our starting point is the TwiNL collection, from which we take tweets
where the metadata contains a GPS-based location code. We then restrict
our data to those authors for whom 70% of the tweets can be placed in the
same town. With this data, we first examine whether there is a sufficient
amount of dialectal forms to be of use, focusing on Limburg: we compare
some traditionally accepted isoglosses, for words like ik/ich/iech (I), to
isoglosses derived from the Twitter data. We continue to a method for
automatically identifying dialectal forms, again benchmarking against
known lists of forms. In a final experiment, we will measure whether locally
more likely forms can be used to place authors in a specific region, which
would allow us to expand our data beyond GPS users. After reporting on the
experiments, we will discuss what the results could mean for the study of
dialects in the Netherlands.
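The isogloss comparison boils down to measuring, per town, the relative frequency of each variant form. A minimal sketch (the variant set and town data are illustrative, not the TwiNL data):

```python
from collections import Counter

# Hypothetical variant set for the pronoun 'I'; the study compares
# attested isoglosses for forms like ik/ich/iech against Twitter usage.
VARIANTS = {"ik", "ich", "iech"}

def variant_profile(tweets_by_town):
    """Relative frequency of each dialectal variant per town.

    `tweets_by_town` maps a town name to a list of tokenized tweets.
    Towns without any attested variant get an empty profile.
    """
    profile = {}
    for town, tweets in tweets_by_town.items():
        counts = Counter(tok for tweet in tweets for tok in tweet
                         if tok in VARIANTS)
        total = sum(counts.values())
        profile[town] = ({v: counts[v] / total for v in VARIANTS}
                         if total else {})
    return profile
```

Plotting such profiles on a map and drawing the boundary where the dominant variant flips gives a data-derived isogloss to set against the traditional one.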
Keywords: dialect, Twitter, e-humanities

A probabilistic agent-based simulation for
community level language change in
different geographical scenarios
Hugo de Vos, Merijn Beeksma, Tom Claassen, Ans van Kemenade and
Ton Dijkstra Radboud University Nijmegen
We built an agent-based model (ABM) to simulate historical language change
and present a case study on word order change in English. Our modeling
approach assumes that complex patterns in population-level language
change can be understood in terms of many small changes resulting from
interactions between individual agents of different populations. Each agent
has a language model that changes due to contact with other agents from the
same or the other population. As a result, micro-level changes (i.e. the level
of individual agents) lead to macro-level changes (i.e. the level of the
population). We implemented, manipulated and explored the effect of
learning rate, likelihood of interaction between agents from different
populations, location-bound dialects, and degree of agent variation within a
population. Although parts of the model leave room for fine-tuning and
external factors have yet to be combined into one model, the simulation
results show that ABM is a useful tool for gaining more insight into historical
language change. The ABM approach has potential for modeling word order
change in English, as well as language change in general.
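The micro-to-macro mechanism can be illustrated with a drastically simplified one-parameter version of such a model (each agent's "language model" is reduced to a single probability of producing the innovative order, cross-population contact is one-directional, and all names and defaults are illustrative, not the authors' implementation):

```python
import random

def simulate(n_agents=50, steps=2000, learning_rate=0.1,
             contact_prob=0.2, seed=0):
    """Minimal agent-based sketch of contact-driven change.

    Population A starts innovative, population B conservative. On each
    interaction the hearer moves its production probability toward the
    speaker's observed choice by `learning_rate`; `contact_prob` is the
    likelihood of cross-population contact. Returns the mean innovative
    probability per population after `steps` interactions.
    """
    rng = random.Random(seed)
    pop_a = [0.9] * n_agents   # innovative population
    pop_b = [0.1] * n_agents   # conservative population
    for _ in range(steps):
        if rng.random() < contact_prob:
            speakers, hearers = pop_a, pop_b   # cross-population contact
        else:
            speakers = hearers = rng.choice([pop_a, pop_b])
        s = rng.randrange(n_agents)
        h = rng.randrange(n_agents)
        utterance = 1.0 if rng.random() < speakers[s] else 0.0
        hearers[h] += learning_rate * (utterance - hearers[h])
    return sum(pop_a) / n_agents, sum(pop_b) / n_agents
```

Even in this toy setting, raising `contact_prob` speeds up the population-level drift of the conservative group toward the innovative order, which is the macro-level effect the full model studies.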
Keywords: Agent Based Modelling, historical linguistics, language
change, verb second

Testing the Processing Hypothesis of Dutch
verb cluster order variation using a
probabilistic language model
Jelke Bloem University of Amsterdam
We discuss the application of a measure of surprisal to the modeling of a
grammatical variation phenomenon between near-synonymous
constructions. We investigate a particular variation phenomenon, word
order variation in Dutch two-verb clusters, where it has been established
that word order choice is affected by processing cost, among other factors.
In Dutch subordinate clauses, the auxiliary verb can be positioned before or
after the past participle:
(1) Ik denk dat ik het [begrepen heb]
(2) Ik denk dat ik het [heb begrepen]
The choice between these orders is considered to be free, but speakers
appear to choose in systematic ways. Several multifactorial corpus studies
of Dutch verb clusters have shown that processing cost affects word order
choice. This previous work allows us to compare the surprisal measure,
which is based on constraint satisfaction theories of language modeling, to
previously used measures, which are more directly linked to empirical
observations of processing complexity. We compute surprisal by using a
trigram language model created with Colibri Core and trained on Wikipedia
texts, and then computing the perplexity of this model over a set of verb
clusters as a measure of surprisal.
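The surprisal computation itself is standard n-gram machinery. A toy stand-in for the Colibri Core model (add-one smoothing here is a simplification; the class and data below are illustrative):

```python
import math
from collections import Counter

class TrigramLM:
    """Add-one-smoothed trigram model over tokenized sentences."""

    def __init__(self, sentences):
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = set()
        for sent in sentences:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(toks)
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1

    def surprisal(self, context, word):
        # Surprisal = -log2 P(word | context), with add-one smoothing.
        a, b = context
        num = self.tri[(a, b, word)] + 1
        den = self.bi[(a, b)] + len(self.vocab)
        return -math.log2(num / den)
```

Summing surprisal over the words of each verb cluster variant gives the quantity compared against the attested order choice.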
Our results show that surprisal does not predict the word order choice by
itself, but is a significant predictor when used in a measure of uniform
information density (UID). This lends support to the view that human
language processing is facilitated not so much by predictable sequences of
words but more by sequences of words in which information is spread
evenly.
Keywords: corpus linguistics, verb clusters, word order, language
variation, language model, processing complexity, uniform information
density

Parsing & Grammar Engineering

Parsing with Palm Trees

Gertjan Van Noord and Barbara Plank University of Groningen
In Choe and Charniak (2016), a parsing method is described in which an
LSTM is trained on a large set of parses, given as labeled bracketed
sentences. The resulting language model is then used to re-rank the 50 best
parses of the Charniak parser. Astonishingly, the resulting parser with
reranking obtains an F1 score on the WSJ of 92.6%. And with an ensemble
of eight LSTM-LMs a new state of the art of 93.8% is obtained.
In order to understand some of the reasons of this high accuracy, we train
LSTMs on a large set of parses provided by Alpino. Since the normal output
structures of Alpino are directed graphs, not trees, we decided to use the
derivation trees of the parser - rather than the dependency structures, as the
material to build a language model for.
We then apply an LSTM language model over Alpino derivation trees and
use it to re-rank the 50 best parses of Alpino. Initial results show that the
LSTM, trained on less than 3 million sentences, does not yet outperform the
best parse of Alpino. It does, however, perform much better than a trigram
language model trained on the same data.
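The re-ranking step itself is simple once both scores are available. A generic sketch (the interpolation weight and the function names are illustrative; the actual systems may combine scores differently):

```python
def rerank(candidates, lm_score, weight=0.5):
    """Re-rank an n-best list by interpolating the parser score with a
    language-model score, in the spirit of Choe and Charniak (2016).

    `candidates` is a list of (parse, parser_score) pairs; `lm_score`
    maps a parse (here represented as a string) to a log-probability.
    The candidate with the highest combined score wins.
    """
    scored = [(parse, (1 - weight) * p + weight * lm_score(parse))
              for parse, p in candidates]
    return max(scored, key=lambda x: x[1])[0]
```

With `weight=0` this degenerates to the parser's own 1-best; increasing the weight lets the LSTM language model overrule the parser on close calls.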
In the presentation, we hope to provide further results of the re-ranking
experiments, as well as results on a set-up where the LSTM score of a
derivation tree is exploited as an additional feature for the Alpino
disambiguation component.
Keywords: parsing, LSTM, language models, Alpino, palm trees

Minimalist Grammar Transition-Based Parsing
Milos Stanojevic University of Amsterdam
Current chart-based parsers of Minimalist Grammars exhibit prohibitively
high polynomial complexity that makes them unusable in practice. This
paper presents a transition-based parser for Minimalist Grammars that
approximately searches through the space of possible derivations by means
of beam search, and does so very efficiently: the worst case complexity of
building one derivation is O(n^2) and the best case complexity is O(n). This
approximated inference can be guided by a trained probabilistic model that
can condition on larger context than standard chart-based parsers. The
transitions of the parser are very similar to the transitions of bottom-up
shift-reduce parsers for Context-Free Grammars, with additional transitions
for online reordering of words during parsing in order to make non-
projective derivations projective.
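The transition system can be illustrated with a minimal executor (this is an illustrative shift-reduce machine with an extra reordering action, not the MG parser itself, whose transitions carry feature-driven constraints):

```python
def parse(words, transitions):
    """Execute a bottom-up transition sequence with an extra SWAP action.

    SHIFT moves the next word onto the stack, REDUCE combines the top
    two stack items into one constituent, and SWAP reorders the top two
    stack items, which is the kind of online reordering that lets
    non-projective derivations be built projectively.
    """
    stack, buffer = [], list(words)
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "REDUCE":
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
        elif t == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
    return stack
```

Each derivation uses at most O(n) SHIFTs and REDUCEs, so the cost per derivation is dominated by the number of SWAPs, matching the O(n) best case and O(n^2) worst case quoted above.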
Keywords: Minimalist Grammars, Transition-Based Parsing, Mildly
Context-Sensitive Grammars

Evaluating computational construction grammars
Katrien Beuls and Paul Van Eecke Vrije Universiteit Brussel / Sony
Computer Science Laboratory
Computational construction grammar is a rapidly growing line of research
into language modelling that tightly integrates morpho-syntax with
semantics/pragmatics and combines constraints from different linguistic
perspectives, such as phrase structure, functional structure and information
structure. The computational formalisms that implement construction
grammar have now become mature enough to be used for the development
of large-scale bi-directional grammars, creating the need for appropriate and
objectively computable evaluation measures.
We propose a set of measures to serve this purpose. They can be applied to
single sentences and their meaning representations as well as to larger test
sets. The measures are explicitly designed to take into account the bi-
directional nature and processing efficiency of the grammars. Measures for
comprehension include the semantic coherence of the parsed meaning
representation, the distance between this meaning and a gold standard
(Smatch score) and the average branching factor of the constructional
search space. In production, we measure the probability of the formulated
sentence, the word level edit distance between this sentence and the gold
standard and the average branching factor. Moreover, we provide additional
measures that quantify a grammar's bi-directional performance, for instance
whether production of the parsed meaning leads to the original sentence or vice
versa. To report the overall coverage and accuracy of a grammar, aggregate
scores based on these measures can be computed. At CLIN, we will show
an operationalization of these measures implemented for Fluid Construction
Grammar and applied to a number of grammars for different languages.
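One of the production measures named above, the word-level edit distance between a formulated sentence and the gold standard, can be computed with the standard dynamic program (a straightforward Levenshtein implementation over token lists; not taken from the authors' code):

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]
```

A distance of 0 means the grammar reproduced the gold sentence exactly; aggregating this over a test set contributes to the overall accuracy score.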
Keywords: Grammar evaluation, Computational Construction Grammar,
Grammar engineering, Fluid Construction Grammar

Constraints without Complaints: Pictograph-
to-Text Translation with DELPH-IN and LOGON
Adriaan Lemmens University of Leuven
In recent years, increasing attention has been paid to the potential of using
pictographs (aka pictograms, icons, or symbols) to open up the world of
online communication to users who, due to some intellectual disability, are
unable to interact with its default mode of information transfer: namely,
written natural language. To this end, tools are required that provide an
interface between natural language on the one hand and pictographic
symbols on the other.
There are considerable challenges in both directions. For a pictograph-based
writing tool (the focus of this talk), these include the properties that
pictographs, as a result of their simplicity, tend to be underspecified with
respect to number, may occur in any order, and are able to compact
semantically complex situations and entities into a single symbol. Further
methodological challenges arise from a sparsity of data relating to actual
pictographic language use, which poses problems for statistical methods,
and considerable variation among symbol sets.
This talk describes the development of the core translation architecture of a
hypothetical assistive writing tool that aims to enable users to compose
short written messages by selecting a sequence of pictographic images.
Unlike previously proposed systems, however, its approach to the task of
pictograph-to-text translation is unabashedly old-school, relying on 'deep'
parsing methods for the pictographic input and a traditional transfer-based
approach to translation and generation, all stages implemented within a
constraint-based framework (viz., HPSG). The result is a system that
produces semantically rich and consistently well-formed output. While
limited in coverage, it shows tremendous promise as a component in a
hybrid system.
Keywords: augmentative and alternative communication, DELPH-IN,
machine translation, rule-based, transfer-based, HPSG, LKB, minimal
recursion semantics, master's thesis

Entity & Term Extraction

Medical Entity Recognition in Dutch forum
data in the absence of labeled training data
Nicole Walasek, Suzan Verberne and Maya Sappelli Radboud
University / TNO
The extraction of entities in medical text is an example instance of domain
specific entity recognition. Categories that can be of interest in this
particular domain are medicine, treatments, symptoms, food, diagnostic
procedures and diseases. The complexity of the task varies depending on the
data source (patient records and medical archives versus forum data).
An important aspect to consider is the availability of labeled training and
testing data. Acquiring these labeled data becomes increasingly costly the
more specific the desired categories are. If labeled data for a particular named
entity recognition task exist, a wide range of supervised machine learning
algorithms can be considered. The lack of these data, however, restricts the
possibilities and usually promotes the use of domain specific ontologies
such as the unified medical language system database (UMLS).
Furthermore, it is worth noting that most of the domain specific ontologies
are based on English data. Solving a domain specific entity recognition task
in another language therefore introduces additional complexity.
We aim to build a cancer-related Dutch ontology and therefore propose an
unsupervised entity extraction pipeline for Dutch medical forum data,
investigating whether it is possible to extract and classify medical entities in
the absence of labeled data.
The pipeline consists of the following steps: translation and database
lookup, spelling correction, lemma matching and KNN based on Word2vec.
We successfully extracted entities on the validation data with an average
precision of 0.81 and assigned correct class labels with an average precision
of 0.84 per category.
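The final KNN step of the pipeline can be sketched as follows (toy two-dimensional vectors stand in for Word2vec embeddings; `knn_label` and its inputs are illustrative names, not the authors' code):

```python
import math
from collections import Counter

def knn_label(vec, labeled, k=3):
    """Assign a category to an unseen term by majority vote among its
    k nearest labeled neighbours in embedding space.

    `labeled` is a list of (vector, label) pairs; similarity is cosine.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    nearest = sorted(labeled, key=lambda p: cos(vec, p[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

An extracted candidate term thus inherits the category (medicine, symptom, etc.) that dominates among its closest labeled neighbours.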
Keywords: Medical NLP, Entity Recognition, Ontology

Evaluation of named entity recognition in
user-submitted police reports
Marijn Schraagen and Floris Bex Utrecht University
The website of the Dutch Police facilitates submitting a crime report, partly
consisting of free text. To automate report processing, relation extraction
can be used, which in turn requires accurate named entity recognition
(NER). However, NER as offered by current Dutch parsers suffers from
limited accuracy. Issues with grammaticality and spelling of the crime
reports impair the NER even further. The current research aims to evaluate
NER results on the crime reports data set using large-scale human
judgment. The experiments are in progress, and the first results have been
collected. Aspects of this evaluation include assignment of named entity
types, recognition of multiword entities, mixed language issues and
theoretical considerations on the nature and use of named entities. The
evaluation is intended to provide pointers for increasing NER accuracy on
this type of data.
Keywords: named entity recognition, evaluation, spelling errors, free
text entry, crime reports

Multi-word units in lexical processing and
word learning: a computational investigation
Robert Grimm, Giovanni Cassani, Steven Gillis and Walter Daelemans
University of Antwerp
Previous studies have suggested that children form cognitive representations
of multi-word units (MWUs), which may then affect the acquisition of
smaller linguistic units contained within them. Building on these findings,
we propose that words which appear within a relatively large number of
MWUs are learned earlier by children and are recognized more quickly by
adults. To investigate this, we use a computational model to extract MWUs
from a corpus of transcribed speech among adults and a corpus of
transcribed child-directed speech. Controlling for word frequency, we
correlate the number of MWUs within which each word appears with (1)
age of first production and (2) adult reaction times on a word recognition
task. We find a facilitatory effect of MWUs on both response variables,
suggesting that MWUs benefit both word learning and word recognition.
Furthermore, the effect is strongest on age of first production, suggesting
that MWUs may be relatively more important for word learning. We discuss
possible causes and implications and formulate testable predictions.
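The core quantities of the correlation step can be sketched in a few lines (a bare Pearson correlation; the actual study controls for word frequency, which this sketch omits, and the MWU data below are illustrative):

```python
def mwu_count(word, mwus):
    # Number of extracted multi-word units that contain the word.
    return sum(word in unit for unit in mwus)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    e.g. per-word MWU counts against age of first production."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A facilitatory effect as reported above corresponds to a negative correlation between MWU count and age of first production (more MWUs, earlier learning) and between MWU count and reaction time.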
Keywords: multi-word units, age of first production, reaction times,
contextual diversity

Multiword Expression Extraction through
Independence and Coherence
Dirk De Hertog, Dirk Speelman and Piet Desmet University of Leuven
Automatic Term Extraction (ATE) deals with the automatic detection and
retrieval of domain specific terms from representative corpora. An
important step in the detection process is the identification of Unithood
(UH), which aims at establishing linguistic unity between multiple words.
We propose a lexico-constructional interpretation of UH as the basis of a
MultiWord Expression (MWE) detection algorithm. We argue that the two
distinctive characteristics to operationalize are independence and coherence.
Independence takes an external perspective on linguistic expressions and
looks at their independent productivity. Coherence governs allowed and
expected word combinations on the basis of grammatical regularities that
shape language use. These regularities serve as a common ground for users
to systematically interpret word combinations in a meaningful way.
The detection algorithm builds on the conceptual interpretation of
independence and coherence in the following manner. Independence
observes the n-gram from a relative perspective and investigates how a
specific word combination relates to its immediate context. Coherence
focuses on how the expression's constituents belong together from a
linguistic point of view. It shifts the perspective to an internal one as it tries
to establish linguistic patterns and establish phrase structure.
In terms of ATE UH detection approaches, ours is a hybrid one, as it
exploits both statistical and structural linguistic information. It functions
in a largely unsupervised manner and successfully extracts expressions of
variable length.
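As a rough illustration of the statistical side of such a hybrid approach, adjacent word pairs can be scored by pointwise mutual information, a simple proxy for the independence signal (the actual algorithm combines this kind of evidence with structural, grammatical information for coherence; this sketch is not the authors' method):

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    High PMI means the pair co-occurs more often than its parts'
    individual frequencies predict, a hint of lexical unity.
    """
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (a, b), c in bi.items():
        if c >= min_count:
            p_ab = c / (n - 1)
            p_a, p_b = uni[a] / n, uni[b] / n
            scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores
```

Candidates passing a PMI threshold would then be filtered by the structural (coherence) criteria before being accepted as multiword terms.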
Keywords: Multiword Terms, Automatic Term Extraction, Independence

Adaptation & Evaluation

How well can the crowd perceive the
multimodality and make relevance
judgments in case of video-to-video search?
Maria Eskevich, Martha Larson and Roeland Ordelman Radboud
University Nijmegen
Crowdsourcing is a powerful tool for relevance assessment on a large
scale. So far it has been shown to be useful in a number of speech and
language applications. However, the creative and complex process of
interacting with the video content requires in its turn a creative approach to
task definition and interaction with the crowd workers.
In our research we focus on the evaluation of the video-to-video retrieval
systems within the context of the Hyperlinking task at TRECVid evaluation
campaign. In 2016 we introduced a novel three stage framework that covers
multimodal anchors (queries) creation, and further retrieval results
We use different techniques to ensure that at each stage of crowdsourcing
we retain the focus on the multimodality. We combine manual assessment
and batch processing of submitted work, and run an extensive analysis of
the quality of the crowd workers' judgments, and how it affects the overall
assessment of system performance.
This analysis gives us insights into the overall complexity of the task and the
diversity of the collected ground truth, and expands our understanding of the
aspects of hyperlinking that are crucial for user satisfaction in interaction
with a video-to-video retrieval system.
Keywords: crowdsourcing, multimedia, hyperlinking, evaluation

Syntactic Simplification for Improved Text-to-
Pictograph Translation
Leen Sevens, Vincent Vandeghinste, Ineke Schuurman and Frank Van
Eynde University of Leuven
In order to enable or facilitate online communication for people with
Intellectual Disabilities, the Text-to-Pictograph translation system
automatically translates Dutch written text into a series of Sclera or Beta
pictographs. The baseline system presents the reader with a more or less
verbatim pictograph-per-word translation, without changing the order of the
pictographs, and without removing any redundant information in the output
pictograph sequence. As a result, long and complex input sentences lead to
long and complex pictograph translations, which often leave the end users
confused and distracted.
We build an inventory of syntactic phenomena to be treated by the
simplification module and introduce deep linguistic analysis into the
translation process, using the Alpino parser for preprocessing. The Chinese
writing system, easy-to-read news messages for people with Intellectual
Disabilities, and the Klare Taal checklist for clear language serve as an
inspiration source. The simplification module splits long and complex
sentences into several shorter units, and deletes pictographs that do not
contribute to the essence of the message. This leads to shorter, clearer, and
more consistent pictograph conversions.
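The two operations named above, splitting long sentences and deleting non-essential pictographs, can be caricatured with surface rules (the real module works on full Alpino parses, not token lists; the connective and article sets below are hypothetical stand-ins):

```python
# Hypothetical connectives at which long sentences are split, and
# function words that would not contribute a pictograph.
SPLIT_WORDS = {"en", "maar", "want"}    # 'and', 'but', 'because'
REDUNDANT = {"de", "het", "een"}        # articles

def simplify(tokens, max_len=6):
    """Drop non-contributing words; split at connectives when the
    remaining sequence is still too long for one pictograph unit."""
    kept = [t for t in tokens if t.lower() not in REDUNDANT]
    if len(kept) <= max_len:
        return [kept]
    units, current = [], []
    for tok in kept:
        if tok.lower() in SPLIT_WORDS and current:
            units.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        units.append(current)
    return units
```

Each returned unit would then be translated into its own short pictograph sequence, giving the shorter and clearer conversions described above.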
We perform automated evaluations of the simplification module using gold
standard simplifications. The system does not apply any unnecessary
simplification operations and high accuracy scores are obtained.
Applying syntactic simplification for Text-to-Pictograph translation is a
complex, yet necessary step toward making our system more user-friendly
and usable. Future evaluations will involve human judgments.
Keywords: Text-to-Pictograph Translation, Syntactic Simplification,
Augmented and Alternative Communication

Towards Domain Adaptation for Dutch Social
Media Text Through Normalization
Rob van der Goot University of Groningen
When processing data from less canonical domains, current natural
language processing systems trained on canonical (news) data perform
poorly. One way of resolving this problem is to normalize the non-canonical
data to more canonical text before processing. Examples of this approach
include Tjong Kim Sang (2016) and the CLIN27 shared task. But instead of
focusing on historical text, we will focus on the social media domain. There
are already numerous normalization systems for English (e.g., Baldwin et
al., 2015).
For this purpose, we introduce a newly annotated corpus of 1,000
normalized Dutch Tweets. We hope that this resource can stimulate further
research in this direction. Additionally, we will present the first results on
this corpus with our own normalization system for Dutch. The output of this
normalization system can be used in a pipeline for POS tagging, parsing and
other natural language processing tasks.
Our normalization model uses word embeddings trained on Twitter and the
spell checker Aspell to generate normalization candidates. Features from the
candidate generation are then combined with features from two n-gram
language models: one trained on the source domain (social media) and one
on a more canonical domain (Google n-grams). We combine these features
in a binary random forest classifier and use the classifier's confidence score
to rank the candidates.
The performance of the normalization system compared to inter-annotator
agreement will be discussed at the presentation.
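The candidate-ranking step can be sketched as follows; the toy frequency counts, the string-similarity measure and the simple weighted score are illustrative stand-ins for the random forest over richer features described in the abstract:

```python
import math
from difflib import SequenceMatcher

# Toy unigram counts standing in for the canonical language model
# (the real system combines a Twitter LM and Google n-grams).
CANONICAL_COUNTS = {"even": 50, "effe": 1, "later": 80, "zien": 60}

def rank_candidates(noisy_word, candidates):
    """Rank normalization candidates for a noisy token.

    The score combines orthographic similarity to the input with a
    log-frequency prior from a canonical language model; the real
    system uses a random forest classifier instead.
    """
    def score(cand):
        sim = SequenceMatcher(None, noisy_word, cand).ratio()
        freq = math.log(1 + CANONICAL_COUNTS.get(cand, 0))
        return sim + 0.2 * freq
    return sorted(candidates, key=score, reverse=True)

print(rank_candidates("effe", ["even", "effe", "later"]))
```

The canonical-LM prior lets the frequent standard form "even" outrank the identity candidate "effe" despite its perfect string match.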
Keywords: normalization Dutch domain adaptation twitter

The CLIN27 Shared Task: Translating
Historical Text
Erik Tjong Kim Sang Meertens Institute Amsterdam
Historical texts pose a challenge for automatic text processing tools that
have been developed for processing contemporary language. Languages
change over time so texts written centuries ago can differ from current texts
with respect to vocabulary, spelling and sentence structure. In order to
perform automatic linguistic analysis of such text, several language tools
need to be retrained and this process
will have to be repeated for different time points, which will require a lot of
work. An alternative method of automatic linguistic analysis of historical
text involves translating the texts to a modern language version and then
applying all available linguistic processing tools to the translation. In the
CLIN27 Shared Task, we explore this approach. Participants have used
Dutch bible versions from
different centuries to develop systems for automatically translating 17th
century Dutch texts to 21st century Dutch.
In this talk, we will provide an overview of the approaches taken by the
participants and present the results of the participating systems on
processing a blind test set.
Keywords: shared task historical text translation


A Universal Dependencies Treebank for Dutch
Gosse Bouma and Gertjan Van Noord University of Groningen
The Universal Dependencies initiative provides treebanks for various
languages annotated according to the same guidelines. We present the
construction of the Dutch Lassy Small UD treebank (included in release
1.3) and baseline parsing accuracy scores.
The Lassy Small UD treebank contains the Wikipedia section of the Lassy
Small corpus (7342 sentences). The UD annotation was produced
automatically by converting the existing Lassy Small dependency
annotation to the UD format. Whereas Lassy Small uses an annotation that
is a mix of phrase structure and dependency annotation, the UD format uses
dependency annotation between tokens only. Conversion of the Lassy POS-
tags to UD POS-tags was mostly straightforward. To convert the
dependency annotation, we need to provide for each token in the input its
dependency label and the head. This requires careful construction of
mapping rules, especially for those cases where head-dependent relations get
reversed.
The conversion script (which we will make available) can be used to
convert additional material conformant with Lassy annotation guidelines to
the UD format. This allows us to provide baseline accuracy scores for a
parser that produces UD annotation for Dutch. For this, we run the Alpino
parser on the data, and convert the output of the parser to the UD format.
This approach can serve as a baseline for parsers that are trained directly on
the UD annotation.
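As an illustration of what such a POS tag mapping looks like, here is a hypothetical fragment; the tag names and the fallback behaviour are assumptions for illustration, not the actual conversion script:

```python
# A hypothetical fragment of a Lassy-to-UD POS mapping table; the
# actual conversion script covers the full Lassy tag set.
LASSY_TO_UD = {
    "n": "NOUN",
    "ww": "VERB",
    "adj": "ADJ",
    "vnw": "PRON",
    "vz": "ADP",
    "lid": "DET",
}

def convert_pos(lassy_tags):
    """Map a sequence of Lassy POS tags to UD POS tags, falling
    back to X for tags not covered by the mapping."""
    return [LASSY_TO_UD.get(t, "X") for t in lassy_tags]

print(convert_pos(["lid", "n", "ww"]))  # a determiner-noun-verb sequence
```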
Keywords: Universal Dependencies POS-tagging Parsing

Learning Morpho-Syntactic Attributes
Representation for Cross-Lingual
Dependency Parsing
Mathieu Dehouck and Pascal Denis INRIA
Being able to transfer word level information from a resource rich source
language to a resource poor target language is crucial for learning a good
cross-lingual parser. Multi-lingual word embeddings have proven to be a
good solution to this problem as they represent words in a common space,
regardless of their respective language.
Several approaches have been proposed for learning multi-lingual word
embeddings, ranging from mono-lingual word embeddings alignment using
bilingual dictionaries, to multi-lingual embeddings induction via word
aligned corpus or via artificial code switching. But those approaches fail to
represent previously unseen words. They also tend to align words with their
semantic translations regardless of their respective part-of-speech and
syntactic use, and finally, they are not good at handling polysemy and homonymy.
Instead of learning word representations, we propose to learn
representations that are based on morpho-syntactic attributes (thus
excluding word or lemma information). Morpho-syntactic attributes are
much more robust to language barriers, they do not require word alignment
to learn their representations and thus are not touched by the problem of
learning a single representation for different syntactic use. Furthermore,
previously unseen morpho-syntactic attributes can be assigned a
representation via composition. We propose to learn those representations
with graph-based and linear algebra techniques such as label propagation
and PCA. Experimental results on data from the Universal Dependency
Project show that such morpho-syntactic attributes representations are
beneficial for cross-lingual transfer between languages, even unrelated ones.
Keywords: Dependency Parsing Cross-Lingual Representation
Embeddings Learning Transfer

Annotating parses through subtree mappings
Tom Vanallemeersch University of Leuven
We propose a method which learns a conditional model for hierarchical,
annotated structures and allows for predicting information given
hierarchical structures with incomplete annotations. More specifically, the
method is geared towards the use of partial subtrees of parse trees, in which
nodes are decorated with attribute-value (AV) pairs, specifying, for instance,
the syntactic category of a node. Given a parse tree with known AV pairs,
prediction consists of determining target AV pairs, which involve a target
attribute. The method aims towards model simplicity and human readability,
by creating a single model with probabilistic mappings between partial
subtrees with known AV pairs and isomorphic partial subtrees with target
AV pairs. The method further aims towards minimal human intervention
and allows for scalability through parallelisation. We test the method on the
task of semantic role labeling, which uses the syntactic information in a
parse tree to annotate it with semantic predicates and roles, expressing
events and their participants.
Keywords: parsing machine learning semantic role labeling

Building a Parallel Meaning Bank for English,
Dutch, German and Italian
Lasha Abzianidze, Johannes Bjerva, Kilian Evang, Hessel Haagsma,
Rik van Noord, Pierre Ludmann, Duc-Duy Nguyen and Johan Bos
University of Groningen / ENS de Cachan / University of Trento
We present the latest developments in constructing the Parallel Meaning
Bank: a freely accessible parallel corpus annotated with shared meanings
for four different languages. The corpus contains a total of over 11 million
words (45% English, 34% German, 12% Italian, 9% Dutch), with bitexts
from corpora such as Europarl, RTE, Sherlock Holmes, Tatoeba and TED.
Our approach is based on annotation projection: semantic annotations for
English sentences are projected onto their word-aligned translations in
Dutch, German and Italian. The starting point for this is the existing
technology to automatically produce semantic analyses of reasonable
quality for English. We then correct these annotations, and assuming that
the translations are meaning-preserving, we project the annotations from the
English analyses to the other languages.
The automatic semantic annotation consists of five main steps: (i)
segmentation (using a statistical tokenizer); (ii) semantic tagging
(abstracting over POS-tags and named entities), (iii) symbolization
(combining lemmatization and normalization), (iv) syntactic parsing (with
Combinatory Categorial Grammar), and (v) formal semantic analysis (based
on Discourse Representation Theory).
These steps are mostly performed by tools using statistical models trained in
a supervised manner. The annotation models used are all language-neutral.
The semantic analyses are obtained compositionally from syntactic
derivations. Projecting the annotations from English to the other three
languages is challenging both from a theoretical and practical point of view.
We will show what these challenges are and what we have achieved so far.
Keywords: semantics corpora translation

Deep Learning

Computing abstract meaning representations using deep learning techniques
Rik van Noord and Johan Bos University of Groningen
Recurrent neural network (RNN) solutions to machine translation have
become increasingly popular in recent years. Recently they have also been
applied to semantic parsing (mapping a natural language sentence to a
logical form). The popularity is mostly due to their domain-independence,
generalizing power, and remarkable effectiveness. Although the method has
been shown to work for relatively simple meaning representations, neural
semantic parsers are still outperformed by traditional semantic parsers based
on statistical models obtained from treebanks.
The aim of this research is to investigate in what way neural semantic
parsers can be improved. In this paper we report on producing Abstract
Meaning Representations (AMRs) for English sentences. AMRs are
semantic representations structured as single, rooted, directed graphs. We
train character-level recurrent sequence-to-sequence networks to translate
English sentences into AMRs.
The representations are created directly from text without any additional
syntactical analysis, essentially viewing this task as a machine translation
problem. First we reproduce the results of similar techniques in the
literature and then we investigate whether improvements can be achieved
using two approaches: (i) by modifying the structure of the input AMRs,
and (ii) by implementing different
versions of the neural models. Our models approach the results of the rule-
based semantic parser Boxer, but are not yet near the performance of more
traditional AMR parsers.
Keywords: neural semantic parsing abstract meaning representations
computational semantics recurrent neural networks sequence-to-
sequence models

Learning Representations of Phrases with
Siamese LSTM Networks
Carsten van Weelden, Beata Nyari and Mihai Rotaru Textkernel
Learning representations of textual data is a crucial component in
downstream NLP systems. An important application is linking entities
extracted from unstructured text to a knowledge base. In our use case, the
entities are job titles extracted from resumes or vacancies, and the
knowledge base is a hierarchical job title taxonomy. Successfully linking
job titles is particularly important in our application, as it directly influences
the performance of information retrieval and data analytics solutions.
We apply a character-based deep siamese LSTM network to learn dense
embeddings of phrases. The embedding model is used with a KNN
classifier that maps the input to the taxonomy by similarity to taxonomy
entries. For training, two data sources are used: positive and negative
samples generated from the taxonomy, and additional positive data using
distant supervision.
The problem is particularly challenging because the extracted entities have
to be assigned to one of over 5000 classes. The model has to learn subtle
differences (e.g. chef de partie and chef de rang). Moreover, the model
has to be invariant to irrelevant extra words contained in extracted entities
(e.g. "class 1 driver using own vehicle, london").
Siamese networks rely heavily on negative samples. However, given the
large number of classes, the chance of sampling negative pairs with
significant similarity is very low. To offset this we propose sampling
techniques to restrict the negative sampling to pairs with many shared n-
grams. Additionally, we present experiments exploiting the hierarchical
nature of the taxonomy to improve classification.
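The proposed restriction of negative sampling to lexically similar pairs can be illustrated with character n-gram overlap; the threshold, the n-gram order and the toy job titles below are illustrative choices, not the authors' exact settings:

```python
def char_ngrams(s, n=3):
    """Set of character n-grams of a string (lowercased)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def hard_negatives(anchor, taxonomy, min_shared=2, n=3):
    """Select 'hard' negative candidates for a siamese network:
    taxonomy entries that share at least min_shared character
    n-grams with the anchor but are not the anchor itself."""
    a = char_ngrams(anchor, n)
    return [t for t in taxonomy
            if t != anchor and len(a & char_ngrams(t, n)) >= min_shared]

titles = ["chef de partie", "chef de rang", "truck driver", "sous chef"]
print(hard_negatives("chef de partie", titles))
```

Restricting negatives this way biases training toward the subtle contrasts (e.g. "chef de partie" vs "chef de rang") that the model must learn, instead of trivially dissimilar pairs.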
Keywords: Siamese network LSTM Deep learning Phrase embedding
Representation learning Text similarity Entity linking

Generative Adversarial Networks for Dialogue
Elia Bruni and Raquel Fernández University of Amsterdam
Despite the great success of artificial neural networks in modelling a variety
of language tasks, they are still very dependent on human supervision. The
consequence is that the learning has to be static and passive, where the kind
of training data is fixed once and for all. On the other hand, human
communication is a dynamic process which proceeds by an active and
incremental update of the speakers' knowledge state. If we had to train an
artificial agent to successfully communicate by supervision, the machine
and the human would have to engage in an almost infinite loop in which, at
each significant learning progress of the agent, a new round of more
sophisticated human annotations should follow. Here we present a learning
framework that tries to address this limitation. The core of our proposal is to
let computational agents co-exist in the same environment and teach each
other language with minimal need for external supervision. In particular, we
adapt the recently introduced Generative Adversarial Networks (GAN)
framework to the case of dialogue. The idea behind GAN is to re-frame the
learning problem as a game played by two artificial agents locked in a
battle: a discriminator trying to distinguish real data from fake data and a
generator network trying to fool the discriminator by creating data that are
indistinguishable from real data. In our case the type of data are dialogue
passages, so that the generator has to fool the discriminator by producing
human plausible dialogue turns.
Keywords: Dialogue Generation GAN Deep Learning Reinforcement Learning

Seq2Seq learning for Dialogue
Jordy Van Landeghem University of Leuven
Appropriate modelling of dialogue or conversation is an essential step
toward natural language understanding and real machine intelligence.
Whereas previous approaches focused on handcrafting rules for specific
domains (e.g. restaurant ordering), recent initiatives
have been proposing adaptations of deep learning architectures to build
unsupervised dialogue response generation systems. In this paper, we will
review these innovative contributions for the larger task of dialogue
generation & modelling and provide an overview of the state-of-the-art.
By means of an experiment with a new and original closed-domain noisy
dataset we will test the potential of the recently proposed sequence to
sequence framework for the task at hand. Hereby we hope to shed light on
what is currently possible with the newly devised models and identify
which directions are generally promising for the future.
The results demonstrate that the model and the ensuing generated responses
are determined by many factors, most importantly the nature, size and
preprocessing of the data on which it is trained and of which it seeks to
model the inherent conversational patterns. As expected, the currently
proposed sequence to sequence inspired models are not yet able to function
as stand-alone dialogue systems, but can find their usage in unsupervised
intent extraction or as a fallback basis for an existing virtual assistant.
Keywords: Generative dialogue modelling intent extraction seq2seq
learning unsupervised data-driven compositional word embeddings
high-dimensional clustering

Language Acquisition & Psycholinguistics

Introducing NT2Lex: A Machine-readable

CEFR-graded Lexical Resource for
Dutch as a Foreign Language
Anaïs Tack, Thomas François, Piet Desmet and Cédrick Fairon
Université catholique de Louvain / University of Leuven
The recent years have seen the emergence of a number of corpus-based
graded lexical resources, which include lexical entries graded along a
particular learning or difficulty scale. We argue that these graded lexicons
are a step towards rendering the inherent complexity of words more
apparent, contrary to traditional (single-level) frequency-based lexicons,
and could thus find their utility in the field of automatic text simplification,
to name but one. However, until now, this type of resource has only been
made available for a few languages, including French as a first and second
(L2) language (Lété et al., 2004; François et al., 2014) and Swedish L2
(François et al., 2016). The goal of our current work is therefore to expand
upon these previous developments, presenting a similar resource for Dutch
as a foreign language.
Our presentation will be twofold. On the one hand, we will present the
alpha version of the NT2Lex resource. We will describe the common
methodology used for collecting a corpus of readers and textbooks graded
per level of the Common European Framework of Reference (CEFR) and
for extracting and refining the per-level lexical frequencies from the
collected texts. On the other hand, we will present a concrete application of
the resource (and of a graded lexicon in general), which is linked to the task
of complex word identification and prediction. In particular, we will address
the concrete challenges we are faced with when using a predictive model of
vocabulary knowledge at a given CEFR level.
Keywords: graded lexical resource Dutch as a foreign language
complex word identification

Useful contexts and easy words: effects of
distributional factors on lexical category acquisition
Giovanni Cassani, Robert Grimm, Steven Gillis and Walter Daelemans
University of Antwerp
Our research extends previous work on distributional bootstrapping, i.e. the
hypothesis that children use distributional co-occurrences to bootstrap the
acquisition of lexical categories. Our goal is to elucidate which properties
of distributional contexts make them more useful for achieving a
reliable categorisation, and, in parallel, what distributional properties of
words make them easier to categorize.
Results of computational simulations on English child-directed speech show
that frequent and diverse contexts are more useful; on the contrary, contexts
that are more predictable given the words they occur with are less useful.
Finally, entropy has a positive effect: the higher the entropy, the more useful
the context. Interactions of these properties with higher exposure to the
input language are also investigated, showing that contexts occurring with
many different words become more useful over time, like contexts with
higher entropy. Turning to words, they appear to be easier to categorize
when they have low entropy, are on average difficult to predict given the
contexts they occur with, and tend to co-occur with few words. The effect of
these factors does not appear to change with time.
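The entropy measure behind these results can be made concrete; the co-occurrence counts below are invented for illustration, not taken from the child-directed speech data:

```python
import math
from collections import Counter

def entropy(word_counts):
    """Shannon entropy (in bits) of the distribution of words a
    context co-occurs with; higher entropy means a more diverse,
    and by the results above a more useful, context."""
    total = sum(word_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in word_counts.values())

# Hypothetical co-occurrence counts for two contexts
diverse = Counter({"dog": 5, "cat": 5, "ball": 5, "cup": 5})
narrow = Counter({"dog": 19, "cat": 1})
print(entropy(diverse), entropy(narrow))
```

The uniform context attains the maximum entropy (2 bits over four words), while the skewed context scores far lower.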
Based on these properties, it is easy to see that content words are easier to
categorize, consistent with developmental evidence; at the same time,
useful contexts tend to consist of function words. Relations between the
outcome of the statistical analysis on the computational simulation and the
developmental trajectory of children are critically evaluated in the light of
the relevant literature on language acquisition.
Keywords: Distributional bootstrapping Lexical category acquisition
Computational Psycholinguistics Statistical learning Computational
modeling Language acquisition

Modeling language interference effects in
bilingual word reading
Stéphan Tulkens, Dominiek Sandra and Walter Daelemans University
of Antwerp
Bilinguals experience a variety of cross-lingual interference effects in word
reading, depending on the orthographical, phonological and semantic
overlap between their respective languages. Phonological and
orthographical overlap without overlap in meaning, for example, can cause
inhibition in reading. In the past, these effects have been successfully
explored using computational modelling, most prominently in the BIA+
model (Dijkstra and Van Heuven, 2005). In this talk, we argue that the
BIA+ model, being a symbolic, graph-based model, does not sufficiently
account for both word-level and sentence-level semantics, and is
overcomplete precisely because it is symbolic, and not data-driven.
Hence, in the spirit of the proposal by Jacquet and French (2002), we
propose to reimplement the BIA+ model in a connectionist fashion. We
posit that some of the phenomena for which BIA+ posits separate
components can be recovered by considering the bilingual lexicon as a
dynamical system, as modelled by a neural network.
We reimplement some of the lower-level components of the BIA+ model
using two parallel Topological Temporal Hebbian Self-Organizing Maps
(Ferro et al., 2011). We show that, by training the maps with orthographic
and phonological representations, language clusters emerge. These clusters
can explain some inhibition effects; words that phonologically occur in both
languages, for example, will be ambiguous between languages, which may
explain why they cause reading interference in bilinguals.
Keywords: Bilingualism Word reading Computational
psycholinguistics Connectionism Self-organizing maps

What do Neural Networks need in order to Generalize?
Raquel G. Alhama and Willem Zuidema University of Amsterdam
In an influential paper, reporting on a combination of artificial language
learning experiments with babies, computational simulations and
philosophical arguments, Marcus et al. (1999) claimed that connectionist
models cannot account for human success at learning tasks that involved
generalization of abstract knowledge such as grammatical rules. This claim
triggered a heated debate, centered mostly around variants of the Simple
Recurrent Network model (Elman, 1990).
In our work, we revisit this unresolved debate and analyze the underlying
issues from a different perspective. We argue that, in order to simulate
human-like learning of grammatical rules, a neural network model should
not be used as a tabula rasa, but rather, the initial wiring of the neural
connections and the experience acquired prior to the actual task should be
incorporated into the model. We present two methods that aim to provide
such initial state: a manipulation of the initial connections of the network in
a cognitively plausible manner (concretely, by implementing a delay-line
memory), and a pre-training algorithm that incrementally challenges the
network with novel stimuli. We implement such techniques in an Echo State
Network (Jaeger, 2001), and we show that only when both techniques are
combined is the ESN able to succeed at the grammar discrimination task
suggested by Marcus et al.
Keywords: artificial language learning rule learning neural-symbolic
computation neural networks

Distributional Semantics

Dutch Poetry Generation using Encoder-
Decoder Networks
Tim Van de Cruys IRIT / CNRS
In the last few years, neural network techniques have become increasingly
popular in NLP applications. Recurrent neural networks, in particular, are
very good at capturing both syntactic and semantic properties of language
sequences. In this work, we use recurrent neural networks within an
encoder-decoder framework for poetry generation. By training the network
to predict the next sentence with the current sentence as input, the network
learns to generate plain text. The encoder first builds up a representation of
an entire sentence by sequentially incorporating the words of that sentence
into a state vector of fixed size. The final representation is then given to the
decoder, which outputs a sequence of words according to a probability
distribution derived from the hidden state. By transforming the probability
distribution yielded by the decoder in order to incorporate poetic
constraints, the network may be exploited for the generation of poetic verse.
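The constraint-based transformation of the decoder's probability distribution can be sketched as masking followed by renormalization; the distribution and the rhyme set below are invented for illustration:

```python
def constrain(dist, allowed):
    """Zero out probability mass for words violating a poetic
    constraint (e.g. a required rhyme at the line end) and
    renormalize so the result is again a distribution."""
    masked = {w: p for w, p in dist.items() if w in allowed}
    z = sum(masked.values())
    return {w: p / z for w, p in masked.items()}

# Hypothetical decoder distribution over the next word
dist = {"night": 0.4, "day": 0.3, "light": 0.2, "tree": 0.1}
rhymes_with_bright = {"night", "light"}
print(constrain(dist, rhymes_with_bright))
```

Sampling from the constrained distribution then guarantees the poetic form while preserving the relative preferences the network has learned.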
The model described above will be applied to the generation of poems in Dutch.
Keywords: neural networks language generation poetry

Improving Word2vec: Unshifting the shifted
PMI matrix
Minh Le Vrije Universiteit Amsterdam
Shortly after its introduction, word2vec has become an important tool in
computational linguists' tool belt. Levy and Goldberg proved that the skip-
gram with negative sampling (SGNS) algorithm implemented in word2vec
works by implicitly factorizing the PMI matrix. Curiously, the objective
function of SGNS is shifted from PMI by a global constant log(k). We
hypothesize that this is a bug rather than a feature and try to eliminate the
shift. The result is better performance on the SimLex-999 dataset.
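The shift in question can be made concrete with a toy co-occurrence table: per Levy and Goldberg's result, SGNS implicitly factorizes PMI(w, c) - log(k), where k is the number of negative samples, and the proposed unshifting adds log(k) back:

```python
import math

def pmi(cooc, w, c):
    """PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ] from raw counts."""
    total = sum(cooc.values())
    pw = sum(v for (a, _), v in cooc.items() if a == w) / total
    pc = sum(v for (_, b), v in cooc.items() if b == c) / total
    pwc = cooc.get((w, c), 0) / total
    return math.log(pwc / (pw * pc))

# Toy word-context co-occurrence table
cooc = {("cat", "purr"): 8, ("cat", "bark"): 2,
        ("dog", "purr"): 1, ("dog", "bark"): 9}

k = 5  # number of negative samples in SGNS
shifted = pmi(cooc, "cat", "purr") - math.log(k)   # what SGNS factorizes
unshifted = shifted + math.log(k)                  # the proposed correction
print(pmi(cooc, "cat", "purr"), shifted, unshifted)
```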
Keywords: distributional semantics distributed representation word embeddings

Unsupervised Word Sense Disambiguation
with Sense Extended Word Embeddings
Dieke Oele and Gertjan Van Noord University of Groningen
We propose a Knowledge-based Word Sense Disambiguation (WSD)
method using a combination of WordNet and sense extended word
embeddings. Our method is similar to the classic Lesk algorithm as it
exploits the idea that shared words between the context of a word and each
definition of its senses provide information on its meaning. However,
instead of counting the number of words that overlap, we use sense-
extended word embeddings to compute the similarity between the gloss of a
sense and the context of the word. Similarity is computed by representing
both the gloss and the context as vectors and comparing them with sense
extended word embeddings. The strong point of our method is that it only
requires large unlabeled corpora and a sense inventory such as WordNet,
whereas other methods rely heavily on high-quality semantic relations or
annotated data, which are hard and expensive to acquire. Another advantage
is that it is readily applicable for other languages if a sense inventory is
available. Performance of our algorithm was tested on both Dutch and
English sentences in an all-words setup and compared to other Knowledge-
based methods. Evaluation of our method shows that it outperforms both a
random baseline and Lesk, while in some set-ups it also beats the most
frequent sense baseline. Additional experiments indicate that our method
works well in different domains and, for some domains, even outperforms
the most frequent sense baseline. A third experiment confirms the effect of the use
of gloss vectors.
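A minimal sketch of the gloss-versus-context similarity computation, with invented two-dimensional vectors standing in for real sense-extended embeddings and invented glosses standing in for WordNet definitions:

```python
import math

# Tiny hypothetical word embeddings (2-dimensional for clarity)
EMB = {"money": [1.0, 0.0], "cash": [0.9, 0.1], "river": [0.0, 1.0],
       "water": [0.1, 0.9], "deposit": [0.7, 0.3]}

def avg_vec(words):
    """Average the embeddings of the known words in a bag of words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(d) / len(vecs) for d in zip(*vecs)]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def disambiguate(context, glosses):
    """Pick the sense whose gloss vector is most similar to the
    context vector (an embedding-based variant of Lesk)."""
    ctx = avg_vec(context)
    return max(glosses, key=lambda s: cos(ctx, avg_vec(glosses[s])))

glosses = {"bank_finance": ["money", "cash", "deposit"],
           "bank_river": ["river", "water"]}
print(disambiguate(["money", "deposit"], glosses))
```

Unlike classic Lesk, no literal word overlap between gloss and context is required: near-synonyms contribute through vector similarity.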
Keywords: Word Sense Disambiguation Lesk algorithm Word

Similarity Learning for Coreference
Thibault Liétard, Pascal Denis and Aurélien Bellet INRIA
Coreference resolution is the problem of partitioning a set of mentions
occurring in a text (typically, noun phrases) into clusters, in which each
cluster refers to the same real world entity. Most state-of-the-art models for
coreference resolution are pairwise models, where each pair of mentions is
represented as a vector of linguistic features. They typically learn a
classifier to predict whether two mentions are coreferent, and use these
predictions to partition the set of mentions into coherent coreference
clusters -- see for instance Soon et al. (2001).
In this work, we argue that a similarity learning criterion may be more
appropriate than a classification objective. The similarity function naturally
operates on pairs and different objectives can be considered. For instance,
we can learn the parameters of the similarity function such that the
similarity of a given mention to its closest antecedent coreferent mention is
larger than to any closer non-coreferential antecedent candidate. The
resulting similarity scores can then be naturally plugged into a greedy
clustering procedure, such as best-left-link (Ng & Cardie, 2002).
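The ranking objective described above corresponds roughly to a hinge loss of the following form; this is a sketch of the idea, not the authors' exact formulation:

```python
def margin_loss(sim_pos, sim_negs, margin=1.0):
    """Ranking hinge loss: the similarity of a mention to its
    closest coreferent antecedent (sim_pos) should exceed its
    similarity to every competing non-coreferent candidate
    (sim_negs) by at least `margin`."""
    return sum(max(0.0, margin - sim_pos + s) for s in sim_negs)

# Hypothetical similarity scores for one mention: the second
# negative candidate is close enough to incur a penalty.
print(margin_loss(sim_pos=2.5, sim_negs=[0.5, 2.2]))
```

Minimizing this loss pushes coreferent pairs together and confusable non-coreferent pairs apart, which is exactly what the downstream greedy best-left-link clustering needs.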
We empirically compare our similarity learning formulations to some
traditional classifier-based methods on the CoNLL-2012 datasets, based on
their CoNLL score and the error analysis tool developed by Kummerfeld &
Klein (2013). We also study the impact of incorporating external
information about the general context of occurrence of the mentions using
word embeddings, such as those produced by the method of Pennington et
al. (2014).
Keywords: coreference resolution similarity learning word embeddings


(To what Extent) can Neural Machine
Translation deal with Literary Texts?
Antonio Toral University of Groningen
Machine translation (MT) is widely used nowadays both in industry and
also by the public in a wide range of scenarios and for all kinds of purposes,
from gisting for assimilation to publishing for dissemination. Despite the
tremendous progress made by MT techniques in the last few decades and its
increasing ubiquitous role in our daily lives, there is one content type for
which, according to the received wisdom, the machine won't ever have a
chance: literary text.
Due to the emergence of a radically new paradigm to automated translation,
neural MT (NMT), and all the buzz around it, we deem it timely to revisit
the question of whether MT can be useful at all (to assist) with the
translation of literary text. This gives us the opportunity to test NMT on a
particularly challenging use case, which we hope will contribute to shedding
further light on the real potential of this newly introduced approach to MT.
To this end we build NMT systems tailored to our domain of interest by
training them on original novels and their translations. We then evaluate and
analyse the quality of the translations produced by comparing them (i) to the
outputs obtained by a system based on the "classic" statistical phrase-based
approach and (ii) to the translations produced by professional literary translators.
Keywords: machine translation neural machine translation literary text

Mapping the PERFECT via Translation Mining
Martijn van der Klis, Bert Le Bruyn and Henriëtte de Swart Utrecht University
Recently, a trend in typology has been to create semantic maps (Haspelmath
1997), not from intuitions and examples, but directly from data extracted
from multilingual parallel corpora (Wälchli & Cysouw 2012). In our
research, we continue in the same vein, but focusing on the level of
grammar instead of the lexical domain. Specifically, we are interested in
mapping the PERFECT across five European languages (Dutch, English,
French, German, Spanish). We dub our method Translation Mining.
We first extracted present perfects from the EuroParl corpus (Tiedemann
2012) using a methodology that was presented at CLIN26 (van der Klis, Le
Bruyn & de Swart 2015). A human annotator (using a web application
designed for this purpose) then marked the corresponding verb phrases in
the aligned fragments. Tenses of these verb phrases were then automatically
or manually assigned, depending on the degree of detail of part-of-speech
tags per language.
This process yielded five-tuples of aligned tense attributions. We designed a
distance measure to be able to create a (dis)similarity matrix, and then
plotted this matrix using multidimensional scaling (MDS). On top of that,
we created an interactive visualization that allows researchers to manipulate
the dimensions of the MDS algorithm, as well as to inspect the individual
data points.
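A simple Hamming-style distance over aligned tense tuples illustrates how such a dissimilarity matrix can be built; the tuples and the distance are illustrative stand-ins for the paper's actual annotations and measure:

```python
def tense_distance(t1, t2):
    """Fraction of languages in which two aligned contexts differ
    in tense attribution (a Hamming-style distance)."""
    assert len(t1) == len(t2)
    return sum(a != b for a, b in zip(t1, t2)) / len(t1)

# Hypothetical five-tuples (Dutch, English, French, German, Spanish)
contexts = [
    ("perf", "past", "perf", "perf", "perf"),
    ("perf", "perf", "perf", "perf", "perf"),
    ("past", "past", "past", "past", "past"),
]
matrix = [[tense_distance(a, b) for b in contexts] for a in contexts]
for row in matrix:
    print(row)
```

The resulting symmetric matrix is exactly the kind of input multidimensional scaling expects, placing contexts with similar cross-linguistic tense behaviour close together on the map.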
These interactive maps allowed us to reproduce earlier research (e.g.
Portner 2003), but also to draw new conclusions about the tense/aspect role of
the PERFECT across languages. We repeated the same method on the
OpenSubtitles2016 corpus (Lison & Tiedemann 2016) to check for genre effects.
Keywords: semantic maps perfect tense-aspect multilingual parallel
corpora multidimensional scaling

Translationese and Posteditese: Can users
and computers detect the difference?
Joke Daems, Lieve Macken and Orphée De Clercq Ghent University
Translated texts are said to be different from original text, a phenomenon
called 'Translationese' (Gellerstam, 1986). While humans are not usually
capable of identifying the differences between both (Baroni & Bernardini,
2005), computers have successfully been trained to detect Translationese by
taking lexical and grammatical information into account (Ilisei, Inkpen,
Corpas Pastor & Mitkov, 2010; Koppel & Ordan, 2011; Volansky, Ordan &
Wintner, 2015). The more recent development of machine translation and its
subsequent post-editing can be expected to influence both textual
characteristics and reader perception. The final recipients of a translation,
for example, prefer human-translated texts over post-edited texts, even
when they do not know how a text was produced (Bowker and Buitrago
Ciro, 2015).
We therefore suggest expanding the work related to Translationese to
include Posteditese: the way in which a post-edited text differs from a
translation and/or an original text. In this talk, we wish to first establish
whether there is indeed such a thing as 'Posteditese', and if so, how it could
possibly be detected.
We use newspaper articles that have been translated and post-edited by
human translators. Analysis shows no significant difference in quality
between both translation methods (Daems, Vandepitte, Hartsuiker, &
Macken). We let readers select the post-edited text from two possible
translations, to determine whether humans indeed spot 'Posteditese', and
what they base their judgement on. We then look at various lexical and
syntactic differences between human-translated and post-edited texts, in
order to suggest potential parameters for automatic detection in future work.
Keywords: translation postediting Translationese Posteditese
translation method detection


Open source tools and recognition models for using Kaldi on Dutch
Laurens van der Werff and Roeland Ordelman University of Twente
For the University of Twente NLSpraak project, financed in part by the
Dutch National Police, we developed tools and ready-to-use Dutch models
for the open source Kaldi speech recognition toolkit. Thanks to an
agreement with Taalunie, all our models and tools are available under an
Apache license, making them suitable for both research and commercial
applications. These tools and models are available now, and have been
tailored towards accessibility for non-technical users.
Performance so far has been encouraging with, for example, a WER of
~11% on the Dutch-BN part of the NBest-2008 benchmark at a speed of
roughly 0.1xRT on a quad core laptop. The use of our Kaldi models on the
Buchenwald interview collection, without any adaptation or optimization,
resulted in a WER reduction by around half compared to our previous
highly adapted GMM/HMM system.
We aim to update and expand our collection of models and tools in the
coming period and hope that others are willing to contribute their own
models so that the entire (Dutch) speech community may benefit.
Keywords: ASR Open Source models Kaldi-tools

Creating personalized text-to-speech voices
using deep learning
Esther Judd-Klabbers ReadSpeaker
rSpeak Technologies builds high-quality text-to-speech synthesis for many
types of applications. rSpeak Technologies is a software company initiated
by the owners of ReadSpeaker. The current rSpeak TTS system uses unit
selection synthesis, in which a large recording database from a single
professional speaker of the language is recorded and at synthesis time this
database is searched for the optimal sequence of acoustic units to synthesize
a sentence.
More recently, we have been investigating the implementation of Statistical
Parametric Speech Synthesis (SPSS) using deep learning technology to
develop new voices. This method requires a much smaller recording
database (<10% of a unit selection database) from a professional speaker to
train acoustic models for each sound of the language. The trained models
contain parameters that describe the spectrum and pitch of the speech that
are then synthesized with a vocoder.
The main advantages of using SPSS are twofold. Firstly, it is much easier to
develop several new voices because the recording and segmentation of the
recordings takes much less time. Secondly, the acoustic parameters such as
pitch, duration, and spectrum can be adapted to fit specific applications. The
presentation will describe preliminary results regarding the creation of
personalized voices with SPSS. As a side effect, we will make advances in
better part-of-speech tagging and prosody prediction using deep learning.
Keywords: text-to-speech synthesis deep learning prosody modeling

Emergence of language structures from
exposure to visually grounded speech
Grzegorz Chrupała, Afra Alishahi and Lieke Gelderloos Tilburg University
A variety of computational models can learn meanings of words and
sentences from exposure to word sequences coupled with the perceptual
context in which they occur. More recently, neural network models have
been applied to more naturalistic and more challenging versions of this
problem: for example phoneme sequences, or raw speech audio signal
accompanied by correlated visual features. In this work we introduce a
multi-layer recurrent neural network model which is trained to project
spoken sentences and their corresponding visual scene features into a shared
semantic space. We then investigate to what extent representations of
linguistic structures such as discrete words emerge in this model, and where
within the network architecture they are localized. Our ultimate goal is to
trace how auditory signals are progressively refined into meaning
representations, and how this process is learned from grounded speech.
Keywords: speech language and vision cross-situational learning
grounding neural networks representation learning


A Standard for NLP Pipelines

Janneke Van Der Zwaan, Wouter Smink, Anneke Sools, Gerben
Westerhof, Bernard Veldkamp and Sytske Wiegersma Netherlands
eScience Center / University of Twente
One of the problems with existing NLP software is that to combine
functionality from different software packages, researchers have to write
custom scripts. Generally, these scripts duplicate at least some text
processing tasks (e.g., tokenization), and need to be adapted when used for
new datasets or in other software or hardware environments. This has a
negative impact on research reproducibility and reuse of existing software.
The use of workflow standards might help to reduce this problem. We
propose to use Common Workflow Language (CWL), a new specification
for describing data analysis workflows and tools, together with conventions
to standardize NLP workflows. The presentation will introduce NLP
Pipeline (nlppln), an open source software package that simplifies creating
and adapting NLP pipelines from existing NLP functionality using CWL.
Keywords: NLP Common Workflow Language Pipelines

PICCL: A Workflow for preparing corpora for
exploration and exploitation
Martin Reynaert, Ko van der Sloot, Maarten Van Gompel and Antal
van Den Bosch Tilburg University / Radboud University Nijmegen
In the framework of CLARIAH, work is underway on PICCL, the
`Philosophical Integrator of Computational and Corpus Libraries'. In part,
the project is spurred by its prime user, NWO `Groot' project Nederlab,
which aims to make available a broad range of digitally available diachronic
corpora of Dutch spanning its entire history. We will present the current
state of affairs of its ongoing development.
PICCL is to be a complete workflow for building corpora and preparing
them for online exploration and exploitation. FoLiA XML is its pivot
format. Therefore it has a broad range of text convertors and optional OCR-
facilities for text images at its disposal.
Another optional PICCL subsystem, the Text-Induced Corpus Clean-up tool
TICCL, is also being extended with ambitious new functionalities such as n-
gram-based fully automatic post-correction of OCR and other lexical errors
for about 20 languages as well as modernization of historical Dutch.
The memory-based linguistic annotator Frog has been thoroughly tested on
Nederlab corpora great and small, leaving it leaner in memory and meaner
on time after due optimization. Further, in seamless combination with its
tokenizer Ucto, it has been fully kitted with language recognition
capabilities, enabling full linguistic enrichment for the default language and
tokenization-only processing of foreign language text on the paragraph or
sentence levels.
After linguistic enrichment with lemmata, Part-of-Speech tags, and Named-Entity
labels, PICCL further indexes the corpora towards corpus exploration
and exploitation with BlackLab, ready for its WhiteLab interface which e.g.
in OpenSoNaR+ makes both the written Dutch corpus SoNaR as well as the
spoken Dutch corpus CGN available to all.
Keywords: corpus-building workflow PICCL OCR post-correction
TICCL linguistic annotation Frog online corpus interface WhiteLab

Querying Large Treebanks: Benchmarking
GrETEL Indexing (GrInding)
Bram Vanroy, Vincent Vandeghinste and Liesbeth Augustinus
University of Leuven
Data available for research grows rapidly, yet technology to
efficiently interpret and excavate that data lags behind. For instance, when
using large treebanks in linguistic research, the speed of a query leaves
much to be desired. GrETEL Indexing, or GrInding, aims to tackle that
issue. The idea of GrInding comes down to making the search space as
small as possible before actually starting the search by pre-processing the
treebank at hand. We recursively divide it into smaller parts, called subtrees,
which are then converted into database files. All subtrees are organised
according to their linguistic dependency pattern, and labelled as such.
Additionally, general patterns are linked to more specific ones. By doing so,
we create millions of databases, and given a linguistic structure we know in
which databases that structure can occur, leading to a significant
efficiency boost.
GrInding has been discussed before as a theoretical pre-processing
method (Vandeghinste & Augustinus 2014). In this
talk we present the results of using the procedure on the SoNaR-500
treebank. We will touch on the significance of the speed gain we logged
during our benchmarks, and the advantages and disadvantages of GrInding
linguistic data.
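The core bucketing idea can be illustrated in a few lines; the subtree encoding and the pattern labels below are invented for illustration and do not reflect GrETEL's actual data format.

```python
from collections import defaultdict

# Toy dependency subtrees: (head category, dependency relations of its children).
subtrees = [
    ("verb", ["su", "obj1"]),
    ("verb", ["su"]),
    ("noun", ["det", "mod"]),
    ("verb", ["su", "obj1"]),
]

def pattern_label(subtree):
    """Canonical label for a subtree's dependency pattern."""
    head, deps = subtree
    return head + ">" + "+".join(sorted(deps))

# One "database" (here simply a list of subtree ids) per pattern, so a query
# whose pattern is known only needs to search the matching bucket instead of
# the whole treebank.
databases = defaultdict(list)
for i, tree in enumerate(subtrees):
    databases[pattern_label(tree)].append(i)
```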
Vincent Vandeghinste & Liesbeth Augustinus (2014). Making Large
Treebanks Searchable Online. The SoNaR case. In: Marc Kupietz, Hanno
Biber, Harald Lüngen, Piotr Bański, Evelyn Breiteneder, Karlheinz Mörth,
Andreas Witt & Jani Takhsha (eds), Proceedings of the LREC2014 2nd
workshop on Challenges in the management of large corpora (CMLC-2).
Reykjavik, Iceland. pp. 15-20. ISBN 978-2-9517408-8-4.
Keywords: big data linguistics computational linguistics treebank
querying corpus linguistics

Authorship & Topic Modelling

Efficiency of Arabic Grammatical Representation in Authorship
Mahmoud Shokrollahi-Far Tilburg University
In authorship verification (AV) literature the accuracy of text classification
(TC) is usually reported to increase if grammatical features, such as POS
tags, are used to supplement words as features for representing texts to
machine learning systems. If grammatical features are employed alone in
text representation, either classification accuracy would decrease or the
number of features employed would increase tremendously which actually
means the decrease in representation efficiency. Both cases are reported in
the literature for AV in English and Dutch texts, two analytic languages.
However, the case seems to be different for morphologically rich languages
such as Arabic. The present research has implemented some experimental
TCs for AV of Islamic corpora. The corpora are morpho-syntactically
analysed and tagged using the MOBIN knowledge-based system. To
represent the corpora to Support Vector Machines reliably, i.e. effectively
and efficiently, different features are tested, varying from words to features
constructed on morpho-syntactic tags. Although the text segments are
as short as 100 to 300 words, all features have resulted in an AV
F-score higher than 0.9. Nevertheless, employing training data developed on
grammatical features has resulted not only in the highest effectiveness, i.e.
0.999, but also in the highest efficiency in terms of the lowest number, i.e.
130, of features employed in the representation of the corpora to machine
learning system. These results suggest the high reliability of Arabic
grammatical features for AV classification, and the representation
methodology is readily applicable to similar research on other
morphologically rich languages.
Keywords: Grammatical Features Efficient representation Authorship
Verification Arabic MOBIN system

Profiling Dutch authors on Twitter:
discovering political preference and
income level
Reinard van Dalen and Léon Melein University of Groningen
Research into profiling authors on Twitter has so far mostly focused on
English-speaking users and attributes like age and occupation. Few studies
consider other languages like Dutch or different attributes like political
preference or income. In our presentation, we will present our bachelor
theses on profiling Dutch Twitter users with their political preference and
income level. Both theses apply a roughly similar approach to their
respective hypotheses, but differ in the features they employ to predict their
respective attribute.
As a dataset of Dutch Twitter users labelled with either their political
affiliation or their income level is not readily available, they had to be
created. Users were extracted from the University of Groningen Twitter
corpus. Using distant supervision, these users were labelled with their
political preference and income level based on information from their
profiles. The tweets of these users were collected. This resulted in two
separate corpora, one for each attribute, and a classifier was built for each.
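The distant-supervision step can be sketched as profile matching; the party names, patterns, and the one-match rule below are illustrative assumptions, not the actual labelling rules from the theses.

```python
import re

# Hypothetical labelling patterns: assign a political party from a user's
# profile bio only if exactly one party pattern matches, to keep labels clean.
PARTY_PATTERNS = {
    "VVD": re.compile(r"\bVVD\b", re.IGNORECASE),
    "PvdA": re.compile(r"\bPvdA\b", re.IGNORECASE),
}

def label_profile(bio):
    """Distant supervision: a label only when the evidence is unambiguous."""
    hits = [party for party, pat in PARTY_PATTERNS.items() if pat.search(bio)]
    return hits[0] if len(hits) == 1 else None

bios = ["Raadslid voor de VVD", "Politiek verslaggever: VVD, PvdA", "Loves cats"]
labels = [label_profile(b) for b in bios]
```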
In this talk we will present the results of predicting political preference and
income level of Dutch Twitter users. We evaluate the classifiers built for
each attribute against baselines and discuss what features are predictive for
income and political preference, respectively.
Keywords: authorship profiling social media twitter political
preference income level

Bayesian Nonparametric Topic Models for
Short Text Data
Renzo Poddighe and Gerasimos Spanakis Maastricht University
Topic modeling is a suite of algorithms, which aims to discover the hidden
structures in large digital archives. Topic modeling algorithms like Latent
Dirichlet Allocation perform unsupervised learning, thus they do not
require any prior annotations or labeling of the documents; the topics
emerge from the analysis of the original texts.
Traditional topic models have several shortcomings when applied to
archives consisting of short and noisy documents, such as Twitter. More
specifically, it is assumed that each document is derived from a specific
topic distribution but short text sparsity raises an issue on whether topic
models can be applicable.
For this purpose, a wide range of models, belonging to the classes of finite
Bayesian mixture models and nonparametric Bayesian models, are
compared in terms of statistical likelihood and topic coherence. In addition,
an extension to the existing state of the art is proposed, called the Biterm
Pitman-Yor process, and compared to the other models.
Experimental analysis is performed on three datasets having documents of
different length in order to demonstrate the short text issue: the Reuters-21578
dataset (normal-length documents), the Tweets2011 dataset (short text) and a
third one in Dutch (extremely short text). Results indicate that
newly proposed methods for overcoming data sparsity improve traditional
models on both the statistical and semantic level. The newly proposed
biterm Pitman-Yor process shows comparable performance to state-of-the-
art, while increasing the flexibility of the modeling process, making the
result more malleable to the user's expectations.
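The biterm idea underlying such short-text models can be illustrated by its corpus representation: instead of per-document topic mixtures, unordered word pairs (biterms) are modelled over the whole corpus. The toy tweets below are illustrative.

```python
from itertools import combinations

tweets = [["vote", "election", "today"], ["cats", "vote"]]

def biterms(doc):
    """All unordered word pairs occurring in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]

# Biterm models pool these pairs corpus-wide, which sidesteps the sparsity
# of per-document word counts in short texts such as tweets.
corpus_biterms = [b for doc in tweets for b in biterms(doc)]
```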
Keywords: topic models short text bayesian non-parametric methods

Poster Presentations

Natural proof system for natural language
Lasha Abzianidze University of Groningen
Recognizing semantic relations (e.g., entailment and contradiction) between
sentences is a crucial task for natural language understanding. To account
for this task, we combine simplicity of a natural logic and flexibility of a
semantic tableau.
A logic is called natural if its terms resemble linguistic expressions. In our
natural logic, "Everybody cycles in Groningen" is expressed by a term:
(every person)((in Groningen) cycle). The terms of the natural logic are
called Lambda Logical Forms (LLFs) and they are built up from lexical
constants, variables and lambda abstraction. LLFs contain no logical
quantifiers or connectives.
A semantic tableau is a proof calculus which proves propositions via
refutation. For example, A entails B, if it is impossible to find a
counterexample where A is true and B is false. We designed a version of the
well-known tableau calculus for our natural logic, called Natural Tableau.
Natural Tableau comprises ca. 80 inference rules that decompose an
extensive list of syntax-semantic constructions.
Based on Natural Tableau, we implemented a theorem prover. The prover
obtains LLFs from syntactic derivations of Combinatory Categorial
Grammar parsers.
It has several virtues:
(a) LLFs are cheap to obtain from syntactic derivations.
(b) The employed logic is higher-order and expressive enough for linguistic
phenomena.
(c) Proofs are human readable and explanatory.
(d) The prover can reason over several premises, unlike many NLP
reasoning systems.
(e) It demonstrates almost perfect precision.
The prover achieves high results on certain standard textual entailment
datasets. The demo of the prover is available at:
Keywords: Natural logic Semantic tableau Textual entailment

Treebank querying with GrETEL 3: bigger,
faster, stronger
Liesbeth Augustinus, Bram Vanroy and Vincent Vandeghinste
University of Leuven
We describe the new version of GrETEL
(, an online tool which allows users to
query treebanks by means of a natural language example (example-based
search) or via a formal query (XPath search).
The new release comprises an update to the interface and considerable
improvements in the back-end search mechanism.
The update of the front-end is based on user suggestions. In addition to an
overall design update, major changes include a more intuitive query builder
in the example-based search mode and a visualizer for syntax trees that is
compatible with all modern browsers. Moreover, the results are presented to
the user as soon as they are found, so users can browse the matching
sentences before the treebank search is completed. We will demonstrate that
those changes considerably improve the query procedure.
The update of the back-end mainly includes optimizing the search algorithm
for querying the (very) large SoNaR treebank. Querying this 500-million
word treebank was already made possible in the previous version of
GrETEL, but due to the complex search mechanism this often resulted in
long query times or even a timeout before the search completed. The
improved version of the search algorithm results in faster query times and
more accurate search results, which greatly enhances the usability of the
SoNaR treebank for linguistic research.
Keywords: treebank querying corpus linguistics search tool

Domain Adaptation for LSTM Language Models
Wim Boes, Robbe Van Rompaey, Joris Pelemans, Lyan Verwimp and
Patrick Wambacq University of Leuven
Statistical language models predict the next word in a sentence, a task that
has many applications such as machine translation, automatic speech
recognition, auto-correction, ... The current state of the art in language
modeling is based on artificial neural networks with memory that enables
them to capture language phenomena across sentence boundaries. More
abstract, long-distance phenomena, such as the current topic of a
conversation or the domain of a text, however still seem to present too large
a challenge.
In this work we present the progress of our master thesis, in which we
attempt to find solutions for this challenge. We make a thorough analysis of
the parameters involved in LSTM language models and propose domain-
adaptive neural network architectures.
Keywords: language modeling LSTM domain adaptation

Towards more efficiently generated topic
models. Preventing the extraction of
generic clusters through non-
discriminative term removal
Mathias Coeckelbergs Université Libre de Bruxelles / University of
In several NLP tasks, removing stopwords is a common preprocessing step.
Stopwords are then defined as words which do not contribute to the task at
hand, such as conjunctions and prepositions. In topic modeling, for example,
these semantically vacuous words are very likely not to make a useful
contribution to the classification
of keywords. However, when we evaluate results after modeling, we also
discover topics recurring in all documents, which hence are also not
contributing semantically. These generic clusters are constituted by words
we will call non-discriminative terms. In this talk we will discuss our
experiments on testing the usefulness of automatically deleting these words
from consideration. All our investigations are performed on an economic
corpus of the European Commission, which contains several such non-
discriminative terms such as decree, investment, decision, which return as
keywords in all documents. We first explain why the most intuitive
approach, namely to simply delete these topics after running the topic
modeling algorithm, is inadequate due to skewness in the final modeling
result. Next, we will focus on three methodologies, investigating the worth
of each.
Firstly, we consider tf-idf methods, in order to indicate the limiting results
we can achieve by only considering frequencies. Secondly, we will look at
the Eurovoc controlled vocabulary in order to estimate its contribution to
the recognition of non-discriminative terms. Thirdly, we will consider
named entity recognition and investigate to what extent we can achieve
similar results as with Eurovoc.
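As a minimal sketch of the tf-idf route, terms with zero idf (terms occurring in every document) can be flagged automatically as non-discriminative; the corpus and the hard zero threshold below are illustrative.

```python
import math

# Toy corpus mimicking the European Commission data: "decree" and
# "investment" occur in every document and are thus non-discriminative.
docs = [
    ["decree", "investment", "growth"],
    ["decree", "investment", "inflation"],
    ["decree", "investment", "trade"],
]

def idf(term, docs):
    """Inverse document frequency; 0.0 means the term occurs in every document."""
    df = sum(term in doc for doc in docs)
    return math.log(len(docs) / df) if df else float("inf")

vocab = {t for doc in docs for t in doc}
non_discriminative = {t for t in vocab if idf(t, docs) == 0.0}
```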
Keywords: Topic Modeling Automatic Stopword Removal Named
Entity Recognition Controlled Vocabulary

Universal Reordering via Linguistic Typology
Joachim Daiber, Miloš Stanojević and Khalil Sima'an University of Amsterdam
In this paper we explore the novel idea of building a single universal
reordering model from English to a large number of target languages. To
build this model we exploit typological features of word order for a large
number of target languages together with source (English) syntactic features
and we train this model on a single combined parallel corpus representing
all (22) involved language pairs. We contribute experimental evidence for
the usefulness of linguistically defined typological features for building
such a model. When the universal reordering model is used for preordering
followed by monotone translation (no reordering inside the decoder), our
experiments show that this pipeline gives comparable or improved
translation performance with a phrase-based baseline for a large number of
language pairs (12 out of 22) from diverse language families.
Keywords: machine translation linguistic typology reordering neural

Court verdict: unreadable
Orphée De Clercq and Lenie Van Hecke Ghent University
Legal texts are often very difficult to read, as they suffer from jargon and
syntactic complexity. As a result many guidelines (Kimble 2006) and even
laws (Plain Writing Act) have been set up to make these more readable.
When automatically assessing the readability of legal texts, much attention
is still assigned to superficial text characteristics (van Boom 2014). In the
past, De Clercq and Hoste (2016) have developed a state-of-the-art readability
prediction system able to assess generic Dutch and English text. This
system calculates both shallow (e.g. word length, type token ratio) and deep
(e.g. parse tree depth, coreference) linguistic characteristics. The system has
been trained on a corpus comprising a variety of text genres and will require
retraining when applied solely to legal texts. In this presentation we
describe an experiment consisting of two user groups: 25 Master students of
Law and 25 of Linguistics. Both groups were asked to assess the readability
of ten legal texts independently of each other. Based on these assessments
all texts were assigned a rank and processed with our generic readability
system. The students could also comment on their assessments. As
expected, prior knowledge is of crucial importance when training a
readability prediction system and the characteristics on which such a system
relies should be adapted accordingly, i.e. by assigning weights to specific
linguistic characteristics.
De Clercq, Orphée, & Hoste, V. (2016). All mixed up? Finding the optimal
feature set for general readability prediction and its application to English
and Dutch. Computational Linguistics, 42(3), 457-490.
Kimble, J. (2006). Lifting the Fog of Legalese. Carolina Academic Press.
van Boom, W. (2014). Begrijpelijke hypotheekvoorwaarden en
consumentengedrag. In T. B. en A.A. van Velten (Eds.), Perspectieven voor
vastgoedfinanciering (Congresbundel Stichting Fundatie Bachiene),
Amsterdam: Stichting Fundatie Bachiene, 45-80.
Keywords: readability prediction prior knowledge legal texts

Generalisation in Automatic Metaphor Detection
Marco Del Tredici and Raquel Fernandez University of Amsterdam
Metaphorical meaning arises in context: pretty much any word in the
lexicon can occur in one or more contexts that trigger its metaphorical
interpretation. Since listing all these contexts is unfeasible, NLP systems
must perform metaphor detection in an on-line fashion. We argue that, in
order to successfully perform this task, metaphor detection systems not only
need to learn which contextual information is relevant to identify the
metaphorical use of a specific set of words, but should also be able to
generalize such knowledge to unseen words.
No current computational work on metaphor detection has explicitly
addressed this generalisation problem. We investigate it in a study where we
analyse the performance of two supervised models, a Support Vector
Machine and a Long Short Term Memory Neural Network, on data from the
Amsterdam Metaphor Corpus. The models are trained on sentences that
include target words labelled as literal or metaphoric and, in contrast to
previous work, are tested on sentences with target words that have not been
seen during training.
We perform a deep analysis of the performance of the two models, which
allows us to address the following questions: (i) which are the most relevant
linguistic features the two models need in order to generalize information
about metaphorical use? (ii) are there significant differences in the way the
two models generalise this information?
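The evaluation setting can be sketched as a split by target word rather than by sentence, so that no test target word is seen during training; the sentences and labels below are invented.

```python
import random

# Toy (sentence, target word, label) triples in the style of the
# Amsterdam Metaphor Corpus; purely illustrative.
data = [
    ("he attacked my argument", "attack", "metaphoric"),
    ("the army attacked at dawn", "attack", "literal"),
    ("prices are climbing", "climb", "metaphoric"),
    ("she climbed the wall", "climb", "literal"),
]

random.seed(0)
targets = sorted({t for _, t, _ in data})
# Hold out half of the target words entirely: every sentence with a held-out
# target goes to the test set, so test targets are unseen in training.
held_out = set(random.sample(targets, k=len(targets) // 2))
train = [d for d in data if d[1] not in held_out]
test = [d for d in data if d[1] in held_out]
```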
Keywords: metaphor long short term memory generalisation

Testing language technology tools with
persons with an intellectual disability: a
collaboration between researchers,
developers and target users
Annelies De Vliegher, Jo Daems and Nele Bosch Thomas More
Thomas More, as a partner of the Able To Include project, has been involved
in two years of pilot studies with persons with an intellectual disability
(PID). During this project, PID have tried out and given feedback on new
technical tools such as text-to-speech, text-to-picto and picto-to-text
integrated in the apps AbleChat and AbleSocial. AbleChat enables persons
with an intellectual disability and poor literacy skills to send and receive
text messages in pictograms. It incorporates picto-to-text and text-to-picto.
AbleSocial hooks
into social media apps such as Facebook and Messenger and integrates all
three of the aforementioned language technology tools.
During the pilot studies, Thomas More researchers have worked closely
with the technical partners; they have learned many things from this
collaboration and would like to share them.
Apart from the usefulness of the feedback from the users for the technical
partners to adapt the tools, researchers also found that it is important to run
pilot studies in a very specific and realistic situation. Abstract thinking is
often hard for PID, which is also why it is important to not test several
things simultaneously. Although from a developer's point of view, testing
with users seems like a golden opportunity to gather a great amount of
information during a short time period, experience has shown that this is an
intensive and time-consuming process. An active collaboration between the PID,
the technical partners and researchers familiar with the target users is very
much recommended to make the development process as efficient as possible.
Keywords: language technology pilot persons with an intellectual
disability collaboration with PID and developers experiences

Twitter usage as a channel for scientific
knowledge sharing
Maria Eskevich Radboud University Nijmegen
Twitter can be regarded as a network for sharing scientific progress with
either a narrow group of colleagues in the field or with wider
communities of potential stakeholders.
As a test case scenario, we collect Twitter activity information for
recently funded FP7 and H2020 projects, i.e. their Twitter IDs and the
hashtags of the projects and of the conferences/events advertised via those
accounts.
We analyse the life cycle of these social networks in relation to the project
life cycle.
Keywords: social network analysis twitter text and data mining

Spelling correction for clinical text with word
and character n-gram embeddings
Pieter Fivez, Simon Šuster and Walter Daelemans University of Antwerp
Clinical texts, such as patient records and nursing notes, represent a
challenge for NLP techniques due to text characteristics such as frequent
spelling and typing errors. We propose an unsupervised spelling correction
system that requires no human-annotated data for training. Our method
relies on an exhaustive list of canonical lexical items and word embeddings
enriched and constructed with subword character n-gram embeddings. We
generate a restricted set of replacements for detected misspellings, and rank
these solely on the basis of the context in which the misspellings appear. We
investigate this system as a first step towards normalization software for
clinical text that is relatively transferable between subdomains, sparse in
resources and flexible in architecture.
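A minimal sketch of the generate-and-rank step, with random stand-in vectors instead of the subword-enriched embeddings and a crude length-based candidate filter instead of a proper edit-distance model (all data here is hypothetical):

```python
import numpy as np

# Random stand-in embeddings for a tiny canonical lexicon plus context words.
rng = np.random.default_rng(0)
lexicon = ["patient", "patience", "treatment"]
emb = {w: rng.normal(size=8) for w in lexicon + ["was", "stable"]}

def edits_close(misspelling, candidate):
    # Crude candidate filter: same first letter and similar length.
    return misspelling[0] == candidate[0] and abs(len(misspelling) - len(candidate)) <= 2

def rank(misspelling, context):
    """Rank replacement candidates by cosine similarity to the averaged context vector."""
    ctx = np.mean([emb[w] for w in context], axis=0)
    cands = [w for w in lexicon if edits_close(misspelling, w)]
    cos = lambda v: float(v @ ctx / (np.linalg.norm(v) * np.linalg.norm(ctx)))
    return sorted(cands, key=lambda w: -cos(emb[w]))

ranked = rank("pateint", ["was", "stable"])
```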
Keywords: spelling correction clinical text unsupervised word
embeddings character n-gram embeddings

Wikification for Implicit MT evaluation: First results for Dutch
Iris Hendrickx and Antal van Den Bosch Radboud University
Machine translation evaluation requires a human in the loop: either for
providing gold-standard written translations or for human judgements. As
such human effort is expensive and time-consuming, we aim to look at
methods to evaluate translation quality without human supervision. We
focus on topics and entities in the text and we make smart use of the inter-
lingual connections in Wikipedia to determine whether the MT system
managed to translate the entity correctly. We use a Wikifier (Ratinov et al,
2011) to automatically find entities with Wikipedia links in the English text
and compare them against the linked entities in the target language's
equivalent Wikipedia page. We present the first outcomes of this method on
Dutch MOOC data and discuss the limits of this approach such as
Wikipedia coverage and Wikifier errors. These experiments are part of the
TraMOOC project (, which aims to automatically translate
online course material of MOOCs into 11 different languages.
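The comparison step can be sketched as a set overlap through an inter-language link table; the table below is a hand-made stand-in for Wikipedia's actual langlinks data.

```python
# Hypothetical English-to-Dutch inter-language links (page titles).
EN_TO_NL = {"Netherlands": "Nederland", "machine_learning": "machinaal_leren"}

def entity_overlap(source_entities, mt_entities):
    """Fraction of mappable source entities whose Dutch page is wikified in the MT output."""
    mappable = [e for e in source_entities if e in EN_TO_NL]
    if not mappable:
        return 0.0
    hits = sum(EN_TO_NL[e] in mt_entities for e in mappable)
    return hits / len(mappable)

# "MOOC" has no link in the toy table, so only two entities are mappable,
# and one of them ("Nederland") is found in the MT output.
score = entity_overlap({"Netherlands", "machine_learning", "MOOC"},
                       {"Nederland", "cursus"})
```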
Ratinov, L., Roth, D., Downey, D., & Anderson, M. (2011). Local and
Global Algorithms for Disambiguation to Wikipedia. ACL.
Keywords: Machine translation evaluation Wikipedia wikification
Entity linking

Bilingual Lexicon Induction by Learning to
Combine Word-Level and Character-
Level Representations
Geert Heyman, Ivan Vulić and Marie-Francine Moens University of
Leuven / University of Cambridge
We study the problem of bilingual lexicon induction (BLI) in a setting
where some translation resources are available, but unknown translations
are sought for certain, possibly domain-specific terminology. We frame BLI
as a classification problem for which we design a neural-network-based
classification architecture composed of recurrent long short-term memory
and deep feed-forward networks. The results show that word- and character-
level representations each improve state-of-the-art results for BLI, and the
best results are obtained by exploiting the synergy between these word- and
character-level representations in the classification model.
Keywords: bilingual lexicon induction machine translation
representation learning character-level modelling

Classifier optimization for cyberbully
detection: finding the needle in the haystack
Gilles Jacobs Ghent University
We approach automatic detection of cyberbullying in Dutch and English
social media as a text classification task. Because positive instances of
cyberbullying are few and far between, this is an imbalanced learning
problem where the positive class is problematically smaller than the
negative class. Classification is further hampered by high dimensionality in
the vectorized dataset. We investigate several combinations of feature
selection (both filter and wrapper approaches), resampling, and classifier
optimizations to treat the imbalance and high-dimensionality issues in text
classification.
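One of the resampling strategies such a setup can use is naive random oversampling of the minority (cyberbullying) class. A minimal sketch (the function name and data layout are our own, not from the paper):

```python
import random

def oversample(X, y, positive_label=1, seed=42):
    """Randomly duplicate minority-class instances until both classes
    are the same size (naive random oversampling)."""
    rng = random.Random(seed)
    pos = [(x, l) for x, l in zip(X, y) if l == positive_label]
    neg = [(x, l) for x, l in zip(X, y) if l != positive_label]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = pos + neg + extra
    rng.shuffle(data)
    xs, ys = zip(*data)
    return list(xs), list(ys)
```

Oversampling is applied to the training split only; duplicating instances in the test set would inflate the evaluation.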
Keywords: cyberbullying detection text classification optimization
feature selection resampling

Structured Learning for Temporal Relation
Extraction from Clinical Records
Artuur Leeuwenberg and Marie-Francine Moens University of Leuven
We propose a scalable structured learning model that jointly predicts
temporal relations between events and temporal expressions (TLINKS), and
the relation between these events and the document creation time (DCTR)
in clinical texts. We employ a structured perceptron, together with integer
linear programming constraints for document-level inference during
training and prediction to exploit relational properties of temporality,
together with global learning of the relations at the document level.
Moreover, this study gives insight into the effect of integrating constraints
for temporal relation extraction when using structured learning and
prediction. Our best system outperforms the current state-of-the-art on both
the CONTAINS TLINK task, and the DCTR task.
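The structured perceptron at the core of this approach updates a weight vector from the difference between the gold structure's features and the predicted structure's. A minimal sketch of that update (sparse feature dicts; the feature names are hypothetical, and the ILP-constrained inference step is omitted):

```python
def perceptron_update(weights, gold_feats, pred_feats, lr=1.0):
    """One structured-perceptron step: reward the features of the gold
    structure, penalise those of the (incorrect) predicted structure."""
    for f, v in gold_feats.items():
        weights[f] = weights.get(f, 0.0) + lr * v
    for f, v in pred_feats.items():
        weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```

In the full system, prediction is not a simple argmax per pair: document-level ILP constraints (e.g. transitivity of temporal containment) restrict which joint labelings are admissible before this update is applied.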
Keywords: structured learning temporal information extraction relation
extraction clinical natural language processing

Simulating language change in an artificial-
intelligence computational model
Sander Lestrade Radboud University Nijmegen
This talk introduces a recently developed artificial-intelligence
computational model of language evolution. It is assumed that language
starts out as a vocabulary with words for actions and objects only. Agents
talk about events in their immediate surroundings. Depending on the
number of distractor events that are simultaneously ongoing, they have to be
more or less specific when selecting referential terms. And depending on the
degree to which the hearer agent's world knowledge is estimated to predict
the event to be described, additional role marking may be used to make sure
the role distribution of the event participants can be understood properly. In
addition to such communicative considerations, frequency and recency of
usage play a role in word selection, for both referential items and role markers.
As an important goal of the model is to show that much of language
structure emerges spontaneously, only very general cognitive principles are
implemented, such as a desire for communicative success, shared attention,
recognition of communicative intention, and desire for economic
expressions. In the absence of a grammar, agents use "proto-principles" to
communicate about their world, e.g. assuming that things that stand together
in form belong together in meaning.
Initially, word selection involves semantically motivated and fully specified
lexical items only. Over time, however, words can desemanticize and erode
as a result of frequent usage. Thus, grammatical structure can be shown to
emerge, both for reference (pronominal systems) and role marking (case systems).
Keywords: language evolution grammar computational modeling

Improving interpolation factors in a Bayesian
skipgram language model
Louis Onrust, Hugo Van Hamme and Antal van den Bosch Radboud
University Nijmegen / University of Leuven
In this work we present the results of a study into interpolation factors of a
Bayesian skipgram language model. We use the traditional notion of
skipgrams, which generalise the n-gram by allowing gaps in the pattern by
means of skipping over words.
With skipgrams we can have multiple patterns with the same number of
words, and we want their probabilities to weigh according to their
importance. We use three measures of importance, one based on the
maximum likelihood of the pattern, one based on the entropy of its context,
and an uninformed uniform prior.
Additionally, we investigate the effect of the depth of the backoff procedure.
In the literature the backoff procedure is typically fully recursive.
However, this ignores the fact that if we have encountered a pattern in the
training data, this already gives a more informed probability, after which
we can stop backing off.
We show that we can easily improve over n-gram results by using
skipgrams, and that a limited backoff procedure outperforms a full recursive
one.
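The notion of skipgrams used here, n-grams that may skip over words, can be enumerated as follows (a minimal sketch of the standard k-skip-n-gram definition, not the authors' implementation):

```python
from itertools import combinations

def skipgrams(tokens, n, k):
    """Enumerate k-skip-n-grams: n-token patterns in which up to k
    tokens may be skipped after the (fixed) first position."""
    grams = set()
    for start in range(len(tokens)):
        # candidate window: the n positions plus up to k skippable ones
        window = tokens[start:start + n + k]
        if len(window) < n:
            continue
        # the first token is fixed; choose the remaining n-1 positions
        for rest in combinations(range(1, len(window)), n - 1):
            grams.add((window[0],) + tuple(window[i] for i in rest))
    return grams
```

With k=0 this reduces to ordinary n-grams, which is why several skipgram patterns can share the same number of words and need the importance weighting discussed above.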
Keywords: Language models Interpolation factors Skipgrams Backoff

Annotation of Content Types in Historical
Texts and Contemporary News
Rachele Sprugnoli, Sara Tonelli and Tommaso Caselli Fondazione
Bruno Kessler / University of Trento / Vrije Universiteit Amsterdam
This abstract reports on the definition of new annotation guidelines for
labeling content types. Content types are pieces of texts having specific
semantic and functional characteristics. We applied our annotation
guidelines on two English corpora belonging to two different genres and
periods of publication. The annotated dataset constitutes the first step
towards the development of a system for the automatic identification of
content types.
Keywords: annotation corpus creation content types

Clinical Case Reports dataset for machine reading
Simon Šuster and Walter Daelemans University of Antwerp /
University Hospital of Antwerp
One of the most pressing issues in clinical natural language processing is
access to clinical texts. In this work, we describe the creation of a dataset of
clinical case reports for English. The case reports are detailed descriptions
of clinical cases of individual patients which include their medical history,
diagnoses, treatments and outcomes. They are thematically close to patient
records, but do not suffer from privacy issues, and are published as online
medical journal articles. Their editing and easy accessibility make them
especially suited for natural language processing. We exploit their fine
internal structure to semi-automatically build a cloze-style reading
comprehension dataset. We will use the dataset for training and evaluating
models of machine reading and understanding.
Keywords: clinical NLP machine reading machine learning

Estimating Post-editing Time Using Machine
Translation Errors
Arda Tezcan, Véronique Hoste, Lieve Macken Ghent University
With the improved quality of Machine Translation (MT) systems in the last
decades, post-editing (the correction of MT errors) has gained importance in
Computer-Assisted Translation (CAT) workflows. Depending on the
number and the severity of the errors in the MT output, the effort required to
post-edit varies from sentence to sentence. Existing Quality Estimation
(QE) systems provide quality scores that reflect the quality of an MT output
at segment level. However, they fail to explain the relationship between
different types of MT errors and the required post-editing effort to correct
them. We suggest a more informative approach to QE in which different
types of MT errors are detected in a first step and then used to estimate
post-editing effort in a second step. In this paper we define the
upper boundary of such a system. We use different machine learning
methods to estimate Post-Editing Time (PET) by using a gold-standard set
of MT errors as features. We show that post-editing time can be estimated
successfully when all the translation errors in the MT output are known.
Furthermore, we apply feature selection methods and investigate the
predictive power of different MT error types on PET. Our results show that
PET can be estimated with high performance by only using a small subset
of MT error types.
Keywords: machine translation post-editing machine learning feature
selection quality estimation

Corpus Upload and Metadata Analysis
Extensions for GrETEL
Martijn van der Klis and Jan Odijk Utrecht University
Linguistic tools are primarily used within the linguistic community, likely
because they typically do not allow users to import their own data and lack
options to deal with metadata. As part of CLARIAH WP3 we extended the
treebank search engine GrETEL (Augustinus et al. 2012) with a possibility
to upload corpora, as well as functionality to analyze and filter metadata.
The corpus upload functionality allows users to upload an archived
collection of plain-text files. The software will tokenize and parse these files
using the Alpino dependency parser (Bouma et al. 2001), and import them
into the XML database BaseX (Grün 2010) for querying with GrETEL.
Users can specify their corpus as private (only searchable for them) or
publicly available. We are currently working on providing a wider range of
input formats (e.g. CHAT, FoLiA, TEI).
For adding metadata to corpora, we use a format defined during the
development of PaQu, which allows users to add inline metadata. The
software reads in the metadata and will create faceted search in GrETEL to
allow users to both analyze and filter their search results. Users can change
the facets to their liking, e.g. to use a range filter instead of checkboxes for
numeric metadata. Results can be exported including their accompanying metadata.
The extension to GrETEL is created using CodeIgniter (a PHP web
framework) and will be available as open-source software (MIT license). A
demo is available at
Keywords: treebank search engine corpus upload metadata faceted search

A data-to-text system for soccer reports
Chris van der Lee, Emiel Krahmer and Sander Wubben Tilburg University
For the past few years, news organizations worldwide have begun to show
interest in automating various types of news reports. One of the domains
that is especially viable for automation is the domain of sports, since the
outcomes of most sports matches can be extracted from the data.
Additionally, sports statistics (who played, who scored, etc.) are stored for
many games that are not visited by sports reporters (Gatt & Krahmer, in
press). Automated text generation systems can generate reports for these
games.
While there are already some companies that use an automated data-to-text
system for soccer reports (e.g. Yahoo, Narrative Science, Norwegian News
Agency), there are no such systems yet for the Dutch language that are used
by news organizations. The goal of this project is to develop such a data-to-
text system for Dutch news networks. To achieve this, an existing data-to-
text system, the GoalGetter system (Theune, Klabbers, De Pijper, Krahmer
& Odijk, 2001), is replicated and expanded upon. One such expansion is that
the current system produces multiple match reports, biased towards one
team or the other. This makes it possible to tailor match reports based on the
preferences of the reader, thus possibly making the match reports more
enjoyable to read.
Keywords: Natural Language Generation Data-to-text Tailoring

Monday mornings are my fave :) #not
Exploring the Automatic Recognition of
Irony in English Tweets
Cynthia Van Hee, Els Lefever and Véronique Hoste Ghent University
Recognising and understanding irony is crucial for the improvement of
natural language processing tasks, including sentiment analysis. In this study, we
describe the construction of an English Twitter corpus and its annotation for
irony based on a newly developed fine-grained annotation scheme. We also
explore the feasibility of automatic irony recognition by exploiting a varied
set of features including lexical, syntactic, sentiment and semantic
(Word2Vec) information. Experiments on a held-out test set show that our
irony classifier benefits from this combined information, yielding an F1-
score of 67.66%. When explicit hashtag information like #irony is included
in the data, the system even obtains an F1-score of 92.77%. A qualitative
analysis of the output reveals that recognising irony that results from a
polarity clash appears to be (much) more feasible than recognising other
forms of ironic utterances (e.g., descriptions of situational irony).
Keywords: machine learning irony detection social media

Extracting Contrastive Linguistic Information
from Statistical Machine Translation
Eva Vanmassenhove, Jinhua Du and Andy Way Dublin City University
The fields of Contrastive Linguistics (CL) and Machine Translation (MT)
are closely related but rarely brought together in practice (Čulo, 2016).
However, just as MT can benefit from contrastive translation studies, so too
can MT offer insights into empirical linguistic investigations. In particular,
Statistical Machine Translation (Koehn, 2003) phrase-tables can be
exploited to derive generalized linguistic information.
In this study, we examine the interaction between lexical aspect and
grammatical aspect when translating English simple past verbs into French
past tenses (passé composé and imparfait). We compiled a list of 206
English verbs classified into three aspectual classes (Wilmet, 1998). We
trained two SMT systems with the Moses toolkit (Koehn et al., 2007): (1)
trained on 1 million parallel sentences of the Europarl corpus (Koehn et al.,
2005), (2) trained on the News Commentary corpus. The translation
probabilities of the simple past verbs were extracted from the phrase-tables
and added together, obtaining for every verb its probability of being translated
into passé composé or imparfait. Many verbs have a strong preference for
one tense or the other in both corpora. However, the use of the passé
composé is more frequent in the Europarl corpus, which causes some (atelic
activity and stative) verbs to shift preference from imparfait to passé
composé.
While the aspectual class of the verb does not seem to have a big influence
on the use of the passé composé, a preference for imparfait appeared to be
exclusive to verbs belonging to the stative class.
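The extraction step described above, summing phrase-table translation probabilities per verb and bucketing them by target tense, can be sketched as follows (the table layout and the `tense_of` oracle are hypothetical, standing in for the Moses phrase-table parsing and French tense tagging):

```python
from collections import defaultdict

def tense_preferences(phrase_table, verbs, tense_of):
    """Sum phrase-table translation probabilities per source verb,
    bucketed by the tense of the French target phrase, then normalise
    so each verb gets a probability distribution over tenses."""
    totals = defaultdict(lambda: defaultdict(float))
    for src, tgt, prob in phrase_table:
        tense = tense_of(tgt)
        if src in verbs and tense:
            totals[src][tense] += prob
    return {v: {t: p / sum(buckets.values()) for t, p in buckets.items()}
            for v, buckets in totals.items()}
```

Running this over the two phrase tables (Europarl vs. News Commentary) and comparing the resulting distributions per verb is what exposes the corpus-dependent tense shifts reported above.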

Keywords: Machine Translation Contrastive Linguistics Lexical Aspect

Identifying Mood of Songs using Musical and
Linguistic Information
Menno van Zaanen and Bram Willemsen Tilburg University
People nowadays have access to large digital music collections. Because of
the sheer size of these collections, people create playlists which group
together similar songs, for example, based on their mood. Unfortunately,
creating playlists is time-consuming and requires knowledge of the songs in
the collection. In this research, we present work that automatically assigns
mood information to songs. Most existing work in this area focuses on the
analysis of musical aspects of the songs, but here we also use information
extracted from the lyrics. In contrast to previous work, which used word-
based features, such as tf*idf, we present a topic modeling approach. We
used songs with their musical properties from the Million Song Dataset,
collected their lyrics, as bags of words, from the musiXmatch dataset, and
linked the songs to mood categories from Gracenote. We trained an LDA
model on a subset of English lyrics. This model was then used to compute,
for each song in our dataset, a probability distribution over topics. This
information indicates how well a song fits different topical classes. We then
compared different systems that either used musical features, LDA features,
or a combination of the two types of features to assign mood tags to the
songs. We find that information from the lyrics has a significant, positive
effect on the classification accuracy. This is in contrast to previous work
that deals with lyrics, which often requires extensive tuning in order to get
significant improvements.
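Combining the musical and lyrics-based (LDA topic) features amounts to concatenating both into one vector per song and classifying. A toy sketch with a nearest-centroid classifier (a stand-in for the classifiers actually compared in the study; the feature layout is hypothetical):

```python
from collections import defaultdict
import math

def train_centroids(X, y):
    """Mean feature vector per mood label; each x in X concatenates
    musical descriptors with the song's LDA topic distribution."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}

def predict_mood(centroids, x):
    """Assign the mood whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda m: math.dist(centroids[m], x))
```

The same feature vectors can be fed to any classifier; the point of the sketch is only that musical and topic features enter as one concatenated representation.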
Keywords: lyrics mood LDA

Project STON: speeding up Dutch subtitling
with speech and language technology
Lyan Verwimp, Joris Pelemans and Patrick Wambacq University of Leuven
We present an overview of our work for the project STON Spraak- en
Taaltechnologisch Ondertitelen in het Nederlands (subtitling in Dutch with
the help of speech and language technology), in collaboration with the
University of Ghent, the Flemish national broadcaster VRT and three
companies (Devoteam, Limecraft and PerVoice). The purpose of the project
was to speed up the work of the subtitlers of VRT. A modular and user-
friendly platform was built that integrates audio segmentation (speech/non-
speech segmentation, speaker diarisation and language identification) and
speech recognition. Our technology generates a set of proposed subtitles,
which ideally only need minor manual post-processing by the subtitler.
We focus on our own contribution to the project, namely improving the
language model and lexicon of the speech recognizer. Our work involved
text normalization, several strategies for cleaning up the lexicon and
experiments with language model adaptation. Furthermore, we make
available language models trained on 46M words of normalized subtitles,
which can be freely downloaded. Since large corpora of transcripts of spoken
language are rare, these models are a valuable resource for spoken Dutch.
Keywords: speech recognition language modeling subtitling

Visual Analytics for Parameter Tuning of
Semantic Similarity Models
Thomas Wielfaert, Kris Heylen, Dirk Speelman and Dirk Geeraerts
University of Leuven
In the last two decades, a wide range of statistical techniques has been
developed for the corpus-based modelling of the semantic similarity
between words and word uses (see Turney and Pantel, 2010 and Baroni et
al., 2014 for an overview and comparison of different models). At the
SemEval competitions, different models have proven to be state-of-the-art
for different tasks, such as synonymy extraction, lexical substitution, word
sense disambiguation and word sense induction. All of these semantic
similarity models heavily rely on tuning different parameters and
experimenting with different types of models. Comparing differently
parameterized models is commonly done using task-specific F-scores on a
gold standard. However, such an evaluation is not well suited to analyse the
effect of parameter settings on specific items (error analysis), nor to explore
the task-independent properties of models. As a complementary tool, we
therefore propose a visual analytics approach that visualizes semantic
similarity matrices directly, independent of a specific task, and that allows
users to explore and compare interactively how differently parameterized
models affect the semantic similarity between specific items or groups of items with
specific properties. The tool consists of three levels in which alternative
models can be selected (1), compared (2) and inspected (3) through
interactive scatter plots and scatter plot matrices. We illustrate our visual
analytic approach in a setting where it is used to fine-tune Word Sense
Induction models. In our case study, the items are individual word uses
(tokens) from the 2010 SemEval WSI task, and the models are differently
parameterized frequency-based distributional semantic models.
Keywords: semantic similarity visualization parameter tuning

Comparative study of DNN-HMM and end-to-
end DNN architectures for ASR
Moritz Wolter, Vincent Renkens, Patrick Wambacq and Hugo Van
Hamme University of Leuven
Using filter bank spectra as inputs, the traditional DNN-HMM approach
uses a deep neural network to model emission densities in HMMs. We
present some improvements of this traditional approach through novel
techniques such as batch normalization. Additionally, if the HMM is
replaced with trainable recurrence functions, the error can be back-
propagated through the entire system. According to recent publications, such
end-to-end systems can be implemented using Connectionist Temporal
Classification (CTC) as well as Listen Attend and Spell (LAS). CTC is a
method for training recurrent neural networks using unsegmented label
sequences. In the LAS architecture, the listener provides high level features
to an attention network which implements temporal focus to the spelling
network, i.e. a network modeling the probability distribution of output
characters conditioned on the history. During decoding a transcription is
found based on these distributions and any additional constraints. We are
investigating and comparing the performance of these methods on the
publicly available data sets TIMIT and AURORA4. Code is made available
through GitHub.
Keywords: Acoustic model Machine Learning Tensorflow