How to Add a New Language on the NLP Map:

Building Resources and Tools for Languages with Scarce Resources

Rada Mihalcea
University of North Texas
rada@cs.unt.edu

Vivi Nastase
EML Research gGmbH
Vivi.Nastase@eml-r.villa-bosch.de

Abstract
Those of us whose mother tongue is not English, or who are curious about applications involving other languages, often find ourselves in a situation where the tools we require are not available. According to recent studies, there are about 7200 different languages spoken worldwide (not counting variations or dialects), and very few of them have automatic language processing tools and machine-readable resources.

In this tutorial we will show how to take advantage of lessons learned from the languages most frequently studied and used in NLP, and of the wealth of information and collaborative efforts mediated by the World Wide Web. We structure the presentation around two major themes: mono-lingual and cross-lingual approaches. Within the mono-lingual area, we show how to quickly assemble a corpus for statistical processing, how to obtain a semantic network from on-line resources (in particular Wikipedia), and how to obtain automatically annotated corpora for a variety of applications. The cross-lingual half of the tutorial shows how to build upon NLP methods and resources developed for other languages and adapt them to a new language. We will review the automatic construction of parallel corpora, the projection of annotations from one side of a parallel corpus to the other, and the building of language models; finally, we will look at how all of these can come together in higher-end applications such as machine translation and cross-language information retrieval.

Biographies
Rada Mihalcea is an Assistant Professor of Computer Science at the University of North Texas. Her research interests are in lexical semantics, multilingual natural language processing, minimally supervised natural language learning, and graph-based algorithms for natural language processing. She serves on the editorial boards of the Journal of Computational Linguistics, the Journal of Language Resources and Evaluations, the Journal of Natural Language Engineering, the Journal of Research in Language in Computation, and the recently established Journal of Interesting Negative Results in Natural Language Processing and Machine Learning.

Vivi Nastase is a post-doctoral fellow at EML Research gGmbH, Heidelberg, Germany. Her research interests are in lexical semantics, semantic relations, knowledge extraction, multi-document summarization, graph-based algorithms for natural language processing, and multilingual natural language processing. She is a co-founder of the Journal of Interesting Negative Results in Natural Language Processing and Machine Learning.
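The cross-lingual part of the tutorial mentions projecting annotations from one side of a parallel corpus to the other. As a rough illustration of the simplest form of that idea, here is a minimal sketch of direct projection of part-of-speech tags through word alignments. It rests on several assumptions: the function name `project_tags`, the tag set, and the hand-written alignment are hypothetical. In practice the alignment would come from an automatic word aligner. This sketch is not presented as the authors' specific method.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    target_len: int,
    alignment: Iterable[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy each source token's tag to the target tokens it is aligned to.

    Hypothetical helper for illustration; `alignment` holds
    (source_index, target_index) pairs.
    """
    # One vote counter per target token.
    votes = [Counter() for _ in range(target_len)]
    for src_idx, tgt_idx in alignment:
        votes[tgt_idx][source_tags[src_idx]] += 1
    # A target token aligned to several source tokens takes the majority tag
    # (ties go to the tag seen first); unaligned target tokens get None.
    return [v.most_common(1)[0][0] if v else None for v in votes]


if __name__ == "__main__":
    # English "the red house", tagged on the resource-rich side.
    en_tags = ["DET", "ADJ", "NOUN"]
    # Spanish "la casa roja": hand-written alignment (source index, target index).
    alignment = [(0, 0), (2, 1), (1, 2)]
    print(project_tags(en_tags, 3, alignment))  # ['DET', 'NOUN', 'ADJ']
```

Majority voting is only one way to resolve many-to-one alignments. Unaligned tokens are left as `None` rather than guessed, so that a later step (for example, a tagger trained on the projected data) can fill them in.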

