
Natural Language Processing (NLP) Research
at Boston University

Derry Wijaya
Boston University
About Me
• BSc and MSc in Computer Science from National University of Singapore

• PhD in Computer Science, Language Technologies Institute, Carnegie Mellon University

• Postdoc in Computer Science at University of Pennsylvania

• Asst. Professor in Computer Science at Boston University

• Developed an interest in the field of Information Extraction as an undergraduate doing an Undergraduate Research Opportunity Project
Research Focus
• Natural Language Processing (NLP) and Machine Learning

• Media Framing: how different media "frame" news stories differently

• Low Resource Machine Translation: automatic translation for languages that have little translation data for training
Introduction to NLP
• Natural Language Processing

• techniques to give computers the ability to process, learn, and understand human languages

• other names: speech and language processing, human language technologies, computational linguistics, speech recognition and synthesis, …
Goal of NLP
get computers to perform tasks involving human language

• conversational agent

• machine translation (MT)

• question answering (QA)

• information extraction (IE)

• …
NLP for Interdisciplinary Research
• Psychology: measure stress, anxiety, depression based on patients' tweets or social media posts
• Medicine: extract information from doctors' or physicians' notes
• Public Health: measure epidemiological risk factors like diabetes and obesity from users' tweets and Instagram posts
• Economics: predict the stock performance of a company based on news articles about the company
• Accounting: automatically create balance sheets from financial reports
• Education: automated tutoring, grading
• Communication: detect media frames, the angles of a story
Goal of NLP
Computers solving tasks involving human language

(Diagram: knowledge of language, captured by a model, combines with an algorithm that solves the task and resolves ambiguity. Image source: medium.com)


Models
• NLP systems rely on models to capture knowledge of language, e.g., formal rule systems to capture morphology and syntax/grammar

• context-free grammar: a set of production rules that describe all possible strings, with rules that can be applied regardless of context

  "Hamid Ansari was nominated for Vice President"
  "Vice President was nominated for Hamid Ansari"
  "Hamid Ansari was nominated"
  "Vice President was nominated"
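As a toy illustration (the grammar below is invented for this sketch, not taken from any real system), production rules like these can be enumerated in a few lines of Python. Note how the grammar happily derives sentences that are grammatical but semantically odd, precisely because each rule applies regardless of context:

```python
import itertools

# A toy context-free grammar written as production rules.
# Nonterminals map to lists of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Hamid Ansari"], ["Vice President"]],
    "VP": [["was nominated"], ["was nominated", "for", "NP"]],
}

def generate(symbol="S"):
    """Yield every string the grammar can derive from `symbol`."""
    if symbol not in GRAMMAR:          # terminal: yield it as-is
        yield symbol
        return
    for rhs in GRAMMAR[symbol]:        # try each production rule
        # expand every symbol on the right-hand side, then combine
        expansions = [list(generate(s)) for s in rhs]
        for combo in itertools.product(*expansions):
            yield " ".join(combo)

for sentence in generate():
    print(sentence)   # includes all four example sentences above
```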
Models
• e.g., vector space models to capture word meanings: "you shall know a word by the company it keeps" (Firth, J. R. 1957:11)

(Figure: word2vec clusters such as pasta, lamb, cheese, mushroom; citrus, apple, orange, lime; aromatic, nose, scent, perfume)
http://methodmatters.blogspot.com/2017/11/using-word2vec-to-analyze-word.html
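The distributional idea behind those clusters can be sketched numerically. The co-occurrence counts below are invented for illustration (contexts "pasta", "citrus", "scent"): words that keep similar company end up with similar vectors, and cosine similarity recovers the clusters:

```python
import math

# Toy co-occurrence vectors (hypothetical counts) over the contexts
# ("pasta", "citrus", "scent"): a word is known by the company it keeps.
vectors = {
    "cheese":   [9, 1, 0],
    "mushroom": [8, 2, 0],
    "lime":     [0, 9, 2],
    "orange":   [1, 9, 1],
    "perfume":  [0, 2, 9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(word):
    """Most similar other word under cosine similarity."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("cheese"))  # -> mushroom: food words cluster together
```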
Algorithms for Solving Tasks
• Given the models, search through a space of hypotheses about an input

• e.g., a classifier can be trained to compute the sentiment polarity of a word, i.e., whether it conveys a positive/negative sentiment, based on the word's vector

• e.g., a machine translation algorithm searches through a space of translation hypotheses for the correct translation of a sentence into another language
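A minimal sketch of the first example, with toy 2-d word vectors and a nearest-centroid rule standing in for a trained classifier (all vectors and words here are invented for illustration):

```python
# Toy 2-d "word vectors" (hypothetical values) and a nearest-centroid
# classifier for sentiment polarity, standing in for a trained model.
train = {
    "great":     ([0.9, 0.8], "pos"),
    "wonderful": ([0.8, 0.9], "pos"),
    "awful":     ([-0.9, -0.7], "neg"),
    "terrible":  ([-0.8, -0.9], "neg"),
}

def centroid(label):
    vecs = [v for v, y in train.values() if y == label]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def polarity(vec):
    """Label a word vector by its closer class centroid (squared distance)."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(("pos", "neg"), key=lambda y: dist2(vec, centroid(y)))

print(polarity([0.7, 0.6]))  # -> pos: a vector near the positive words
```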
NLP and the Measure of Intelligence
using language as humans do == truly intelligent machines?

Turing Test
by responding as a person to the examiner's questions, the machine wins if it can convince the examiner that it is a person

https://xkcd.com/329/
NLP and the Measure of Intelligence
ELIZA program (Weizenbaum, 1966)
NLP system that imitates a psychotherapist

ELIZA uses pattern matching and knows nothing of the world, but many people thought that it really understood them and their problems!

says more about the people than about the machine
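The flavor of ELIZA's pattern matching can be sketched in a few lines. The rules below are simplified illustrations in the spirit of the program, not Weizenbaum's actual script (which, among other things, also swapped pronouns like "my" to "your"):

```python
import re

# A few ELIZA-style rules (simplified, illustrative): pattern -> template.
# The fallback "(.*)" rule means the program always has something to say.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
    (r"(.*)", "Please go on."),
]

def respond(utterance):
    text = utterance.lower().strip(".!?")
    for pattern, template in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*m.groups())

print(respond("I need a break"))  # -> Why do you need a break?
```

No knowledge of the world is involved anywhere: the response simply echoes back whatever matched the wildcard.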
NLP and the Measure of Intelligence
using language as humans do == truly intelligent machines?

regardless, people talk about computers and interact with them as social entities, expecting computers to understand their needs and to interact naturally (Reeves and Nass 1996)

The importance of NLP!

An Exciting Time for NLP!
• Increase in computing resources

• Increase in the amount of data and information available in digital form

• Development of highly successful machine learning methods and competitive evaluations (SemEval, NIST, CoNLL shared tasks, Kaggle)

• Richer understanding of the structure of human language and its deployment in social contexts
State of the Art
• Simple methods often work very well when trained on large quantities of data

• e.g., many text and sentiment classifiers still rely on sets of words ("bag of words") without regard to sentence and discourse structure or meaning
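A bag-of-words sentiment classifier really can be this simple. The sketch below uses invented word lists and ignores word order, syntax, and discourse structure entirely:

```python
from collections import Counter

# A minimal bag-of-words sentiment classifier (illustrative toy lexicon):
# score a text by counting known positive vs. negative words.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "boring"}

def classify(text):
    counts = Counter(text.lower().split())
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    return "positive" if score >= 0 else "negative"

print(classify("a great movie with an excellent cast"))  # -> positive
print(classify("boring plot and terrible acting"))       # -> negative
```

Of course, word order does matter ("not good" defeats this classifier), which is exactly the limitation the slide alludes to.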
State of the Art
• However, most NLP resources and systems are available only for high resource languages

• Many low resource languages are spoken by millions of people, e.g., Bengali, Javanese, Swahili, …

• The challenge is how to develop resources and tools for thousands of languages, not just a few
The State and Fate of Linguistic Diversity in the NLP World
• However, most NLP resources and systems are available only for high resource languages

(Figure: languages grouped into classes: the left-behinds, the scraping-bys, the hopefuls, the underdogs, the rising stars, the winners)
https://microsoft.github.io/linguisticdiversity/
The State and Fate of Linguistic Diversity in the NLP World
• Bahasa Indonesia is one of the rising stars!

Class | Example Languages                                 | #Langs | #Speakers | % of Total Langs
0     | Dahalo, Warlpiri, Popoloca, Wallisian, Bora       | 2191   | 1.2B      | 88.38%
1     | Cherokee, Fijian, Greenlandic, Bhojpuri, Navajo   | 222    | 30M       | 5.49%
2     | Zulu, Konkani, Lao, Maltese, Irish                | 19     | 5.7M      | 0.36%
3     | Indonesian, Ukrainian, Cebuano, Afrikaans, Hebrew | 28     | 1.8B      | 4.42%
4     | Russian, Hungarian, Vietnamese, Dutch, Korean     | 18     | 2.2B      | 1.07%
5     | English, Spanish, German, Japanese, French        | 7      | 2.5B      | 0.28%

Table 1: Number of languages, number of speakers, and percentage of total languages for each language class.
The State and Fate of Linguistic Diversity in the NLP World
• But Javanese, Sundanese, and Minangkabau are scraping-bys
• And… Bugis, Ternate, and Manadonese are left-behinds!

(Table 1 again, annotated: Bugis, Ternate, Manadonese, … fall in class 0; Javanese, Sundanese, Minangkabau, … fall in class 1.)
The State and Fate of Linguistic Diversity in the NLP World
• We need to collect more data on Indonesian languages!

(Figure: Bugis, Ternate, Manadonese, … among the left-behinds; Javanese, Sundanese, Minangkabau, … among the scraping-bys)
https://microsoft.github.io/linguisticdiversity/
Machine Translation
• Challenging because correct translations require:

• the ability to analyze and generate sentences in human languages

• an understanding of world knowledge and context to resolve the ambiguities of language

• e.g., the straightforward translation of the French word "bordel" is "brothel", but what if someone says "My room is un bordel" (i.e., a mess)?
Machine Translation
• Started with hand-built grammar-based systems (limited success)

Direct Translation
Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
Transfer-based Machine Translation

Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
Interlingual MT

Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation

Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
• Transformed by the availability of parallel sentences, used to collect statistics of word translations and word sequences (circa 2006)

• small word groups often have distinctive translations: phrase-based MT, which formed the basis of Google Translate

Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
Neural Machine Translation (NMT), EMNLP 2014, Kyunghyun Cho et al.
• Another transformation using deep learning-based sequence models (circa 2014)

"The source text is encoded by one neural network, and then another neural network decodes it back into text, but in another language. The two networks know nothing about each other; each knows only its own language. Interlingua is back."
Machine Translation
Neural Machine Translation (NMT)
• Another transformation using deep learning-based sequence models (circa 2014)

The idea is close to style transfer: the language is the style, and the essence of the text stays the same. Interlingua is BACK!
Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
Neural Machine Translation (NMT)
• Another transformation using deep learning-based sequence models (circa 2014)

Let the deep NN figure out the specific features (i.e., the interlingua)
Figure from "A history of machine translation from the Cold War to deep learning" by Ilya Pestov
Machine Translation
• Another transformation using deep learning-based sequence models (circa 2014)

• train a deep neural network model with several representational levels to optimize translation quality

• the model learns intermediate representations that are useful for translation
Machine Translation
• Long Short-Term Memory (LSTM) networks

• maintain contextual information from early until late in a sentence
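The gating idea behind LSTMs can be sketched with a single scalar cell whose weights are hand-picked rather than trained (a toy, not a real model): the forget gate stays near 1 so the cell "latches" onto the token "not" and carries that bit of context to the end of the sentence:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One scalar LSTM cell with hand-picked weights (illustrative only):
# it writes to the cell state only when it sees "not", and otherwise
# preserves the state, so early context survives to the sentence end.
def lstm_step(x, h, c):
    f = sigmoid(10.0)                            # forget gate ~1: keep old state
    i = sigmoid(10.0 if x == "not" else -10.0)   # input gate: write on "not" only
    o = sigmoid(10.0)                            # output gate ~1: expose the state
    c = f * c + i * math.tanh(10.0)              # candidate value ~1
    return o * math.tanh(c), c                   # new hidden state, cell state

h = c = 0.0
for token in "this movie is not very good at all".split():
    h, c = lstm_step(token, h, c)
print(round(h, 2))  # -> 0.76: the early "not" is still visible at the end
```

Without the "not", the hidden state stays near zero, which is the contrast a real trained LSTM exploits.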
Neural Machine Translation (NMT)
Single system to translate between multiple languages

Figure from "https://ai.googleblog.com/2016/11/zero-shot-translation-with-googles.html"

Neural Machine Translation (NMT)
"Interlingua": a 3-d representation of the internal network data of multilingual Google NMT

Figure from "https://ai.googleblog.com/2016/11/zero-shot-translation-with-googles.html"
BU NLP Research Overview
• Low Resource Machine Translation: automatic translation for languages that have little translation data for training

• Media Framing: how different media "frame" news stories differently
Low Resource Machine Translation
• Can we learn to translate without parallel data?

Few/no parallel data, but plentiful monolingual data

EMNLP 2017, ACL 2018
Low Resource Machine Translation
• Word vector representations in different languages might have similar geometric arrangements (Mikolov, T., Le, Q.V. and Sutskever, I., 2013)

(Figure: English and Spanish word vector spaces)
Low Resource Machine Translation
• Starting with some "anchors", we can learn a mapping (W) that aligns words in the different languages' vector spaces
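The mapping can be sketched with toy 2-d vectors. All vectors below are invented: the "Spanish" space is constructed as an exact rotation of the English one (echoing Mikolov et al.'s observation that the spaces share geometric structure), W is fit by least squares over the anchor pairs, and a held-out word is then translated by nearest neighbor in the mapped space:

```python
import math

# Toy 2-d word vectors; the "Spanish" space is, by construction here,
# a rotation of the English one.
en = {"one": (1, 0), "two": (0, 1), "three": (1, 1), "cat": (0.9, 0.2)}
es = {"uno": (0, 1), "dos": (-1, 0), "tres": (-1, 1), "gato": (-0.2, 0.9)}
anchors = [("one", "uno"), ("two", "dos"), ("three", "tres")]

X = [en[a] for a, _ in anchors]   # rows: English anchor vectors
Y = [es[b] for _, b in anchors]   # rows: Spanish anchor vectors

def mat_t_mat(A, B):  # A^T B for row-lists of 2-d points
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(2)]
            for i in range(2)]

# Least squares in row-vector convention: W = (X^T X)^-1 X^T Y, so y ~ x @ W.
XtX, XtY = mat_t_mat(X, X), mat_t_mat(X, Y)
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]
W = [[sum(inv[i][k] * XtY[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

def translate(word):
    """Map an English vector into the Spanish space; return nearest word."""
    x = en[word]
    mapped = [x[0] * W[0][j] + x[1] * W[1][j] for j in range(2)]
    return min(es, key=lambda w: math.dist(mapped, es[w]))

print(translate("cat"))  # -> gato
```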
Low Resource Machine Translation
• Then, we can use the learned bilingual spaces to find translations

(Bar chart: translation accuracy, 0-100%, of five methods (BPR_W, BPR_W+C, BPR_LINEAR, BPR_NN, BPR_WE) across languages: so, uz, vi, ne, gu, ta, te, az, bn, lv, hi, cy, hu, bs, sq, uk, sk, id, sv, tr, nl, bg, it, fr, sr, ro, es)
Low Resource Machine Translation
• We can also use images to find translations in different languages

• since images of a word (e.g., "horse") are the same no matter the language

(Figure 1, from work with Chris Callison-Burch, University of Pennsylvania: identify translations via images by finding words in different languages, e.g., Indonesian "kucing" and English "cat", whose associated images have a high degree of visual similarity)
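The image-based idea can be sketched with invented 2-d "image feature" vectors (standing in for features from a real vision model): average the features of each word's images, then propose as translation the cross-lingual word whose averaged features are most similar:

```python
import math

# Sketch of image-based translation (toy numbers): each word is
# represented by the average feature vector of its associated images;
# words whose image sets look alike are proposed as translations.
features = {
    ("id", "kucing"): [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    ("en", "cat"):    [[0.88, 0.12], [0.9, 0.1]],
    ("en", "pet"):    [[0.7, 0.3], [0.6, 0.4]],
    ("en", "orange"): [[0.1, 0.9], [0.2, 0.8]],
}

def avg(vectors):
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def translate_by_images(lang_word, target_lang="en"):
    """Rank target-language words by visual similarity of their image sets."""
    src = avg(features[lang_word])
    candidates = [k for k in features if k[0] == target_lang]
    return max(candidates, key=lambda k: cosine(src, avg(features[k])))[1]

print(translate_by_images(("id", "kucing")))  # -> cat
```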
Low Resource Machine Translation
(Figure 2: five images for the abstract Indonesian word "konsep", along with its top 4 ranked translations)

• the concreteness of the word matters
Massively Multilingual Image Dataset (MMID)
• 100 languages
• 10,000 words per language
• 250,000 English word translations
• 100 images per word
• 35,000,000 total images
• 20TB of data

Hosted by Amazon Public Datasets: http://multilingual-images.org/
Low Resource Machine Translation
Summary

• We can translate even when we don't have parallel training data

• But we need other data, like monolingual data: news articles, web pages, books, images, captions, …

• Some parallel data can further help
BU NLP Research Overview
• Low Resource Machine Translation: automatic translation for languages that have little translation data for training

• Media Framing: how different media "frame" news stories differently
Media Framing
• To Frame: to select some aspects of a perceived reality and make them more salient

CoNLL 2019, ACL 2020
Media Framing
To select some aspects of a perceived reality and make them more salient

Frame: Race/Ethnicity
Frame: Mental Health
Media Framing
The political climate in the U.S. is increasingly polarized
• Main reason: news media of varied political orientations have been depicting two distinct versions of social reality (Mitchell et al., 2014; Stroud, 2011)

We need to assess the ways in which news reporters frame important public affairs
Media Framing
Gun violence is one of the most polarized issues in the country (Pew Research Center, 2018b)

Why? One potential explanation: Framing!
Media Framing
• 2,990 news headlines from 21 media outlets
• 1,300 annotated as relevant, with up to 2 frames each
• 319 have 2 frames (e.g., Frame A: Public Opinion; Frame B: 2nd Amendment)

Gun Violence Frame Corpus: codebook and dataset are publicly available at https://derrywijaya.github.io/GVFC.html
Media Framing
• BERT (Devlin et al., 2018) applied to the Gun Violence Frame Corpus
Media Framing
Some peaks represent the deadliest mass shootings in the U.S. since 1949
Media Framing
Left: 16% Society/Culture, 8% Mental Health
Neutral/Mainstream: 27% Mental Health, 9% Society/Culture
Right: 22% Mental Health, 5% Society/Culture
Media Framing
Figure 2: Comparison of frame association networks in (a) the U.S. and (b) Germany
Media Framing
Summary

• Frame analysis can be used to gain a deeper understanding of issues of public affairs that may ultimately determine public perception of those issues
What's Next?
• More collaborations needed
• To bring visibility to languages in Indonesia
• Javanese, spoken by ~70 million people, has 57 thousand Wikipedia pages
• Swedish, spoken by ~10 million people, has 3.7 million Wikipedia pages
• We need ~2 to 10 million monolingual sentences to obtain a reasonable unsupervised MT system
• Collaborate to collect data
What's Next?
• More collaborations needed
• To bring visibility to languages in Indonesia
• Research collaborations
• Co-advising (Garuda Ilmu Komputer)
• More programs like visiting research at UI
• Consulting and training
• Programming (Python, PyTorch), data science, machine learning, deep learning (https://www.ilmudata.com/)
Thank You!

• wijaya@bu.edu
