You are on page 1of 28

NLP Tasks and Applications

600.465 - Intro to NLP - J. Eisner 1


p(class | token in context)
(WSD)

Build a special classifier just for tokens of “plant”

slide courtesy of D. Yarowsky


Part of Speech Tagging
 We could treat tagging as a token classification problem
 Tag each word independently given features of context
 And features of the word’s spelling (suffixes, capitalization)

600.465 - Intro to NLP - J. Eisner 3


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

NNP

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

VBD

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

DT

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

NN

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

CC

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

VBD

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

TO

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

VB

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

PRP

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

IN

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

DT

Slide from Ray Mooney


Sequence Labeling as
Classification
• Classify each token independently but use
as input features, information about the
surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

classifier

NN

https://medium.freecodecamp.org/an-introduction-to-part-of-
speech-tagging-and-the-hidden-markov-model-
953d45338f24
Slide from Ray Mooney
Named Entity Recognition
CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday
it has increased fares by $6 per round trip on flights to some cities
also served by lower-cost carriers. American Airlines, a unit AMR,
immediately matched the move, spokesman Tim Wagner said.
United, a unit of UAL, said the increase took effect Thursday night
and applies to most routes where it competes against discount
carriers, such as Chicago to Dallas and Atlanta and Denver to San
Francisco, Los Angeles and New York.

2/15/2020 Slide from Jim Martin 16


NE Types

Slide from Jim Martin 17


Information Extraction
As a task: Filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill


Gates railed against the economic philosophy
of open-source software with Orwellian fervor,
denouncing its communal licensing as a
"cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-


NAME TITLE ORGANIZATION
source concept, by which software code is
IE Bill Gates CEO Microsoft
made public to encourage improvement and
Bill Veghte VP Microsoft
development by outside programmers. Gates
Richard Stallman founder Free Soft..
himself says Microsoft will gladly disclose its
crown jewels--the coveted code behind the
Windows operating system--to select
customers.

"We can be open source. We love the concept


of shared source," said Bill Veghte, a
Microsoft VP. "That's a super-important shift
for us in terms of code access.“

Richard Stallman, founder of the Free


Software Foundation, countered saying…
Slide from Chris Brew, adapted from slide by William Cohen
Positive or negative movie
review?

• unbelievably disappointing
• Full of zany characters and richly applied
satire, and some great plot twists
• this is the greatest screwball comedy
ever filmed
• It was pathetic. The worst part about it
was the boxing scenes.

19
Google Product Search

• a

20
Bing Shopping

• a

21
Target Sentiment on
Twitter

• Twitter
Sentiment App
• Alec Go, Richa Bhayani, Lei Huang.
2009. Twitter Sentiment Classification
using Distant Supervision

22
Sentiment analysis has many
other names
• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis

23
Why sentiment analysis?

• Movie: is this review positive or negative?


• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is
despair increasing?
• Politics: what do people think about this candidate or
issue?
• Prediction: predict election outcomes or market trends
from sentiment

24
Extracting Features for
Sentiment Classification

• How to handle negation


 I didn’t like this movie
vs
 I really like this movie
• Which words to use?
 Only adjectives
 All words
 All words turns out to work better, at least on this
data
25
Detection of Friendliness
Ranganath, Jurafsky, McFarland

• Friendly speakers use collaborative


conversational style
 Laughter
 Less use of negative emotional words
 More sympathy
 That’s too bad I’m sorry to hear
that
 More agreement
 I think so too
 Less hedges
26
• https://text-processing.com/demo/tag/
term frequency-inverse document
frequency
• http://www.tfidf.com/
• https://en.wikipedia.org/wiki/Tf%E2%80
%93idf example is interesting
• https://gist.github.com/himzzz/4105717
Program on tfidf

You might also like