Lecture 24: NER & Entity Linking

Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net

Course webpage: http://kwchang.net/teaching/NLP16

CS6501-NLP 1
CuuDuongThanCong.com https://fb.com/tailieudientucntt
Organizing knowledge
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Slides are adapted from Dan Roth

Cross-document co-reference resolution

Reference resolution (disambiguation to Wikipedia)

The “Reference” Collection has Structure

[Figure: relation edges labeled Is_a, Is_a, Used_In, Released, Succeeded]

Analysis of Information Networks

Wikipedia as a knowledge resource ….

[Figure: relation edges labeled Is_a, Is_a, Used_In, Released, Succeeded]

Wikification: The Reference Problem
Cycles of Knowledge: Grounding for/using Knowledge
Blumenthal (D) is a candidate for the U.S. Senate seat now held by
Christopher Dodd (D), and he has held a commanding lead in the race
since he entered it. But the Times report has the potential to
fundamentally reshape the contest in the Nutmeg State.

Challenges
• Dealing with Ambiguity of Natural Language
  • Mentions of entities and concepts could have multiple meanings
• Dealing with Variability of Natural Language
  • A given concept could be expressed in many ways
• Wikification addresses these two issues in a specific way:
• The Reference Problem
  • What is meant by this concept? (WSD + Grounding)
  • More than just co-reference (within and across documents)

General Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by
Christopher Dodd (D), and he has held a commanding lead in the race
since he entered it. But the Times report has the potential to
fundamentally reshape the contest in the Nutmeg State.

• Ambiguity
  • “Times”: The New York Times, The Times
• Variability
  • Connecticut: “CT”, “The Nutmeg State”
• Concepts outside of Wikipedia (NIL)
  • Blumenthal?
• Scale
  • Millions of labels
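
The ambiguity, variability, and NIL cases above can be pictured as a many-to-many mapping between surface forms and titles. The following is a minimal Python sketch, assuming a tiny hand-written alias table; the table contents and the helper names `candidates` and `surface_forms` are illustrative and not part of the lecture.

```python
# Hypothetical alias table: surface form -> candidate Wikipedia titles.
ALIASES = {
    "Times": ["The New York Times", "The Times"],
    "The Nutmeg State": ["Connecticut"],
    "CT": ["Connecticut"],
    "Connecticut": ["Connecticut"],
}


def candidates(mention):
    """Return candidate titles for a mention, or ["NIL"] if none is known."""
    return ALIASES.get(mention, ["NIL"])


def surface_forms(title):
    """Return every known surface form that can refer to a title."""
    return sorted(m for m, titles in ALIASES.items() if title in titles)


print(candidates("Times"))           # ambiguity: one mention, several titles
print(surface_forms("Connecticut"))  # variability: one title, several mentions
print(candidates("Blumenthal"))      # absent from this toy table, so NIL
```

At the scale noted on the slide (millions of labels), such a table would be derived from Wikipedia itself rather than written by hand.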

Wikification: Subtasks
• Wikification and Entity Linking require addressing several sub-tasks:
  • Identifying Target Mentions
    • Mentions in the input text that should be Wikified
  • Identifying Candidate Titles
    • Candidate Wikipedia titles that could correspond to each mention
  • Candidate Title Ranking
    • Rank the candidate titles for a given mention
  • NIL Detection and Clustering
    • Identify mentions that do not correspond to a Wikipedia title
    • Entity Linking: cluster NIL mentions that represent the same entity.
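
To make the last two sub-tasks concrete, here is a rough Python sketch of candidate lookup followed by NIL detection and clustering. It assumes mentions are already identified, uses a hypothetical `TITLE_INDEX` dictionary, and clusters NIL mentions by their last lower-cased token. That clustering key is a deliberately crude assumption for illustration, not the method used in the lecture.

```python
from collections import defaultdict

# Hypothetical index: lower-cased surface form -> candidate Wikipedia titles.
TITLE_INDEX = {
    "times": ["The New York Times", "The Times"],
    "nutmeg state": ["Connecticut"],
}


def link_or_nil(mentions):
    """Split (doc_id, surface) mentions into linked candidates and NIL clusters."""
    linked = {}
    nil_clusters = defaultdict(list)
    for doc_id, surface in mentions:
        cands = TITLE_INDEX.get(surface.lower())
        if cands:
            linked[(doc_id, surface)] = cands
        else:
            # NIL detection: no candidate title was found for this mention.
            # NIL clustering: group by last token (a crude assumed heuristic).
            key = surface.lower().split()[-1]
            nil_clusters[key].append((doc_id, surface))
    return linked, dict(nil_clusters)


mentions = [
    ("d1", "Blumenthal"),
    ("d2", "Richard Blumenthal"),
    ("d1", "Nutmeg State"),
]
linked, nil_clusters = link_or_nil(mentions)
print(linked)
print(nil_clusters)
```

In this toy run the two "Blumenthal" mentions from different documents fall into one NIL cluster, while "Nutmeg State" keeps its candidate list for the ranking step.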

High-level Algorithmic Approach
• Input: a text document $d$; Output: a set of pairs $(m_i, t_i)$
  • $m_i$ are the mentions in $d$; $t_i$ are the corresponding Wikipedia titles, or NIL.
• (1) Identify mentions $m_i$ in $d$
• (2) Local Inference
  • For each $m_i$ in $d$:
    • Identify a set of relevant titles $T(m_i)$
    • Rank titles $t_i \in T(m_i)$
      [E.g., consider local statistics of how often the edges $(m_i, t_i)$, $(m_i, *)$, and $(*, t_i)$ occur in the Wikipedia graph]
• (3) Global Inference
  • For each document $d$:
    • Consider all $m_i \in d$ and all $t_i \in T(m_i)$
    • Re-rank titles $t_i \in T(m_i)$
      [E.g., if $m$, $m'$ are related by virtue of being in $d$, their corresponding titles $t$, $t'$ may also be related]
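
One simple local statistic of the kind mentioned in step (2) is the fraction of links with anchor text m that point to title t, i.e. count(m, t) / count(m, *). The Python sketch below illustrates it under stated assumptions: the counts in `EDGE_COUNTS` are made-up toy numbers, and `local_rank` is a hypothetical helper, not the lecture's scoring function.

```python
from collections import Counter

# Made-up toy counts: (mention, title) -> number of anchor links.
EDGE_COUNTS = Counter({
    ("Chicago", "Chicago"): 90,
    ("Chicago", "Chicago (band)"): 8,
    ("Chicago", "Chicago (typeface)"): 2,
    ("Chicago VIII", "Chicago VIII"): 5,
})


def local_rank(mention):
    """Rank titles for a mention by count(m, t) / count(m, *)."""
    # count(m, *): all links whose anchor text is this mention.
    total = sum(c for (m, _), c in EDGE_COUNTS.items() if m == mention)
    if total == 0:
        return [("NIL", 1.0)]  # no candidate title at all
    scores = [(t, c / total) for (m, t), c in EDGE_COUNTS.items() if m == mention]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = local_rank("Chicago")
print(ranking)
print(local_rank("Blumenthal"))
```

With these toy counts the city always wins for "Chicago", regardless of context. That is the kind of error step (3) is meant to correct.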

Local approach
[Figure: a text document with identified mentions, matched to Wikipedia articles]

Local score of matching the mention to the title (decomposed by $m_i$)

• $\Gamma$ is a solution to the problem
  • A set of pairs $(m, t)$
    • $m$: a mention in the document
    • $t$: the matched Wikipedia title
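
The formula on this slide did not survive extraction. One common way to write a local objective that matches the description above is given below as a sketch. The scoring function $\phi$ and the mention count $N$ are assumed symbols that do not appear in the surviving text.

```latex
\Gamma^{*}_{\text{local}} = \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i)
```

Because the sum decomposes by $m_i$, each mention's title can be chosen independently of the others.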

Global Approach: Using Additional Structure

[Figure: Text Document(s) (News, Blogs, …) and Wikipedia Articles]

Adding a “global” term to evaluate how good the structure of the solution is.
• Use the local solutions $\Gamma'$ (each mention considered independently).
• Evaluate the structure based on pairwise coherence scores $\Psi(t_i, t_j)$.
• Choose those that satisfy document coherence conditions.
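
The re-ranking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each candidate's local score is added to its pairwise coherence with the titles chosen locally for the other mentions. The function `global_rerank`, the scores, and the title strings are hypothetical, and the additive combination is one plausible choice rather than the lecture's exact objective.

```python
def global_rerank(local_scores, coherence):
    """Re-rank titles using local scores plus pairwise coherence.

    local_scores: {mention: {title: local score}}
    coherence:    {frozenset({title_a, title_b}): Psi score}
    """
    # Gamma': the best title for each mention, chosen independently.
    gamma_prime = {m: max(s, key=s.get) for m, s in local_scores.items()}

    result = {}
    for mention, scores in local_scores.items():
        # Titles chosen locally for the other mentions in the document.
        context = [t for m2, t in gamma_prime.items() if m2 != mention]

        def total(title):
            psi = sum(coherence.get(frozenset((title, c)), 0.0) for c in context)
            return scores[title] + psi

        result[mention] = max(scores, key=total)
    return result


local_scores = {
    "Chicago": {"Chicago": 0.6, "Chicago (typeface)": 0.4},
    "Macintosh": {"Macintosh": 1.0},
}
coherence = {frozenset(("Chicago (typeface)", "Macintosh")): 0.5}
print(global_rerank(local_scores, coherence))
```

In this toy document the local winner for "Chicago" is the city, but coherence with "Macintosh" lifts the typeface reading to 0.9 against 0.6, so the global step selects the typeface.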