Xu-Ly-Ngon-Ngu-Tu-Nhien - Kai-Wei-Chang - 25-Ner - (Cuuduongthancong - Com)
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
CS6501-NLP 1
CuuDuongThanCong.com https://fb.com/tailieudientucntt
Organizing knowledge
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Cross-document co-reference resolution
Reference resolution (disambiguation to Wikipedia)
The “Reference” Collection has Structure
[Figure: relation labels Is_a, Is_a, Used_In, Released, Succeeded]
Analysis of Information Networks
Wikipedia as a knowledge resource ….
[Figure: relation labels Is_a, Is_a, Used_In, Released, Succeeded]
Wikification: The Reference Problem
Cycles of Knowledge: Grounding for/using Knowledge
Blumenthal (D) is a candidate for the U.S. Senate seat now held by
Christopher Dodd (D), and he has held a commanding lead in the race
since he entered it. But the Times report has the potential to
fundamentally reshape the contest in the Nutmeg State.
Challenging
• Dealing with Ambiguity of Natural Language
  • Mentions of entities and concepts could have multiple meanings
• Dealing with Variability of Natural Language
  • A given concept could be expressed in many ways
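The two challenges can be pictured as the two directions of a many-to-many mapping between surface forms and titles. The sketch below is only an illustration: the (mention, title) pairs are hand-written assumptions inspired by the examples in these slides, not data taken from a real Wikipedia dump.

```python
from collections import defaultdict

# Hypothetical (surface form, title) pairs; the titles are illustrative labels.
ANCHORS = [
    ("Chicago", "Chicago (typeface)"),
    ("Chicago", "Chicago (band)"),
    ("Chicago", "Chicago (city)"),
    ("Nutmeg State", "Connecticut"),
    ("Connecticut", "Connecticut"),
    ("CT", "Connecticut"),
]

titles_by_mention = defaultdict(set)  # ambiguity: one mention, many titles
mentions_by_title = defaultdict(set)  # variability: one title, many mentions
for mention, title in ANCHORS:
    titles_by_mention[mention].add(title)
    mentions_by_title[title].add(mention)

# Ambiguity: "Chicago" maps to several possible titles.
print(sorted(titles_by_mention["Chicago"]))
# Variability: "Connecticut" is reachable from several surface forms.
print(sorted(mentions_by_title["Connecticut"]))
```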
General Challenges
• Ambiguity • Variability
Wikification: Subtasks
• Wikification and Entity Linking require addressing several sub-tasks:
  • Identifying Target Mentions
    • Mentions in the input text that should be Wikified
  • Identifying Candidate Titles
    • Candidate Wikipedia titles that could correspond to each mention
  • Candidate Title Ranking
    • Rank the candidate titles for a given mention
  • NIL Detection and Clustering
    • Identify mentions that do not correspond to a Wikipedia title
    • Entity Linking: cluster NIL mentions that represent the same entity
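A minimal sketch of how the sub-tasks might fit together, assuming the target mentions have already been identified. The `CANDIDATES` dictionary, the placeholder `rank` function, and the name `Jane Roe` are all hypothetical; NIL detection is simplified here to "no candidate title exists", and NIL clustering to grouping by case-folded string.

```python
from collections import defaultdict

# Hypothetical candidate dictionary (surface form -> candidate titles).
# Hand-written for illustration; not derived from a real Wikipedia dump.
CANDIDATES = {
    "Blumenthal": ["Richard Blumenthal", "Sidney Blumenthal"],
    "Christopher Dodd": ["Chris Dodd"],
    "Times": ["The New York Times", "The Times"],
    "Nutmeg State": ["Connecticut"],
}

def candidate_titles(mention):
    """Identify candidate titles; an empty list means none are known."""
    return CANDIDATES.get(mention, [])

def rank(mention, titles):
    """Placeholder ranker (an assumption): keeps the dictionary order."""
    return list(titles)

def wikify(mentions):
    links = {}
    nil_clusters = defaultdict(list)
    for m in mentions:
        ranked = rank(m, candidate_titles(m))
        if ranked:
            links[m] = ranked[0]  # best-ranked candidate title
        else:
            # NIL detection, then clustering by a normalized surface form.
            nil_clusters[m.casefold()].append(m)
    return links, dict(nil_clusters)

mentions = ["Blumenthal", "Christopher Dodd", "Times", "Nutmeg State",
            "Jane Roe", "JANE ROE"]
links, nil_clusters = wikify(mentions)
print(links)
print(nil_clusters)
```

A real system would also mark a mention as NIL when candidates exist but none of them fits the context, which this sketch does not attempt.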
High-level Algorithmic Approach.
• Input: a text document d. Output: a set of pairs (m_i, t_i)
  • m_i are mentions in d; t_i are the corresponding Wikipedia titles, or NIL.
• (1) Identify mentions m_i in d
• (2) Local Inference
  • For each m_i in d:
    • Identify a set of relevant titles T(m_i)
    • Rank titles t_i ∈ T(m_i)
      [E.g., consider local occurrence statistics of the edges (m_i, t_i), (m_i, *), and (*, t_i) in the Wikipedia graph]
• (3) Global Inference
  • For each document d:
    • Consider all m_i ∈ d and all t_i ∈ T(m_i)
    • Re-rank titles t_i ∈ T(m_i)
      [E.g., if m, m’ are related by virtue of being in d, their corresponding titles t, t’ may also be related]
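The local and global steps can be sketched as follows. This is one possible instantiation under stated assumptions, not the method of any particular system: the local score is taken to be count(m, t) / count(m, *), the global step adds the average relatedness to the other mentions' locally best titles, and every number in `EDGE_COUNTS` and `RELATED` is made up for illustration.

```python
from collections import Counter

# Hypothetical anchor-link counts: how often mention m links to title t.
EDGE_COUNTS = Counter({
    ("Blumenthal", "Sidney Blumenthal"): 6,
    ("Blumenthal", "Richard Blumenthal"): 4,
    ("Christopher Dodd", "Chris Dodd"): 10,
    ("Nutmeg State", "Connecticut"): 5,
})

# Hypothetical symmetric relatedness between titles, in [0, 1].
RELATED = {
    frozenset(["Richard Blumenthal", "Chris Dodd"]): 0.8,
    frozenset(["Richard Blumenthal", "Connecticut"]): 0.9,
    frozenset(["Chris Dodd", "Connecticut"]): 0.9,
    frozenset(["Sidney Blumenthal", "Chris Dodd"]): 0.1,
}

def candidates(m):
    """T(m): titles that mention m has linked to at least once."""
    return [t for (mm, t) in EDGE_COUNTS if mm == m]

def local_score(m, t):
    """Estimate P(t | m) as count(m, t) / count(m, *)."""
    total = sum(c for (mm, _), c in EDGE_COUNTS.items() if mm == m)
    return EDGE_COUNTS[(m, t)] / total if total else 0.0

def relatedness(t1, t2):
    return RELATED.get(frozenset([t1, t2]), 0.0)

def local_inference(mentions):
    """Rank each mention's titles independently (assumes T(m) is non-empty)."""
    return {m: max(candidates(m), key=lambda t: local_score(m, t))
            for m in mentions}

def global_inference(mentions, weight=1.0):
    """Re-rank using coherence with the other mentions' locally best titles."""
    local_best = local_inference(mentions)
    result = {}
    for m in mentions:
        context = [local_best[o] for o in mentions if o != m]

        def score(t):
            coherence = sum(relatedness(t, c) for c in context) / max(len(context), 1)
            return local_score(m, t) + weight * coherence

        result[m] = max(candidates(m), key=score)
    return result

mentions = ["Blumenthal", "Christopher Dodd", "Nutmeg State"]
print(local_inference(mentions))
print(global_inference(mentions))
```

With these toy numbers the local step prefers the more frequent title for "Blumenthal", while the global step switches to the title that is more related to the other titles in the document.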
Local approach
[Figure: a text document, its identified mentions, and Wikipedia articles]
Global Approach: Using Additional Structure
Wikipedia Articles