
Computational Knowledge Analysis –

Natural Language Processing with Python


Session 6
Information Extraction, Relation Classification, and Knowledge Graphs
12.06.2023
Dr. Maria Becker
Summer Term 2023
Areas of NLP: Syntactic vs. Semantic Analysis
• Syntactic and semantic analysis are two main techniques used in natural language processing:
• NLP uses syntax to assess meaning in language based on grammatical rules
• Syntactic techniques include morphological segmentation, POS tagging, chunking, dependency parsing…

• Semantics involves the use and meaning of words. NLP applies algorithms to understand the meaning and structure of sentences
• Semantic techniques include word sense disambiguation, named entity recognition, sentiment analysis, text summarization…
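Chunking, one of the syntactic techniques listed above, groups POS-tagged tokens into flat, non-overlapping phrases. The following is a minimal noun-phrase chunker in plain Python, assuming the tokens have already been tagged with Penn Treebank tags (the tags below are written by hand, not produced by a tagger) and using a single hypothetical rule: an optional determiner, any number of adjectives, then one or more nouns.

```python
# Minimal noun-phrase (NP) chunker over already POS-tagged tokens.
# Assumption: Penn Treebank tags, assigned by hand for this example.

def chunk_noun_phrases(tagged):
    """Return NP chunks matching the hypothetical rule DT? JJ* NN+."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # optional determiner
        if j < n and tagged[j][1] == "DT":
            j += 1
        # any number of adjectives (JJ, JJR, JJS)
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        # one or more nouns (NN, NNS, NNP, NNPS)
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun: emit the chunk and continue after it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:      # no NP starts here: move on by one token
            i += 1
    return chunks


sentence = [
    ("The", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
print(chunk_noun_phrases(sentence))  # ['The little yellow dog', 'the cat']
```

Chunks like these are a common starting point for named entity recognition, since many entity mentions are noun phrases.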
Introduction to Information Extraction

• Watch the video by Chris Manning (9 minutes): Introduction to Information Extraction
• https://www.youtube.com/watch?v=kKaGLGAQrmw
What is Information Extraction?

• Information Extraction (sometimes loosely used as a synonym for information retrieval, although information retrieval usually refers to finding documents relevant to a query): automatic extraction of structured information such as entities, relationships between entities, and attributes describing entities from unstructured, possibly noisy sources
• Opens up new avenues for querying, organizing, and analyzing data
• Enables much richer queries over the abundant unstructured sources than are possible with keyword search alone
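To illustrate why structured output supports richer queries than keyword search, the sketch below assumes that an IE system has already turned four hypothetical sentences into two relations, `works_for` and `headquartered_in` (the records are written by hand here, not extracted automatically). A keyword search for "Berlin" only finds the sentence that literally contains the word, whereas a query over the records can join the two relations to find the people who work in Berlin.

```python
# Contrast keyword search with a query over extracted structured records.
# Assumption: the records stand in for the output of an IE system and are
# hand-written for illustration.

sentences = [
    "Anna Schmidt works for Acme.",
    "Acme is headquartered in Berlin.",
    "Ben Meyer works for Globex.",
    "Globex is headquartered in Paris.",
]

# Structured information an IE system could extract from the sentences above
works_for = [("Anna Schmidt", "Acme"), ("Ben Meyer", "Globex")]
headquartered_in = {"Acme": "Berlin", "Globex": "Paris"}


def keyword_search(docs, keyword):
    """Return all sentences that literally contain the keyword."""
    return [doc for doc in docs if keyword in doc]


def people_working_in(city):
    """Join the two relations: persons whose employer is located in `city`."""
    return [person for person, org in works_for
            if headquartered_in.get(org) == city]


print(keyword_search(sentences, "Berlin"))  # ['Acme is headquartered in Berlin.']
print(people_working_in("Berlin"))          # ['Anna Schmidt']
```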
Information Extraction: From Unstructured to Structured Texts

From: Jurafsky/Martin (2021): Speech and Language Processing.


Why Information Extraction?
• Information Extraction enables
• finding entities
• classifying entities
• classifying relations between entities
• storing entities and their relations in a database (knowledge graphs)
• with the ultimate goal of making unstructured texts machine-readable
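Storing classified entities and their relations in a database can be sketched with Python's built-in `sqlite3` module. This minimal sketch assumes a hypothetical two-table schema (an `entities` table with a type column and a `relations` table of subject, relation, object rows) and hand-written example facts standing in for the output of entity and relation classification; it is one possible layout, not a prescribed format for knowledge graphs.

```python
import sqlite3

# In-memory database with a hypothetical schema for entities and relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    name TEXT PRIMARY KEY,
    type TEXT NOT NULL              -- e.g. PER, ORG, LOC
);
CREATE TABLE relations (
    subject  TEXT REFERENCES entities(name),
    relation TEXT NOT NULL,
    object   TEXT REFERENCES entities(name)
);
""")

# Hand-written stand-in for the output of entity and relation classification
conn.executemany("INSERT INTO entities VALUES (?, ?)", [
    ("Marie Curie", "PER"), ("University of Paris", "ORG"), ("Warsaw", "LOC"),
])
conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "works_for", "University of Paris"),
])

# Structured query: where was each person born?
rows = conn.execute("""
    SELECT r.subject, r.object
    FROM relations AS r
    JOIN entities AS e ON e.name = r.subject
    WHERE e.type = 'PER' AND r.relation = 'born_in'
""").fetchall()
print(rows)  # [('Marie Curie', 'Warsaw')]
```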
Applications of Information Extraction
• News Tracking: automatically tracking specific event types from news sources
• Customer Care: customer-oriented enterprises collect many forms of unstructured data from customer interactions, which IE can turn into structured records for analysis
• Personal information management (PIM) systems: seek to organize personal data like documents, emails, projects and people in a structured, inter-linked format
• Comparison Shopping: automatically crawling merchant web sites to find products and their prices, which can then be compared on comparison shopping web sites
• Ad Placement on Webpages: placing advertisements for a product next to text that both mentions the product and expresses a positive opinion about it
• Scientific Applications: e.g. extracting biological objects such as proteins and genes from paper repositories such as PubMed
Subtasks of Information Extraction
• IE can involve several subtasks:
• Template filling
• Event extraction
• Table information extraction
• Terminology extraction
• Coreference resolution
• Named entity recognition
• Relationship extraction
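Template filling, the first subtask in the list, fills the slots of a predefined event template with strings taken from a text. The sketch below assumes a hypothetical `FARE_RAISE_TEMPLATE` with three slots and uses one hand-written regular expression per slot; practical template-filling systems usually learn slot fillers from annotated data, so this only illustrates the input and output of the task.

```python
import re

# Hypothetical event template: slot name -> regex with one capturing group.
FARE_RAISE_TEMPLATE = {
    "LEAD_AIRLINE": r"([A-Z][a-z]+ Airlines)",
    "AMOUNT": r"increased fares by (\$\d+)",
    "DATE": r"said (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)",
}


def fill_template(text, template):
    """Fill each slot with the first regex match, or None if nothing matches."""
    filled = {}
    for slot, pattern in template.items():
        match = re.search(pattern, text)
        filled[slot] = match.group(1) if match else None
    return filled


text = "United Airlines said Friday it has increased fares by $6 per round trip."
print(fill_template(text, FARE_RAISE_TEMPLATE))
# {'LEAD_AIRLINE': 'United Airlines', 'AMOUNT': '$6', 'DATE': 'Friday'}
```

Unfilled slots stay `None`, which keeps the template shape fixed even when a text mentions only part of the event.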
Step 1: Extract all (important) entities from texts –
Named Entity Recognition
• Subtask of information extraction
• Goal: detect named entities mentioned in unstructured text and classify them into predefined categories
• Common Categories:
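Named entity recognition is commonly framed as sequence labeling with BIO tags (B = beginning of an entity, I = inside an entity, O = outside any entity). As a minimal sketch, the code below assigns BIO tags by longest-match lookup in a small hand-written gazetteer; the gazetteer, its entries and the categories PER, ORG and LOC are assumptions for illustration, and dictionary lookup is a much simpler technique than the statistical or neural taggers typically used in practice.

```python
# Gazetteer-based NER with BIO tags (illustrative; not a trained model).
# Assumption: a small hand-written gazetteer mapping token tuples to categories.
GAZETTEER = {
    ("Chris", "Manning"): "PER",
    ("Stanford", "University"): "ORG",
    ("Stanford",): "LOC",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def bio_tag(tokens):
    """Tag tokens with B-/I- labels using longest match, 'O' otherwise."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible span starting at position i first
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + length]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + length):
                    tags[j] = "I-" + label
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return tags


tokens = "Chris Manning teaches at Stanford University".split()
print(bio_tag(tokens))  # ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```

Longest match is why "Stanford University" is tagged as one ORG entity here rather than as a LOC followed by an untagged token.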
Step 2: Relation Classification and Knowledge Graphs

• Watch the video (13 minutes): Introduction to Relation Extraction
• https://www.youtube.com/watch?v=4AjieiJ1CXo
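One classic approach to relation extraction is hand-written lexico-syntactic patterns, such as the Hearst patterns for hyponymy ("Y such as X" suggests X IsA Y). The sketch below applies one simplified Hearst-style pattern with Python's `re` module; it only handles single-word terms, does no lemmatization, and uses hand-written example sentences, so it illustrates the idea rather than a robust extractor.

```python
import re

# Simplified Hearst pattern: "<hypernym> such as <hyponym>(, <hyponym>)* (and|or) <hyponym>"
# Assumption: single-word terms only; no POS tagging or NP chunking.
PATTERN = re.compile(r"(\w+) such as (\w+(?:, (?:and |or )?\w+| (?:and|or) \w+)*)")
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_isa(sentence):
    """Return (hyponym, 'IsA', hypernym) triples found by the pattern."""
    triples = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration on commas and coordinating conjunctions
        hyponyms = SEPARATOR.split(match.group(2))
        triples.extend((h, "IsA", hypernym) for h in hyponyms if h)
    return triples


print(extract_isa("He plays instruments such as violin, cello and piano."))
# [('violin', 'IsA', 'instruments'), ('cello', 'IsA', 'instruments'), ('piano', 'IsA', 'instruments')]
```

Patterns like this have high precision but low recall, which is one motivation for learning relation classifiers from data instead.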
From Knowledge Relations to Knowledge Graphs
• Entities and the relations between them can be stored in knowledge graphs
• A knowledge graph (or semantic network) represents a network of entities
(i.e. objects, events, situations, or concepts) and illustrates the relationships
between them
• This information is usually stored in a graph database and visualized as a graph structure, hence the term knowledge “graph”
• A knowledge graph is made up of three main components: nodes, edges, and labels
• Any object, place, or person can be a node
• An edge defines the relationship between two nodes, and its label names the type of that relationship (e.g. IsA or PartOf)
• Knowledge graphs are useful for many downstream NLP tasks such as automatic question answering
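A knowledge graph with nodes, labeled edges, and a simple question-answering lookup can be sketched with plain Python dictionaries. The sketch assumes hand-written facts and two hypothetical helpers, `answer` and `answer_chain`, which only handle questions that have already been mapped to a start node and one or more relation labels; turning a natural-language question into that form is outside its scope.

```python
from collections import defaultdict

# Knowledge graph as an adjacency map: node -> list of (edge label, target node).
graph = defaultdict(list)


def add_fact(head, label, tail):
    """Add a directed, labeled edge head --label--> tail."""
    graph[head].append((label, tail))


add_fact("Darmstadt", "located_in", "Hesse")
add_fact("Hesse", "located_in", "Germany")
add_fact("Germany", "capital", "Berlin")


def answer(node, label):
    """Follow all edges with the given label from a node (one hop)."""
    return [tail for edge_label, tail in graph.get(node, []) if edge_label == label]


def answer_chain(node, labels):
    """Answer a multi-hop question by following a sequence of edge labels."""
    nodes = [node]
    for label in labels:
        nodes = [tail for n in nodes for tail in answer(n, label)]
    return nodes


# "In which state is Darmstadt located?"
print(answer("Darmstadt", "located_in"))                 # ['Hesse']
# "What is the capital of the country Hesse is located in?"
print(answer_chain("Hesse", ["located_in", "capital"]))  # ['Berlin']
```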
Example of a Knowledge Graph: ConceptNet

• Semantic network containing common sense knowledge
• Diverse and simple facts about the world, people and everyday life
• Collected from volunteers on the Internet
• Nodes represent words/phrases, edges represent relations
• Triples: ⟨left term, relation, right term⟩
• English version (5.6):
• set of 37 relations
• about 1,900,000 nodes
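ConceptNet's ⟨left term, relation, right term⟩ triples can be loaded from its downloadable assertion dumps. The sketch below assumes the tab-separated layout of the ConceptNet 5 assertions CSV (edge URI, relation URI such as `/r/IsA`, start node URI such as `/c/en/car`, end node URI, JSON metadata with a `weight` field) and parses two hand-written lines in that layout rather than the real file; it keeps only triples whose two terms are both English.

```python
import json

# Two hand-written lines in the (assumed) ConceptNet 5 assertions CSV layout:
# edge URI \t relation URI \t start node URI \t end node URI \t JSON metadata
SAMPLE = (
    "/a/[/r/IsA/,/c/en/car/,/c/en/vehicle/]\t/r/IsA\t/c/en/car\t/c/en/vehicle\t"
    '{"weight": 2.0}\n'
    "/a/[/r/Synonym/,/c/en/sunlight/n/,/c/de/sonnenlicht/n/]\t/r/Synonym\t"
    "/c/en/sunlight/n\t/c/de/sonnenlicht/n\t"
    '{"weight": 1.0}\n'
)


def concept_term(uri):
    """'/c/en/ice_cream/n' -> ('en', 'ice cream'); underscores stand for spaces."""
    parts = uri.split("/")  # ['', 'c', 'en', 'ice_cream', 'n']
    return parts[2], parts[3].replace("_", " ")


def parse_triples(lines, lang="en"):
    """Yield (left term, relation, right term, weight) for same-language edges."""
    for line in lines:
        if not line.strip():
            continue
        _, rel_uri, start, end, meta = line.split("\t")
        (l_lang, left), (r_lang, right) = concept_term(start), concept_term(end)
        if l_lang == lang and r_lang == lang:
            relation = rel_uri.split("/")[2]  # '/r/IsA' -> 'IsA'
            yield left, relation, right, json.loads(meta)["weight"]


print(list(parse_triples(SAMPLE.splitlines())))
# [('car', 'IsA', 'vehicle', 2.0)]
```

The same generator could be applied line by line to the full dump, which avoids loading the whole file into memory.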
ConceptNet Relations

ConceptNet Relations & Examples
• ExternalURL: knowledge → dbpedia.org
• FormOf: slept → sleep
• IsA: car → vehicle; Chicago → city
• PartOf: gearshift → car
• HasA: bird → wing; pen → ink
• UsedFor: bridge → cross water
• CapableOf: knife → cut
• AtLocation: Boston → Massachusetts
• Causes: exercise → sweat
• HasSubevent: eating → chewing
• HasFirstSubevent: sleep → close eyes
• HasLastSubevent: cook → clean up kitchen
• HasPrerequisite: dream → sleep
• HasProperty: ice → cold
• MotivatedByGoal: compete → win
• ObstructedBy: sleep → noise
• Desires: person → love
• CreatedBy: cake → bake
• Synonym: sunlight ↔ sunshine
• Antonym: black ↔ white; hot ↔ cold
• DerivedFrom: pocketbook → book
• SymbolOf: red → fervor
• DefinedAs: peace → absence of war
• Entails: run → move
• MannerOf: auction → sale
• LocatedNear: chair ↔ table
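The relation list distinguishes directed relations (→) from symmetric ones (↔) such as Synonym, Antonym and LocatedNear. When building a lookup index from such triples, one simple option, sketched below, is to store symmetric relations in both directions so that a lookup from either term succeeds; the `SYMMETRIC` set here is an assumption that contains only the three relations marked ↔ in the list.

```python
from collections import defaultdict

# Relations marked as symmetric (↔) in the list above; all others are directed (→).
SYMMETRIC = {"Synonym", "Antonym", "LocatedNear"}


def build_index(triples):
    """Map each term to the set of its outgoing (relation, term) pairs."""
    index = defaultdict(set)
    for left, relation, right in triples:
        index[left].add((relation, right))
        if relation in SYMMETRIC:
            index[right].add((relation, left))  # also store the reverse edge
    return index


triples = [
    ("car", "IsA", "vehicle"),
    ("black", "Antonym", "white"),
    ("chair", "LocatedNear", "table"),
]
index = build_index(triples)
print(sorted(index["white"]))    # [('Antonym', 'black')]
print(sorted(index["vehicle"]))  # [] because IsA is directed
```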
Use case scenario: Commonsense knowledge graphs are
helpful for argument relation classification (Paul et al., 2020)
Injecting Knowledge Relations into a Neural Argument Classifier

Knowledge paths from ConceptNet


Examples
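Knowledge paths link two concepts through a chain of ConceptNet relations. As a rough sketch of the idea only (not the path extraction or selection method used by Paul et al., 2020), the code below finds a shortest relation path between two concepts with breadth-first search over a few hand-written triples, some of them taken from the relation examples above, following edges only in their stored direction.

```python
from collections import deque

# Hand-written ConceptNet-style triples (illustrative, not loaded from ConceptNet)
TRIPLES = [
    ("exercise", "Causes", "sweat"),
    ("sweat", "HasProperty", "wet"),
    ("run", "Entails", "move"),
    ("run", "IsA", "exercise"),
]


def find_path(triples, source, target):
    """Breadth-first search for a shortest directed relation path source -> target."""
    neighbors = {}
    for left, relation, right in triples:
        neighbors.setdefault(left, []).append((relation, right))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, nxt in neighbors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None  # no path between the two concepts


print(find_path(TRIPLES, "run", "sweat"))
# [('run', 'IsA', 'exercise'), ('exercise', 'Causes', 'sweat')]
```

Because BFS explores paths in order of length, the first path that reaches the target uses the fewest relations.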
Next session
• Next session (19.06.2023) will take place in Darmstadt
• The session will include some exercises, so please bring your laptops
• This week there is no homework
