You are on page 1of 49

Natural Language Processing

& Applications
Why Text ?

Source: www.pinterest.com
2
Source : RECOMND

3
Source: lifeboat.com
4
Natural Language Processing

• A hallmark of human intelligence.


• Natural Language Processing
• NLP = Natural Language Understanding + Natural Language Generation
• Process information contained in natural language text.
• Also known as Computational Linguistics (CL), Human Language Technology
(HLT), Natural Language Engineering (NLE)
• Can machines understand human language?
Ultimate goal
Analyze, understand and generate human languages just like humans do.

5
Fitting in CS taxonomy
Computers

Databases Algorithms Networking


Artificial Intelligence

Robotics Natural Language Processing Search

6
NLP- Tasks

Natural Language Understanding


Taking some spoken/typed sentence and working out what it means

Natural Language Generation


Taking some formal representation of what you want to say and working out a
way to express it in a natural (human) language (e.g., English)

7
Working towards

• Applying computational techniques to language domain.

• Use the theories to build systems that can be of social use.

• Make computers learn our language rather than we learn theirs.

8
Natural language understanding
Raw speech signal /Raw Text
• Speech recognition
Sequence of words spoken /written
• Syntactic analysis
Structure of the sentence
• Semantic analysis
Partial representation of meaning of sentence
• Discourse & Pragmatic analysis
Final representation of meaning of sentence

9
Need for Language Technologies – In Daily life

A computer could be used for:


• Answering the phone, and replying to a question
• Translating a daily newspaper.
• Read the whole newspaper and tell me the important news only
• Automatically generating movie subtitles
• Sentence corrections
• Correcting descriptive questions.
• Understanding text in journals / books and building an expert system.

10
Application Areas
➢Machine Translation
➢Information Retrieval
Selecting from a set of documents the ones that are relevant to a query
➢Text Categorization
Classifying text into fixed topic categories
➢Question Answering
➢ Information Extraction
Converting unstructured text into structured data

11
Application Areas (cont..)
➢Spoken language control systems
➢Spelling and grammar checkers
➢Sentiment Analysis
➢Text-to-Speech & Speech recognition
➢Natural Language Dialogue Interfaces to Databases

12
Question Answering

Source: Google
13
Information Retrieval

• NLP improves web search

• Search for ‘Jaguar’


• Search for ‘Apple’
• Search for notebook, find Laptop

Source: Google
14
Email Spam Filtering/Categorizing

Source : junkemailfilter.com

15
Text Categorization
• Assign Label to a document representing its content (ACM keyword, Yahoo
category)
• E.g. Decide if a newspaper article is about politics, business, or sports?

16
Source: Medium
Machine Translation
• Multilingual Usage
• Machine-assisted human Translation
• Scope
Creating Language resources.

Source: www.localizer.co

17
Source: Google

18
Duplicate Question detection

19
Knowledge Extraction

Source: http://aritter.github.io

20
Information Extraction
Information extraction systems
• Find and understand relevant parts of text.
• Produce a structured representation of the relevant information
from text, in the form of :
• entities,
• relations between entities ,
• events in which the entities are involved.
• Produce a structured representation of the relevant information-
relations/events

21
Information Extraction

Source : cs.washington.edu 22
Applications of IE Systems

• Extracting diagnoses, symptoms, physical findings, test results from Medical


patient records.
• Gathering earnings, profits, board members, etc. from company reports
• Automatic Verification of construction industry specifications documents
• Real estate advertisements
• Building job databases from textual job vacancy postings
• Extraction of company take-over events
• Categorizing customer feedbacks based on product names
• Location extraction from social media texts for security applications.

23
Semantic Web

• Linked Data
• Vocabularies / Domain Information
• Inference
• Query

Source :Google

24
TOOLS
• Apache OpenNLP : Java machine learning toolkit for natural language
processing
• OpenCalais : Tag the people, places, companies, facts, and events in your
content to increase its value, accessibility and interoperability
• DBpedia Spotlight : Tool for automatically annotating mentions of DBpedia
resources in text.
• Natural Language Toolkit is a suite of libraries and programs for NLP
• General Architecture for Text Engineering (GATE)
• Spacy is a free open-source library featuring state-of-the-art speed and
accuracy and a powerful Python API.
• Stanford CoreNLP:a Java annotation pipeline framework, which provides
most of the common core natural language processing (NLP) steps, from
tokenization through to coreference resolution.
25
Aspects of Language Processing
• Phonology
• Word, lexicon: lexical analysis
• Morphology, word segmentation
• Syntax
• Sentence structure, phrase, grammar, …
• Semantics
• Meaning
• Discourse analysis
• Meaning of a text
• Relationship between sentences
• Pragmatics
The study of meaning in different contexts of use
26
Phonology
Speech processing
• Humans process speech remarkably well.
• Speech interface can replace keyboards and monitors.
• Convert Acoustic signals to Text.
• Phonemes are the smallest recognizable speech unit in a language.

Grapheme
A way of writing
down a phoneme

Speech Recognition, Text to Speech Conversion 27


Lexical Analysis -Morphology

Delegate
(de + leg + ate)
Take the legs from

cashier
(cashy + er)
More wealthy

Source: www.pinterest.co.uk
28
Morphology
• Structures and patterns in words
• Words are a sequence of Morphemes.
• Morpheme – smallest meaningful unit in a word.
• Analyses how words are formed from morphemes.
e.g., dogs= dog+s.
• Inflectional Morphology – Same Part of Speech
• Buses = Bus + es
• Carried = Carry + ed
• Derivational Morphology – Change PoS.
• Destruct + ion = Destruction (Noun)
• Beauty + ful = Beautiful (Adjective)
• Affixes – Prefixes, Suffixes Rules govern the fusion.

Spell checkers, Lemmatization, Information retrieval


29
Syntax
• Words when put together they convey more.
• Syntax is the grammatical structure of the
sentence.
• Syntactic Analysis (Parsing)
Process of assigning a parse tree to a
sentence.
Parsing: Given a sentence and a grammar
• Checks that the sentence is correct
according to the grammar
• Returns a parse tree representing the
structure of the sentence

Grammar checking tools, Information Extraction, Phrase Identification 30


Syntactic Analysis - Grammar
sentence -> noun_phrase, verb_phrase
noun_phrase -> proper_noun
noun_phrase -> determiner, noun
verb_phrase -> verb, noun_phrase

proper_noun -> [mary]


noun -> [apple]
verb -> [ate]
determiner -> [the]

31
Parsing
• Analyze the structure of a sentence

NP VP

PP
NP NP

D N V D N P D N
The student put the book on the table

32
Semantic Analysis
• What do you mean..?
• Words – Lexical Semantics
• Sentences – Compositional Semantics
• Converting the syntactic structures to semantic format – meaning
representation.
• Semantics: the meaning of a word or phrase within a sentence

Event Extraction, Knowledgebase construction


33
Semantic Representations
• Meaning representation of the sentence from its syntactic structure(s)
• Ways of meaning representing the sentence:
• Logical forms
Sentence: A tall man plays basketball
Representation: x man(x) & tall(x) & plays(x, basketball)
• Semantic role labelling
Sentence: 3 people were killed as X fired with gun.
Representation: Kill( Agent : X, Victim: 3 people, Instrument : Gun)

34
Discourse Analysis
• The meaning of an individual sentence may depend on the sentences that
precede it and may influence the meaning of the sentence that follow it.
• Issues related to discourse Integration
• Anaphora
Resolving the pronoun’s reference. Co-reference resolution
• Ellipsis
Incomplete sentences

• Anaphora
• I read the book by Dr. Kalam. It was great

• He hits the car with a stone. It bounces back.


35
Anaphora Resolution
• Anaphora Resolution(AR) is the process of
determining the antecedent of an anaphor.
• Anaphor – The reference that points to the
previous item (he, it)
• Antecedent –The entity to which anaphor
refers (John, Ice-cream)
• Mary bought a book for Kelly. She didn’t like it.
• She refers to Mary or Kelly??
• It refers to what -- book

36
Discourse Structures- Ellipsis

• Ellipsis – Incomplete sentences


• “What’s your name?”
• “Sri, and yours?”

The second sentence is not complete, but what it means can be inferred
from the first one.

37
Pragmatics
• Uses context of utterance

• Where, by who, to whom, why, when it was


said

• Intentions: inform, request, promise,


criticize, …

38
Challenges in NLP: Ambiguity
Morphology

• Words with Different Part of speech(POS)


Ex. Issue ( I have an issue/ Please issue a ticket)

• Words with different meaning with same POS


Ex. Bank (River Bank, Indian Bank)

39
Syntax Ambiguity

S S

VP VP
NP NP
NP NP

N N V N N V Adj N
Teacher strikes idle kids Teacher strikes idle kids

40
Attachment Ambiguity

• A sentence has attachment ambiguity if a constituent fits more than one


position in a parse tree.

• Attachment ambiguity arises from uncertainty of attaching a phrase or clause to


a part of a sentence

• “ John saw Mary with a telescope”


• John saw (Mary with a telescope)
• John (saw Mary with a telescope)

41
Semantic Ambiguity
• Meaning of the words themselves can be misinterpreted.
• Example 1: The car hit the pole while it was moving.
• The interpretations can be
• The car, while moving, hit the pole
• The car hit the pole while the pole was moving.
• Example 2:

42
Semantic Ambiguity
Semantic ambiguity: “I saw the prudential building flying into Boston”

Semantic Restriction, Domain Knowledge - Ontology

43
Sample Ontology

44
Discourse Ambiguity

“We gave the monkeys the bananas because they were hungry”

“We gave the monkeys the bananas because they were over-ripe”

45
Pragmatics Ambiguity

Pragmatic ambiguity: “you’re late”

What’s the speaker’s intention: informing or criticizing?

46
Enabling Computing Techniques
• Stemming
• Reduce words to base form.
• Part of Speech Tagging
• Determine for each word whether it is a noun, adjective, verb, …..
• Parsing
• sentence to parse tree
• Wordnet – Lexical Database - 206941 word sense pairs
• Word Sense Disambiguation
• Bank (Financial Bank vs Riverbank)
• Semantic similarity metrics
• Vector Representations of Words, Sentences
• Neural Network based Models
• Word2Vec, Glove, Elmo.
• Pretrained models
• BERT etc.
• Large Language Models
• GPT, Llama etc.
47
Conclusion

• Complete human-level natural language understanding is still a


distant goal
• Develop Algorithms for each level.
• Find appropriate match between application domain and the
available methods

48
References
Books
• Dan Jurafsky and James H. Martin, Speech and Language Processing , Pearson
education

49

You might also like