
MODULE 2

• It is commonly estimated that around 80 percent of data is unstructured.


• To make good decisions, it is necessary to understand and analyze that unstructured data.
• Unlike structured data, unstructured information must be parsed and tagged to
find the elements of meaning.
• Natural language processing techniques interpret the relationships among massive amounts of natural language elements.
• How to use NLP techniques to support the continuous learning lifecycle.
Role of NLP in cognitive systems
Natural language processing (NLP) is the ability of a computer program to understand human language, referred to as natural language, as it is spoken and written.
The importance of context



Context
1. Context in NLP is the setting or situation in which the content occurs.
2. We humans are capable of finding patterns and making associations between words to determine meaning and understand sentiment. This is not so easy for machines.
3. There is a great deal of ambiguity in language, and many words may have multiple meanings depending on the subject matter being discussed or on how one word is combined with other words.
Understanding Linguistics
• Linguistics is the scientific study of language, and its focus is the systematic investigation of the properties of particular languages as well as the characteristics of language in general.
The subfields of linguistics are:
1. Phonetics & Phonology - the study of how speech sounds are produced and perceived (butter pronunciation demonstration).
*A person who is angry may use the same words as a person who is confused.

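2. Morphology - the study of word structure
Examples: unavoidable (un + avoid + able), biscuit (a single unit that cannot be split further), blood pressure (a compound of two words)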

3. Syntax - the study of sentence structure (from the Greek for ‘to arrange together’)
Syntax looks at the rules and process of building a sentence, including the word order and structure of a sentence. The meaning of a sentence in any language depends on the syntax and order of the words.
The dog chases the cat
4. Semantics - Semantic analysis is the process of understanding the meaning and interpretation of words, signs and sentence structure.
The squirrel sang a song

5. Pragmatics - the study of how language is used in context (the correct meaning in the correct situation)
Do you know what time it is?
One context of this might be literally asking the time.
Another might be an Indian mom yelling at her son to wake him up, and she is definitely not asking for the time there.
• NLP encompasses both natural language generation (NLG) and natural language understanding (NLU), which have the following distinct but related capabilities:
• NLU refers to the ability of a computer to use syntactic and semantic analysis to determine the meaning of text or speech.
• NLG enables computing devices to generate text and speech from data input.
1. Lexical Analysis / Morphological Analysis
In this phase, the text is broken down into paragraphs, sentences and words, and the structure of the words is identified and described. It includes the following techniques:
• Stop word removal (removing ‘and’, ‘of’, ‘the’, etc. from the text)
• Tokenization (breaking the text into sentences or words)
  Word tokenizer
  Sentence tokenizer
• Stemming (removing ‘ing’, ‘es’, ‘s’ from the end of words)
• Lemmatization (converting words to their base forms)
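The techniques above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions: the stop-word list, the suffix rule in `naive_stem` and the `LEMMAS` lookup table are hypothetical stand-ins (a real pipeline would typically use a library such as NLTK or spaCy, whose stemmers and lemmatizers are far more thorough).

```python
import re

# Tiny hand-picked stop-word list (an assumption for illustration only).
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "is", "are", "in"}

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMAS = {"went": "go", "better": "good", "mice": "mouse", "studies": "study"}


def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Split a sentence into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(word):
    """Strip one common suffix; a crude rule, not the Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return the base form from the lookup table, or the word unchanged."""
    return LEMMAS.get(word, word)


text = "The boys are playing in the park. The mice went home!"
sentences = sentence_tokenize(text)
tokens = remove_stop_words(word_tokenize(sentences[0]))
print(tokens)                                        # ['boys', 'playing', 'park']
print([naive_stem(t) for t in tokens])               # ['boy', 'play', 'park']
print(naive_stem("studies"), lemmatize("studies"))   # studi study
```

The last line shows why lemmatization is often preferred: blind suffix stripping turns "studies" into the non-word "studi", while a dictionary lookup returns the base form "study".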
2. Syntactic Analysis
Syntactic analysis checks grammar and word arrangement, and shows the relationships among the words.
The boy goes to school
The school goes to boy
3. Semantic Analysis
Consider the sentence: “The apple ate a banana”. Although the sentence
is syntactically correct, it doesn’t make sense because apples can’t eat.
Semantic analysis looks for meaning in the given sentence. It also deals
with combining words into phrases.
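One classic way to catch sentences like this is a selectional restriction: a verb constrains what kind of noun may fill its subject slot. The sketch below is a toy illustration, not how production systems work; the `ANIMATE` lexicon, the verb list and the fixed "The <noun> <verb> ..." word order are all assumptions made for the example.

```python
# Hypothetical feature lexicon: is the noun an animate (living) being?
ANIMATE = {"boy": True, "dog": True, "squirrel": True,
           "apple": False, "banana": False, "school": False}

# Hypothetical selectional restriction: verbs of eating need an animate subject.
NEEDS_ANIMATE_SUBJECT = {"eat", "eats", "ate"}


def is_meaningful(sentence):
    """Check a simple 'The <noun> <verb> ...' sentence against the restriction.

    Returns False when an eating verb has a subject not known to be animate.
    """
    words = sentence.lower().rstrip(".").split()
    subject, verb = words[1], words[2]  # assumes Det-Noun-Verb word order
    if verb in NEEDS_ANIMATE_SUBJECT:
        return ANIMATE.get(subject, False)
    return True


print(is_meaningful("The apple ate a banana"))  # False
print(is_meaningful("The boy ate a banana"))    # True
```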
4. Discourse Integration
The meaning of a sentence depends on the meaning of the sentence immediately preceding it, and it in turn helps establish the meaning of the sentence that follows. That is to say, a statement or word may depend on the preceding sentences or words. This is especially true of pronouns and proper nouns.
For example: Billy bought it.
The word “it” in the above sentence depends on the preceding discourse context.

Example: "John got ready at 9 AM. Later he took the train to California" (here “he” refers to John)
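A very simple heuristic for the pronoun case is recency: link each pronoun to the most recently mentioned compatible entity. The sketch below assumes a hand-made, hypothetical list of names with genders and ignores everything else that real coreference resolvers consider (syntax, number, salience), so it only illustrates how discourse context is carried forward from one sentence to the next.

```python
import re

# Hypothetical mini-lexicon of names and pronoun genders (assumptions for illustration).
NAME_GENDER = {"John": "m", "Billy": "m", "Mary": "f"}
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f"}


def resolve_pronouns(text):
    """Link each pronoun to the most recently mentioned name of the same gender."""
    last_seen = {}  # gender -> most recent name mentioned so far
    links = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in NAME_GENDER:
            last_seen[NAME_GENDER[token]] = token
        elif token.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[token.lower()]
            links.append((token, last_seen.get(gender)))  # None if no antecedent yet
    return links


print(resolve_pronouns("John got ready at 9 AM. Later he took the train to California"))
# [('he', 'John')]
```

Resolving "it" in "Billy bought it." would need an earlier sentence that mentions the object, which this toy lexicon does not model.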
5. Pragmatic Analysis
It is a complex phase where machines should have knowledge not only
about the provided text but also about the real world. There can be
multiple scenarios where the intent of a sentence can be misunderstood
if the machine doesn’t have real world knowledge.

Example:
"Thank you for coming so late, we have wrapped up the meeting"
(Contains sarcasm)
"Can you share your screen?" (here the context is sharing a computer screen during a remote meeting)
Ambiguity and Uncertainty in Language

Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity.
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words
themselves can be misinterpreted. In other words, semantic ambiguity
happens when a sentence contains an ambiguous word or phrase.
“The car hit the truck while it was moving”
has semantic ambiguity because it can be interpreted as
“The car, while moving, hit the truck” or
“The car hit the truck while the truck was moving”.
Anaphoric Ambiguity
This kind of ambiguity arises from the use of anaphoric entities in discourse. For example:
The horse ran up the hill. It was very steep. It soon got tired.
Here, the anaphoric reference of “it” in the two sentences causes ambiguity: the first “it” refers to the hill and the second to the horse, yet the words alone do not say so.


Pragmatic Ambiguity
This kind of ambiguity arises when the context of a phrase gives it multiple interpretations. In simple words, pragmatic ambiguity arises when the statement is not specific.
For example, the sentence “I like you too” can have multiple interpretations, such as:
I like you (just as you like me),
I like you (just as someone else does).
Handling Ambiguity: WSD
WSD: Word Sense Disambiguation
• Word Sense Disambiguation is an important NLP method that determines the meaning of a word as it is used in a particular context. It resolves the ambiguity that arises when the same word is used in different situations.
• He killed him with a baseball bat
bat: mammal or wooden object
Approaches

Dictionary-based or knowledge-based approach

• It is built on the idea that words used in a text are related to one another, and that this relationship can be seen in the definitions of the words and their meanings.
• The pair of dictionary senses with the highest word overlap in their dictionary definitions is used to disambiguate two (or more) words.
• The Lesk algorithm is the classical algorithm for knowledge-based WSD.
• The Lesk algorithm assumes that words in a given “neighborhood” (a portion of text) share a common theme.
• In a simplified version of the Lesk algorithm, the dictionary definition of an ambiguous word is compared with the terms in its neighborhood.
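The pairwise idea in the second bullet (often called the original Lesk algorithm) can be sketched for the "baseball bat" example. The glosses below are short hypothetical definitions written for illustration, not entries from IndoWordNet or any real dictionary, and overlap is counted on exact word matches after removing a small, equally hypothetical stop-word list.

```python
from itertools import product

# Hypothetical, hand-written glosses (not taken from WordNet or IndoWordNet).
GLOSSES = {
    "bat": {
        "bat#mammal": "nocturnal flying mammal with wings",
        "bat#club": "a wooden club used to hit the ball in a game such as baseball or cricket",
    },
    "baseball": {
        "baseball#game": "a ball game played with a bat and a ball by two teams",
        "baseball#ball": "a small hard ball",
    },
}

STOP_WORDS = {"a", "an", "the", "with", "to", "in", "or", "by", "and", "of", "used", "as", "such"}


def content_words(gloss):
    """Lowercased words of a gloss, minus stop words."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def lesk_pair(word1, word2):
    """Pick the pair of senses whose glosses share the most content words."""
    best_pair, best_overlap = None, -1
    for (s1, g1), (s2, g2) in product(GLOSSES[word1].items(), GLOSSES[word2].items()):
        overlap = len(content_words(g1) & content_words(g2))
        if overlap > best_overlap:
            best_pair, best_overlap = (s1, s2), overlap
    return best_pair, best_overlap


print(lesk_pair("bat", "baseball"))  # (('bat#club', 'baseball#game'), 2)
```

The club sense wins because its gloss shares "ball" and "game" with the game sense of "baseball"; the mammal gloss shares nothing with either sense.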
Resources required
1. Raw corpora
2. Machine-readable dictionary: IndoWordNet

Example
The bank can guarantee that deposit will eventually cover future
tuition costs because it invests in adjustable rate mortgage.

bank: financial bank or river bank

The knowledge-based approach uses two bags:

1. Sense bag
2. Context bag

Create one sense bag for each meaning of the word that exists in the predefined dictionary.
Creation of Sense Bag

SENSE BAG 1: depository financial institution, bank, banking concern, banking company (a financial institution that accepts deposits and channels the money into lending activities) "he cashed a check at the bank"; "that bank holds the mortgage on my home"

SENSE BAG 2: bank (sloping land (especially the slope beside a body of water)) "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents"
Creation of Context Bag

Context bag (the words surrounding “bank” in the example sentence): The, can, guarantee, that, deposit, will, eventually, cover, future, tuition, costs, because, it, invests, in, adjustable, rate, mortgage
Compare context bag with sense bag
Compare the context bag with each sense bag. The financial-institution sense bag (Sense Bag 1) shares words such as “deposit(s)” and “mortgage” with the context bag, while the sloping-land sense bag (Sense Bag 2) shares none. The financial sense of “bank” therefore has the larger overlap and is selected.
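The comparison can be sketched as the simplified Lesk algorithm: build a context bag from the sentence, intersect it with each sense bag, and keep the sense with the largest overlap. This is a minimal illustration under assumptions: the stop-word list and the crude plural rule in `bag_of_words` are hypothetical choices, and the sense bags are the two dictionary entries above typed in as plain text.

```python
import re

# Hypothetical stop-word list for this example.
STOP_WORDS = {"a", "the", "that", "it", "in", "on", "of", "and", "at", "he", "they",
              "my", "will", "can", "because", "up", "into"}

# The two sense bags for "bank", flattened to plain text.
SENSE_BAGS = {
    "financial institution": (
        "depository financial institution bank banking concern banking company "
        "a financial institution that accepts deposits and channels the money into "
        "lending activities he cashed a check at the bank that bank holds the "
        "mortgage on my home"
    ),
    "sloping land": (
        "bank sloping land especially the slope beside a body of water they pulled "
        "the canoe up on the bank he sat on the bank of the river and watched the currents"
    ),
}


def bag_of_words(text, target):
    """Content words of `text`, excluding stop words and the target word itself.

    A trailing 's' is stripped as a crude plural rule, so 'deposits' matches 'deposit'.
    """
    bag = set()
    for w in re.findall(r"[a-z]+", text.lower()):
        if w in STOP_WORDS or w == target:
            continue
        bag.add(w[:-1] if w.endswith("s") and len(w) > 3 else w)
    return bag


def simplified_lesk(sentence, target, sense_bags):
    """Return the sense whose bag overlaps most with the context bag, plus all overlaps."""
    context_bag = bag_of_words(sentence, target)
    overlaps = {sense: context_bag & bag_of_words(gloss, target)
                for sense, gloss in sense_bags.items()}
    best = max(overlaps, key=lambda sense: len(overlaps[sense]))
    return best, overlaps


sentence = ("The bank can guarantee that deposit will eventually cover future "
            "tuition costs because it invests in adjustable rate mortgage.")
best, overlaps = simplified_lesk(sentence, "bank", SENSE_BAGS)
print(best)                                       # financial institution
print(sorted(overlaps["financial institution"]))  # ['deposit', 'mortgage']
```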
What we have discussed so far is the Lesk algorithm.
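Hidden Markov Model

A Markov chain is a stochastic model that describes a sequence of events in which the probability of each event depends only on the most recent event (state). A common illustration is predicting the next word in a sentence from the previous word, similar in spirit to the next-word suggestions offered while typing in Gmail (which in practice rely on more complex models).
A hidden Markov model (HMM) extends this idea: the sequence of states is hidden, and only outputs that depend on those states are observed. For example, in part-of-speech tagging the tags are the hidden states and the words are the observations.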
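The Markov property can be illustrated with a bigram model that predicts the next word from the most recent word only. The sketch below is a toy under stated assumptions: the three-sentence corpus is made up, probabilities are plain relative frequencies with no smoothing, and it is not how Gmail or any real product works.

```python
from collections import Counter, defaultdict


def train_bigram_chain(sentences):
    """Count, for each word, which words follow it (a first-order Markov chain)."""
    transitions = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            transitions[current][following] += 1
    return transitions


def next_word_probabilities(transitions, word):
    """P(next | word), estimated from the transition counts."""
    counts = transitions.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def predict_next(transitions, word):
    """Most likely next word given only the most recent word, or None if unseen."""
    counts = transitions.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical toy corpus, used only to illustrate the counting.
corpus = ["thank you for coming", "thank you so much", "thank god it is friday"]
chain = train_bigram_chain(corpus)
print(next_word_probabilities(chain, "thank"))  # {'you': 0.6666666666666666, 'god': 0.3333333333333333}
print(predict_next(chain, "Thank"))             # you
```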
Application of NLP to business problems
1. Enhancing the shopping experience
2. Leveraging the connected world of the Internet of Things
3. Voice of the customer
4. Fraud detection
Representing Knowledge in Taxonomies and Ontologies
What is knowledge?
Knowledge is what we know. Knowledge is unique to each individual and is the accumulation of
past experience and insight that shapes the lens by which we interpret, and assign meaning to,
information.

What is knowledge representation?


Knowledge representation is the process of creating a structured and organized model of knowledge
in a form that can be understood and processed by a computer. In other words, it is a way of
encoding knowledge and information in a machine-readable format, so that it can be used for various
computational tasks such as reasoning, problem-solving, and decision-making.
Learning from data is at the heart of cognitive computing.

If a system cannot use data to improve its own performance without
reprogramming, it isn’t considered to be a cognitive system.

But to do that, there must be a wealth of data available at the heart of the
environment, formats for representing the knowledge contained within that
data, and a process for assimilating new knowledge.

This is where knowledge representation comes in.


Models for knowledge representation

KR models:
• Taxonomies
• Ontologies
• Semantic Web
Taxonomies
In natural language processing (NLP), taxonomy refers to the practice of
categorizing natural language expressions, such as words or sentences, into
hierarchical or non-hierarchical categories based on their characteristics or
properties. The resulting classification system is called a taxonomy.
An example of a topic taxonomy for news articles:

World News: Europe, Middle East, Africa, Asia, Americas
Business News: Finance, Technology, Markets
Entertainment News: Music, Film, Television
Sports News: Football, Basketball, Tennis, Other Sports

• Using this taxonomy, a news article about a tennis match would be classified under "Sports News" and "Tennis". Similarly, an article about a new technology product would be classified under "Business News" and "Technology". This taxonomy can be used for a variety of NLP tasks, such as text classification, topic modeling, and information retrieval, to organize and categorize news articles for easier analysis and retrieval.
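To make the classification step concrete, the taxonomy above can be stored as a simple mapping, and a text assigned to every category path whose trigger words it contains. This is a deliberately naive keyword sketch: the `KEYWORDS` table is hypothetical, and real systems usually train a text classifier over the taxonomy labels instead.

```python
import re

# The news taxonomy: top-level category -> subcategories.
TAXONOMY = {
    "World News": ["Europe", "Middle East", "Africa", "Asia", "Americas"],
    "Business News": ["Finance", "Technology", "Markets"],
    "Entertainment News": ["Music", "Film", "Television"],
    "Sports News": ["Football", "Basketball", "Tennis", "Other Sports"],
}

# Hypothetical trigger words; subcategories not listed fall back to their own name.
KEYWORDS = {
    "Tennis": {"tennis", "wimbledon", "racket"},
    "Technology": {"technology", "smartphone", "software"},
    "Football": {"football", "goal", "striker"},
}


def classify(text):
    """Return every (category, subcategory) path whose trigger words appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    paths = []
    for category, subcategories in TAXONOMY.items():
        for sub in subcategories:
            triggers = KEYWORDS.get(sub, {sub.lower()})
            if tokens & triggers:
                paths.append((category, sub))
    return paths


print(classify("The tennis final at Wimbledon went to five sets"))
# [('Sports News', 'Tennis')]
print(classify("A new smartphone technology product launched today"))
# [('Business News', 'Technology')]
```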
Ontology

Ontologies are semantic data models that define the types of things that exist in our domain and
the properties that can be used to describe them.

Ontologies are generalized data models, meaning that they only model general types of things that
share certain properties, but don’t include information about specific individuals in our domain.
For example, instead of describing your dog, Spot, and all of his individual characteristics, an
ontology should focus on the general concept of dogs, trying to capture characteristics that
most/many dogs might have.

Doing this allows us to reuse the ontology to describe additional dogs in the future.

There are three main components to an ontology, which are usually described as follows:

Classes: the distinct types of things that exist in our data.


Relationships: properties that connect two classes.
Attributes: properties that describe an individual class.
Consider four classes for this example:
• Books
• Authors
• Publishers
• Locations

To identify relationships and attributes for the Book class, some properties might be:


• Books have authors
• Books have publishers
• Books are published on a date
• Books are followed by sequels (other books)

Some of these properties are relationships that connect two of our classes. For example,
the property “books have authors” is a relationship that connects our book class and
our author class. Other properties, such as “books are published on a date,” are
attributes, describing only one class, instead of connecting two classes together.
While the above list of properties is easy to read, it can be helpful to rewrite these properties to more clearly identify
our classes and properties. For example,
“books have authors” can be written as:

• Book → has author → Author


Although there are many more properties that you could include, depending on your use case, the following properties have been identified for this example:

Book → has author → Author
Book → has publisher → Publisher
Book → published on → Publication date
Book → is followed by → Book
Author → works with → Publisher
Publisher → located in → Location
Location → located in → Location

Remember that our ontology is a general data model, meaning that we don’t want to include information
about specific books in our ontology. Instead, we want to create a reusable framework we could use to describe
additional books in the future.
ONTOLOGY + DATA = KNOWLEDGE GRAPH
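The equation above can be sketched by storing the ontology as (domain, property, range) triples and accepting a data triple only if its subject and object types match one of them. This is a minimal illustration, not a real ontology language such as OWL: the individuals ("Example Novel", "Jane Doe", "Acme Press", and so on) and the publication date are hypothetical, and treating a `date` value as the literal type "Date" is an assumption of the sketch.

```python
from datetime import date

# The ontology as (domain class, property, range) triples.
# "Date" is a literal type, so "published on" is an attribute; every other
# property connects two classes and is therefore a relationship.
CLASSES = {"Book", "Author", "Publisher", "Location"}
ONTOLOGY = {
    ("Book", "has author", "Author"),
    ("Book", "has publisher", "Publisher"),
    ("Book", "published on", "Date"),
    ("Book", "is followed by", "Book"),
    ("Author", "works with", "Publisher"),
    ("Publisher", "located in", "Location"),
    ("Location", "located in", "Location"),
}

# Hypothetical individuals (the "data" part).
INSTANCE_TYPES = {
    "Example Novel": "Book", "Example Sequel": "Book", "Jane Doe": "Author",
    "Acme Press": "Publisher", "Springfield": "Location", "Example Country": "Location",
}


def type_of(value):
    """Return the ontology class (or literal type) of a subject or object."""
    if isinstance(value, date):
        return "Date"
    return INSTANCE_TYPES[value]


def is_attribute(prop):
    """A property is an attribute if its range is not one of the classes."""
    return all(rng not in CLASSES for _, p, rng in ONTOLOGY if p == prop)


def add_fact(graph, subject, prop, obj):
    """Add a (subject, property, object) triple only if the ontology allows it."""
    if (type_of(subject), prop, type_of(obj)) not in ONTOLOGY:
        raise ValueError(f"ontology does not allow: {subject} {prop} {obj}")
    graph.append((subject, prop, obj))


def objects(graph, subject, prop):
    """Return every object linked to `subject` through `prop`."""
    return [o for s, p, o in graph if s == subject and p == prop]


kg = []
add_fact(kg, "Example Novel", "has author", "Jane Doe")
add_fact(kg, "Example Novel", "published on", date(2020, 1, 15))
add_fact(kg, "Example Novel", "is followed by", "Example Sequel")
add_fact(kg, "Acme Press", "located in", "Springfield")
add_fact(kg, "Springfield", "located in", "Example Country")

print(objects(kg, "Example Novel", "has author"))                 # ['Jane Doe']
print(is_attribute("published on"), is_attribute("has author"))  # True False
```

Adding ("Acme Press", "has author", "Jane Doe") raises ValueError, because the ontology only lets a Book have an author; this is how the general model constrains the specific data.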
Semantic WEB
The Semantic Web is a mesh of data that are associated in such a way
that they can easily be processed by machines instead of human
operators. It can be conceived as an extended version of the existing
World Wide Web, and it represents an effective means of data
representation in the form of a globally linked database.
