Cog Comp - Module 2
1. Lexical Analysis
In this phase, the text is broken down into paragraphs, sentences, and words, and
the structure of words is identified and described. It includes the following
techniques:
Stop word removal (removing ‘and’, ‘of’, ‘the’, etc. from the text)
Tokenization (breaking the text into sentences or words)
Word tokenizer
Sentence tokenizer
Stemming (stripping suffixes such as ‘ing’, ‘es’, ‘s’ from the ends of words)
Lemmatization (converting words to their base, dictionary forms)
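The steps above can be sketched in plain Python. This is a minimal, illustrative version with hand-written rules (the stop word list and suffix rules are tiny samples); real systems use libraries such as NLTK or spaCy.

```python
import re

STOP_WORDS = {"and", "of", "the", "a", "an", "to", "in"}  # tiny sample list

def sentence_tokenize(text):
    # Naive sentence tokenizer: split on sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def word_tokenize(sentence):
    # Naive word tokenizer: keep alphabetic runs, lowercased.
    return re.findall(r"[a-zA-Z']+", sentence.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(word):
    # Crude suffix stripping, as described above ('ing', 'es', 's').
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "The boy goes to school. He enjoys playing and reading."
for sent in sentence_tokenize(text):
    tokens = remove_stop_words(word_tokenize(sent))
    print([stem(t) for t in tokens])
```

Note that the crude stemmer produces non-words (e.g. ‘play’ from ‘playing’ is fine, but other inputs may be over-stripped); lemmatization, by contrast, would always return a valid dictionary form.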
2. Syntactic Analysis
Syntactic analysis checks grammar and word arrangement, and shows the
relationships among the words.
“The boy goes to school” (accepted: valid word arrangement)
“The school goes to boy” (rejected: invalid word arrangement)
3. Semantic Analysis
Consider the sentence: “The apple ate a banana”. Although the sentence
is syntactically correct, it doesn’t make sense because apples can’t eat.
Semantic analysis looks for meaning in the given sentence. It also deals
with combining words into phrases.
4. Discourse Integration
The meaning of a sentence can depend on the sentences that come immediately
before it, and it can in turn shape the meaning of the sentences that follow.
Discourse integration resolves these dependencies: a statement or word may only
make sense in light of the preceding sentences or words, as happens with proper
nouns and pronouns.
For example: “Billy bought it.”
The word “it” in the above sentence depends on the preceding discourse context.
Example: "John got ready at 9 AM. Later he took the train to California."
(Here, “he” refers to John.)
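A toy sketch of this kind of pronoun resolution: resolve a pronoun to the most recently mentioned noun (a naive recency heuristic with a hard-coded toy lexicon, not a real coreference system).

```python
PRONOUNS = {"it", "he", "she", "they"}
NOUNS = {"john", "billy", "train", "california", "book"}  # toy lexicon

def resolve_pronouns(text):
    # Track candidate nouns in order of mention; resolve each pronoun
    # to the most recent candidate seen so far.
    candidates, resolved = [], []
    for word in text.lower().replace(".", " ").split():
        if word in PRONOUNS and candidates:
            resolved.append((word, candidates[-1]))
        elif word in NOUNS:
            candidates.append(word)
    return resolved

print(resolve_pronouns(
    "John got ready at 9 AM. Later he took the train to California."))
# → [('he', 'john')]
```

Real discourse integration is much harder: the correct antecedent is often not the most recent noun, which is exactly why this phase needs context from the whole preceding discourse.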
5. Pragmatic Analysis
This is a complex phase in which the machine must have knowledge not only
of the provided text but also of the real world. There are many scenarios
in which the intent of a sentence can be misunderstood if the machine
lacks real-world knowledge.
Example:
"Thank you for coming so late, we have wrapped up the meeting"
(Contains sarcasm)
"Can you share your screen?" (here the context is about computer’s
screen share during a remote meeting)
Ambiguity and Uncertainty in Language
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example,
the word ‘bank’ can mean a financial institution or the side of a river.
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words
themselves can be misinterpreted. In other words, semantic ambiguity
happens when a sentence contains an ambiguous word or phrase.
“The car hit the truck while it was moving”
has semantic ambiguity because it can be interpreted as either
“The car, while moving, hit the truck” or
“The car hit the truck while the truck was moving”.
Anaphoric Ambiguity
Anaphoric ambiguity arises when a pronoun or other referring expression can
refer back to more than one earlier entity. For example, in “The horse ran up
the hill. It was very steep.”, the word “it” could refer to either the hill or
the horse.
Word Sense Disambiguation: The Lesk Algorithm
The Lesk algorithm is built on the idea that words used in a text are related to one
another, and that this relationship can be seen in the dictionary definitions of the
words and their meanings.
To disambiguate two (or more) words, the pair of dictionary senses having the
highest word overlap in their dictionary meanings is chosen.
The Lesk algorithm assumes that words in a given “neighborhood” (a portion of
text) will have a similar theme.
In a simplified version of the Lesk algorithm, the dictionary definition of an
ambiguous word is compared to the terms in its neighborhood.
Resources required
1. Raw corpora
2. Machine-readable dictionary (e.g., IndoWordNet)
Example
The bank can guarantee that deposit will eventually cover future
tuition costs because it invests in adjustable rate mortgage.
The simplified Lesk algorithm uses two kinds of word bags:
1. Sense bag: one bag for each meaning of the ambiguous word in the
predefined dictionary
2. Context bag: the words surrounding the ambiguous word in the text
Creation of Sense Bags
SENSE BAG 1: bank (sloping land (especially the slope beside a body of
water)) "they pulled the canoe up on the bank"; "he sat on the bank of the
river and watched the currents"
SENSE BAG 2: depository financial institution, bank, banking concern,
banking company (a financial institution that accepts deposits and channels
the money into lending activities) "he cashed a check at the bank"; "that
bank holds the mortgage on my home"
Creation of the Context Bag
The context bag contains the words surrounding the ambiguous word ‘bank’ in
the sentence:
the, can, guarantee, that, deposit, will, eventually, cover, future, tuition,
costs, because, it, invests, in, adjustable, rate, mortgage
Compare the context bag with the sense bags
Each sense bag is compared with the context bag, and the sense with the
greater word overlap is selected. Sense bag 1 (sloping land) shares almost
no content words with the context, while sense bag 2 (financial institution)
shares words such as ‘mortgage’ and ‘deposit(s)’. The ‘financial
institution’ sense is therefore chosen.
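The procedure above can be sketched as a simplified Lesk implementation. The glosses are abridged from the sense bags shown above; the stop word list is a small illustrative sample.

```python
STOP_WORDS = {"the", "can", "that", "will", "it", "in", "a", "on", "of", "my"}

# Sense bags: abridged glosses for the two senses of "bank".
SENSES = {
    "sloping_land": ("sloping land especially the slope beside a body of "
                     "water they pulled the canoe up on the bank he sat on "
                     "the bank of the river and watched the currents"),
    "financial_institution": ("depository financial institution banking "
                              "concern banking company a financial "
                              "institution that accepts deposits and "
                              "channels the money into lending activities "
                              "he cashed a check at the bank that bank "
                              "holds the mortgage on my home"),
}

def bag(text):
    # Turn text into a set of content words (stop words removed).
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def simplified_lesk(sentence, senses):
    context = bag(sentence)
    # Choose the sense whose bag has the largest overlap with the context bag.
    return max(senses, key=lambda s: len(bag(senses[s]) & context))

sentence = ("The bank can guarantee that deposit will eventually cover "
            "future tuition costs because it invests in adjustable rate mortgage")
print(simplified_lesk(sentence, SENSES))
# → financial_institution
```

Note that without stemming, ‘deposit’ in the context does not match ‘deposits’ in the gloss; the overlap here comes from ‘bank’ and ‘mortgage’, which is still enough to pick the financial sense.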
The procedure we have discussed above is the (simplified) LESK ALGORITHM.
Hidden Markov Model
A Markov chain is a stochastic model that predicts the probability of a sequence of events,
where each event depends only on the most recent event. A common example of a Markov
chain in action is the way Gmail predicts the next word in your sentence based on your
previous entry. A hidden Markov model extends this idea to states that are not directly
observable.
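A first-order Markov chain for next-word prediction can be sketched as follows: count which word follows which, then predict the most frequent successor (the corpus is a made-up toy example).

```python
from collections import Counter, defaultdict

def train(corpus):
    # Count transitions: how often each word is followed by each next word.
    transitions = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, word):
    counts = transitions.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]  # most probable next word

corpus = ["I am going to the office",
          "I am going to the gym",
          "I am going home"]
model = train(corpus)
print(predict_next(model, "going"))
# → to   ("to" follows "going" twice, "home" only once)
```

The key Markov property is visible in the data structure: `transitions` is keyed only by the current word, so the prediction ignores everything earlier in the sentence.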
Application of NLP to business problems
1. Enhancing the shopping experience
2. Leveraging the connected world of the Internet of Things
3. Voice of the customer
4. Fraud detection
Representing Knowledge in Taxonomies and Ontologies
What is knowledge?
Knowledge is what we know. Knowledge is unique to each individual and is the accumulation of
past experience and insight that shapes the lens by which we interpret, and assign meaning to,
information.
If a system cannot use data to improve its own performance without
reprogramming, it isn’t considered to be a cognitive system.
But to do that, there must be a wealth of data available at the heart of the
environment, formats for representing the knowledge contained within that
data, and a process for assimilating new knowledge.
KR Models: Taxonomies, Ontologies, and the Semantic Web
Taxonomies
In natural language processing (NLP), taxonomy refers to the practice of
categorizing natural language expressions, such as words or sentences, into
hierarchical or non-hierarchical categories based on their characteristics or
properties. The resulting classification system is called a taxonomy.
An example of a topic taxonomy for news articles: a root category such as
News, divided into subtopics (e.g., Sports, Politics, Technology), each with
its own finer-grained subtopics.
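Such a taxonomy is just a tree, and can be sketched as a nested dict (the categories below are illustrative, not taken from any particular dataset):

```python
# A topic taxonomy as a tree of categories; empty dicts mark leaves.
news_taxonomy = {
    "News": {
        "Sports": {"Cricket": {}, "Football": {}},
        "Politics": {"Elections": {}, "Policy": {}},
        "Technology": {"AI": {}, "Gadgets": {}},
    }
}

def paths(tree, prefix=()):
    """Yield every category path from the root down the hierarchy."""
    for name, children in tree.items():
        yield prefix + (name,)
        yield from paths(children, prefix + (name,))

for p in paths(news_taxonomy):
    print(" > ".join(p))
```

Classifying an article then amounts to assigning it one of these paths, e.g. News > Sports > Cricket.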
Ontologies
Ontologies are semantic data models that define the types of things that exist in our domain and
the properties that can be used to describe them.
Ontologies are generalized data models, meaning that they only model general types of things that
share certain properties, but don’t include information about specific individuals in our domain.
For example, instead of describing your dog, Spot, and all of his individual characteristics, an
ontology should focus on the general concept of dogs, trying to capture characteristics that
most/many dogs might have.
Doing this allows us to reuse the ontology to describe additional dogs in the future.
There are three main components to an ontology, which are usually described as follows:
1. Classes: the general types of things in the domain (e.g., book, author)
2. Relationships: properties that connect two classes (e.g., “books have authors”)
3. Attributes: properties that describe a single class (e.g., “books are published on a date”)
Some of these properties are relationships that connect two of our classes. For example,
the property “books have authors” is a relationship that connects our book class and
our author class. Other properties, such as “books are published on a date,” are
attributes, describing only one class, instead of connecting two classes together.
While the above list of properties is easy to read, it can be helpful to rewrite these properties to more clearly identify
our classes and properties. For example,
“books have authors” can be written as:
Book → has author → Author
(the classes are Book and Author; the property connecting them is “has author”)
Remember that our ontology is a general data model, meaning that we don’t want to include information
about specific books in our ontology. Instead, we want to create a reusable framework we could use to describe
additional books in the future.
ONTOLOGY + DATA = KNOWLEDGE GRAPH
Semantic Web
The Semantic Web is a mesh of data that are associated in such a way
that they can easily be processed by machines instead of human
operators. It can be conceived as an extended version of the existing
World Wide Web, and it represents an effective means of data
representation in the form of a globally linked database.