
Subject: Natural Language Processing Solution                          Subject Code: CSDC7013
Time: 03.00 Hours        Max. Marks: 80        Date: 19/11/2022
N.B. 1. Q.1 is compulsory.
     2. Attempt any three from the remaining four questions.
     3. Each Question carries 20 marks.
Q.1. Attempt All
a) What is WordNet? What is WordNet used for? (5 Marks)
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are
grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting
network of meaningfully related words and concepts can be navigated with the WordNet
browser. WordNet is also freely and publicly available for download. WordNet's structure
makes it a useful tool for computational linguistics and natural language processing.
WordNet superficially resembles a thesaurus, in that it groups words together based on their
meaning. WordNet interlinks not just word forms—strings of letters—but specific senses of
words. As a result, words that are found in close proximity to one another in the network are
semantically disambiguated. WordNet labels the semantic relations among words, whereas
the groupings of words in a thesaurus do not follow any explicit pattern other than
meaning similarity. WordNet's latest online version 3.1 database contains 155,327 words
organized in 175,979 synsets for a total of 207,016 word-sense pairs; in compressed form,
it is about 12 megabytes in size. WordNet consists of three separate databases: one for nouns,
one for verbs, and one for adjectives and adverbs. It does not include closed-class words.
WordNet's structure makes it a useful tool for many tasks in computational linguistics and
natural language processing:
● As a lexical resource, an online dictionary
● Word sense disambiguation
● Information retrieval
● Automatic text/document classification
● Machine translation
● Automatic crossword puzzle generation
● Improving search engine results
● Document retrieval
Concept Identification in Natural Language: WordNet can be used to identify concepts
pertaining to a term, to suit them to the full semantic richness and complexity of a given
information need.
Word Sense Disambiguation: WordNet combines features of a number of the other resources
commonly used in disambiguation work. It offers sense definitions of words, identifies
synsets of synonyms, defines a number of semantics relations and is freely available. This
makes it the (currently) best known and most utilized resource for word sense
disambiguation.

Automatic Query Expansion: WordNet semantic relations can be used to expand queries so
that the search for a document is not confined to the pattern-matching of query terms, but
also covers synonyms.
Document Summarization: WordNet has found a useful application in text summarization. A
few approaches utilize information from WordNet to compute lexical chains.
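For illustration, the following is a minimal Python sketch of querying WordNet through NLTK (this assumes NLTK is installed and the 'wordnet' data has been downloaded; the words used are only examples):

# A minimal sketch of querying WordNet via NLTK (assumes the 'wordnet'
# corpus has been downloaded with nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# Each synset groups word forms that express one distinct concept.
for syn in wn.synsets("bank"):
    print(syn.name(), "-", syn.definition())

# Lemmas (synonymous word forms) contained in a single synset.
print(wn.synset("car.n.01").lemma_names())  # e.g. ['car', 'auto', 'automobile', ...]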

b) What do you mean by ambiguity in Natural Language? Explain with suitable examples. (5 Marks)
Ambiguity can occur at all NLP levels. It is a property of linguistic expressions. If an
expression (word/phrase/sentence) has more than one interpretation, we can refer to it as
ambiguous. For example, consider the sentence,
“The chicken is ready to eat.”
The interpretations of the above phrase can be:
The chicken (bird) is ready to be fed, or the chicken (food) is ready to be eaten.
Consider another sentence,
“There was not a single man at the party.”
The interpretations in this case can be Lack of bachelors at the party or Lack of men
altogether.
Some interpretations of “I made her duck”: I cooked duck for her. I cooked duck belonging
to her. I created a toy duck which she owns. I caused her to quickly lower her head or body.
I used magic and turned her into a duck.
● duck – morphologically and syntactically ambiguous: noun or verb.
● her – syntactically ambiguous: dative or possessive.
● make – semantically ambiguous: cook or create.
● make – syntactically ambiguous
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, treating the word
silver as a noun, an adjective, or a verb.
Syntactic Ambiguity
This kind of ambiguity occurs when a sentence is parsed in different ways. For example, the
sentence “The man saw the girl with the telescope”. It is ambiguous whether the man saw the
girl carrying a telescope or he saw her through his telescope.
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted. In other words, semantic ambiguity happens when a sentence contains an
ambiguous word or phrase. For example, the sentence “The car hit the pole while it was
moving” has semantic ambiguity because the interpretations can be “The car, while
moving, hit the pole” and “The car hit the pole while the pole was moving”.
Anaphoric Ambiguity
This kind of ambiguity arises due to the use of anaphoric entities in discourse. For example:
The horse ran up the hill. It was very steep. It soon got tired. Here, the anaphoric reference of
“it” in the two situations causes ambiguity.
Pragmatic ambiguity
Such kind of ambiguity refers to the situation where the context of a phrase gives it multiple
interpretations. In simple words, we can say that pragmatic ambiguity arises when the
statement is not specific. For example, the sentence “I like you too” can have multiple
interpretations like I like you (just like you like me), I like you (just like someone else does).
c) What are morphemes? What is the difference between Content morphemes and Function morphemes? (5 Marks)
In natural languages, words are made up of meaningful subunits called morphemes.
Morphemes are abstract concepts denoting entities or relationships. Morphemes may be :
Stems: the main morpheme of the word
Affixes: convey the word’s role, number, gender, etc.
● cats == cat [stem] + s [suffix]
● undo == un [prefix] + do [stem]
Content morphemes:
Denote concepts such as objects, actions, attributes and ideas. They carry the main meanings in
sentences. Thus (the stems of) nouns, verbs and adjectives are typically content morphemes:
"throw," "green," "Kim," and "sand" are all English content morphemes. Content morphemes
are also often called open-class morphemes, because they belong to categories that are open
to the invention of arbitrary new items.
Function morphemes:
Their role is largely grammatical. They have a grammatical function in sentences and do not carry the
main semantic content. Prepositions ("to", "by"), articles ("the", "a"), pronouns ("she", "his"),
and conjunctions are typically function morphemes, since they either serve to tie elements
together grammatically ("hit by a truck," "Kim and Leslie," "Lee saw his dog"), or express
obligatory (in a given language!) morphological features like definiteness ("she found a
table" or "she found the table" but not "*she found table"). Function morphemes are also
called "closed-class" morphemes, because they belong to categories that are essentially
closed to invention or borrowing -- it is very difficult to add a new preposition, article or
pronoun.
d) Discuss in detail five aspects of Pragmatics. (5 Marks)
Pragmatics deals with using and understanding sentences in different situations and how the
interpretation of the sentence is affected. The ability to understand another speaker's
intended meaning is called pragmatic competence. Pragmatics is different from semantics,
which concerns the relations between signs and the objects they signify. Semantics refers to
the specific meaning of language; pragmatics, by contrast, involves all of the other social
cues that accompany language
Pragmatics focuses not on what people say but how they say it and how others interpret their
utterances in social contexts. Utterances are literally the units of sound you make when you
talk, but the signs that accompany those utterances are what give the sounds their true
meaning.
Deixis concerns the ways in which languages encode or grammaticalize features of the
context of utterance or speech event. Deixis is a method of directly encoding context into
language. Consider, for example, finding the following notice on someone's office door:

“I'll be back in an hour”. Because we don't know when it was written, we cannot know when
the writer will return. In linguistics, deixis is the use of general words and phrases to refer to
a specific time, place, or person in context, e.g., the words tomorrow, there, and they.
Implicature: It means more being communicated than is said. Conversational implicature: a
meaning or message that is implicated in a conversation. When people oversay (say more
than needed) or undersay (say less than needed) something, they produce certain extra meanings
beyond the literal meanings of words and sentences. This extra meaning is conversationally
dependent, hence conversational implicature. For example, if speaker A says ‘Has John
arrived?’, and speaker B responds ‘There is a blue car in the driveway’, one can infer, under
the appropriate circumstances and based on shared assumptions between the speakers, that
John has arrived.
Speech Acts: Speech act is an utterance defined in terms of a speaker's intention and the
effect it has on a listener. Essentially, it is the action that the speaker hopes to provoke in his
or her audience. Speech acts might be requests, warnings, promises, apologies, greetings or
any number of declarations. For example, the phrase "I would like the mashed potatoes,
could you please pass them to me?" is considered a speech act as it expresses the speaker's
desire to acquire the mashed potatoes, as well as presenting a request that someone pass the
potatoes to them.
Conversational Structure: analysis of the sequential (and anti-sequential) nature of
conversations, interruptions, etc.
Presupposition is something the speaker assumes to be the case prior to making an utterance.
It roughly describes that which is immediately inferable but not the new information in an
utterance. Examples of presuppositions include: "Jane no longer writes fiction."
Presupposition: Jane once wrote fiction. "Have you stopped eating burgers?" Presupposition:
you once ate burgers. "Have you talked to Hans?" Presupposition: Hans exists.

Q.2. Attempt All


a) Discuss various stages involved in the NLP process with suitable examples. (10 Marks)
Morphological and Lexical Analysis: The lexicon of a language is its vocabulary that
includes its words and expressions. Morphology depicts analyzing, identifying and
description of structure of words. Lexical analysis involves dividing a text into paragraphs,
words and the sentences. It is a study of the way words are built up from smaller
meaning-bearing units called morphemes. For example, the word ‘fox’ has a single morpheme,
while the word ‘cats’ has two morphemes: the morpheme ‘cat’ and the morpheme ‘–s’, which
distinguishes the singular and plural concepts. A morphological lexicon is the list of stems and
affixes together with basic information, such as whether the stem is a noun stem or a verb stem.
Syntactic Analysis: Syntax concerns the proper ordering of words and its effect on meaning.
This involves analysis of the words in a sentence to depict the grammatical structure of the
sentence. The words are transformed into structure that shows how the words are related to
each other. Eg. “the girl the go to the school”. This would definitely be rejected by the
English syntactic analyzer. E.g. “Ravi apple eats”. It is a study of formal relationships
between words. It is a study of: how words are clustered in classes in the form of Part-of
Speech (POS), how they are grouped with their neighbors into phrases, and the way words
depend on each other in a sentence.
Semantic Analysis: Semantics concerns the (literal) meaning of words, phrases, and
sentences. This abstracts the dictionary meaning or the exact meaning from context. The
structures which are created by the syntactic analyzer are assigned meaning. E.g. “colorless
blue idea”. This would be rejected by the analyzer, as “colorless” and “blue” do not make sense
together. E.g. “Stone eat apple”. It is a study of the meaning of words that are associated with
grammatical structure. It consists of two kinds of approaches: syntax-driven semantic
analysis and semantic grammar. In discourse context, this level of NLP works with text longer
than a sentence. There are
two types of discourse- anaphora resolution and discourse/text structure recognition.
Anaphora resolution is replacing of words such as pronouns. Discourse structure recognition
determines the function of sentences in the text which adds meaningful representation of the
text.
Discourse Integration: Sense of the context. The meaning of any single sentence depends
upon the sentences that precede it and may also invoke the meaning of the sentences that
follow it. E.g. the word “it” in the sentence “she wanted it” depends upon the prior discourse
context. The term “discourse integration” refers to a sense of context. The meaning of any
sentence is determined by the meaning of the sentence immediately preceding it. In addition,
it establishes the meaning of the sentence that follows. The sentences that come before it
play a role in discourse integration. That is to say, a statement or word is dependent on the
preceding sentence or words. The same applies to the use of proper nouns and pronouns.
Pragmatic Analysis: Pragmatics concerns the overall communicative and social context and
its effect on interpretation. It means abstracting or deriving the purposeful use of the
language in situations. Importantly those aspects of language which require world
knowledge. The main focus is that what was said is reinterpreted in terms of what it actually means.
E.g. “Close the window” should be interpreted as a request rather than an order.
The overall communicative and social content, as well as its impact on interpretation, are the
focus of pragmatic analysis. Pragmatic Analysis uses a set of rules that describe cooperative
dialogues to help you find the intended result. It covers things like word repetition, who said
what to whom, and so on. It comprehends how people communicate with one another, the
context in which they converse, and a variety of other factors. It refers to the process of
abstracting or extracting the meaning of a situation’s use of language. It translates the given
text using the knowledge gathered in the preceding stages. “Switch on the TV” when used in
a sentence, is an order or request to switch the TV on.
Provide examples for each stage.
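For illustration, a minimal Python sketch of the first few stages using NLTK (assuming the 'punkt' and 'averaged_perceptron_tagger' data have been downloaded; the sentence is only an example):

# Lexical/morphological and (shallow) syntactic analysis with NLTK.
import nltk
from nltk.stem import PorterStemmer

text = "The cats sat on the mat because it was warm."

# Lexical analysis: split the text into tokens.
tokens = nltk.word_tokenize(text)

# Morphological analysis (approximated by stemming): strip affixes.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]   # 'cats' -> 'cat'

# Syntactic analysis (shallow): assign part-of-speech tags.
tagged = nltk.pos_tag(tokens)               # e.g. ('cats', 'NNS')

print(tokens)
print(stems)
print(tagged)
# Semantic, discourse and pragmatic analysis (e.g. resolving "it" or
# recognizing a request) require further processing not shown here.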
b) What do you mean by stemming? Explain Porter's stemming algorithm in detail. (10 Marks)
Stemming usually refers to a crude heuristic process that chops off the ends of words in the
hope of achieving this goal correctly most of the time, and often includes the removal of
derivational affixes. With stemming, words are reduced to their word stems. A word stem
need not be the same as a dictionary-based morphological root; it just needs to be an equal or
smaller form of the word. Stemming algorithms are typically rule-based. For example, we
may have a suffix rule that, based on a list of known suffixes, cuts them off.
In the English language, we have suffixes like “-ed” and “-ing” which may be useful to cut
off in order to map the words “cook,” “cooking,” and “cooked” all to the same stem of

“cook.” The most common algorithm for stemming English, and one that has repeatedly
been shown to be empirically very effective, is Porter's algorithm (Porter, 1980)

Particular domains may also require special stemming rules. However, the exact stemmed
form does not matter, only the equivalence classes it forms. Stemming increases recall while
harming precision. As an example of what can go wrong, note that the Porter stemmer stems
all of the following words:
operate, operating, operates, operation, operative, operatives, operational → oper
Overstemming comes from when too much of a word is cut off. This can result in
nonsensical stems, where all the meaning of the word is lost or muddled. Or it can result in
words being resolved to the same stems, even though they probably should not be.
Take the four words university, universal, universities, and the universe. A stemming
algorithm that resolves these four words to the stem “univers” has overstemmed
Understemming is the opposite issue. It occurs when we have several words that
actually are forms of one another. It would be nice for them to all resolve to the same stem,
but unfortunately, they do not. This can be seen if we have a stemming algorithm that stems
the words data and datum to “dat” and “datu.”
The goal of stemming is to improve performance and reduce the system resources required, by
reducing the number of unique words that a system has to contain. Stemming algorithms are
used to improve the efficiency of the information system and to improve recall. It is
important for a system to be able to categorize a word prior to making the decision to stem it.
Stemming the words “calculate, calculations, calculates, calculating” to a single
term (“calculat”) ensures that whichever of those terms is entered by the user, it is translated to the
stem, and all variants are found in any items in which they exist. Stemming algorithms remove suffixes
and prefixes, sometimes recursively, to derive the final stem.
Definitions:
CONSONANT: a letter other than A, E, I, O, U, and Y.
VOWEL: any letter that is not a consonant.
With these definitions, all words are of the form:
(C)(VC)^m(V)
where C = a string of one or more consonants, V = a string of one or more vowels, and m is the measure of the word.
E.g.,
TR OU BL E
C  V  C  V
Step 1:
SSES -> SS
caresses -> caress
IES -> I
ponies -> poni
ties -> ti
SS -> SS
caress -> caress
S -> є
cats -> cat
Step 2a:
(m>0) EED -> EE
Condition verified: agreed -> agree
Condition not verified: feed -> feed

(*V*) ED -> є
Condition verified: plastered -> plaster
Condition not verified: bled -> bled

(*V*) ING -> є


Condition verified: motoring -> motor
Condition not verified: sing -> sing
Step 2b:
(These rules are applied if second or third rule in 2a apply)
AT-> ATE

conflat(ed) -> conflate
BL -> BLE
Troubl(ing) -> trouble

(*d & ! (*L or *S or *Z)) -> single letter


Condition verified: hopp(ing) -> hop, tann(ed) -> tan
Condition not verified: fall(ing) -> fall

(m=1 & *o) -> E


Condition verified: fil(ing) -> file
Condition not verified: fail -> fail

Step 3 and 4:
Step 3: Y Elimination (*V*) Y -> I
Condition verified: happy -> happi
Condition not verified: sky -> sky

Step 4: Derivational Morphology, I


(m>0) ATIONAL -> ATE
Relational -> relate
(m>0) IZATION -> IZE
generalization-> generalize
(m>0) BILITI -> BLE
sensibiliti -> sensible

Step 5 and 6:
Step 5: Derivational Morphology, II
(m>0) ICATE -> IC
triplicate -> triplic
(m>0) FUL -> є

hopeful -> hope
(m>0) NESS -> є
goodness -> good

Step 6: Derivational Morphology, III


(m>0) ANCE -> є
allowance-> allow
(m>0) ENT -> є
dependent-> depend
(m>0) IVE -> є
effective -> effect
Step 7:
Step 7a
(m>1) E -> є
probate -> probat
(m=1 & !*o) E -> є
cease -> ceas

Step 7b
(m>1 & *d & *L) -> single letter
Condition verified: controll -> control
Condition not verified: roll -> roll

In the example, we input the word MULTIDIMENSIONAL to the Porter Stemming


algorithm. Let’s see what happens as the word goes through steps 1 to 5.

The suffix will not match any of the cases found in steps 1, 2 and 3.
Then it comes to step 4.
The stem of the word has m > 1 (since m = 5) and ends with “AL”.
Hence in step 4, “AL” is deleted (replaced with null).
Calling step 5 will not change the stem further.
Finally the output will be MULTIDIMENSION.
MULTIDIMENSIONAL → MULTIDIMENSION
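For illustration, a minimal Python sketch using NLTK's implementation of Porter's algorithm on some of the words discussed above (assuming NLTK is installed):

# Stemming a few words with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["caresses", "ponies", "cats", "agreed", "plastered",
         "motoring", "operating", "operation", "multidimensional"]
for w in words:
    print(w, "->", stemmer.stem(w))
# e.g. caresses -> caress, ponies -> poni, agreed -> agree,
# operating -> oper, multidimensional -> multidimension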

Q.3. Attempt All


a) What do you mean by word sense disambiguation (WSD)? Explain the Lesk Algorithm for WSD with suitable examples. (10 Marks)
WSD:
Word sense disambiguation (WSD) is the task of selecting the correct sense for a word in a
given sentence. This problem has to be faced for words having more meanings. It requires a
dictionary listing all the possible senses for each word. It can be faced for each single word
or jointly for all the words in the sentence (all the meaning combinations should be
considered). In natural language processing, word sense disambiguation (WSD) is the
problem of determining which "sense" (meaning) of a word is activated by the use of the
word in a particular context, a process which appears to be largely unconscious in people.
WSD is a natural classification problem: given a word and its possible senses, as defined by
a dictionary, classify an occurrence of the word in context into one or more of its sense
classes. The features of the context (such as neighboring words) provide the evidence for
classification.
Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP
system faces. Part-of-speech (POS) taggers with a high level of accuracy can solve a word's
syntactic ambiguity. On the other hand, the problem of resolving semantic ambiguity is
called WSD (word sense disambiguation). Resolving semantic ambiguity is harder than
resolving syntactic ambiguity.

Consider the two examples of the distinct sense that exist for the word “bass”
I can hear bass sound.
He likes to eat grilled bass.
The occurrence of the word bass clearly denotes distinct meanings: in the first sentence it
means frequency and in the second it means fish. Hence, if it were disambiguated by WSD,
then the correct meaning to the above sentences can be assigned as follows −
I can hear bass/frequency sound.
He likes to eat grilled bass/fish.
Dictionary- and knowledge-based methods: These rely primarily on dictionaries, thesauri,
and lexical knowledge bases, without using any corpus evidence.
Supervised methods: These make use of sense-annotated corpora to train from.
Semi-supervised or minimally-supervised methods: These make use of a secondary source of
knowledge such as a small annotated corpus as seed data in a bootstrapping process, or a
word-aligned bilingual corpus.
Unsupervised methods: These eschew (almost) completely external information and work
directly from raw unannotated corpora. These methods are also known under the name of
word sense discrimination
Applications
Machine translation
WSD is required for lexical choice in MT for words that have different translations for
different senses. For example, in an English-French financial news translator, the English
noun change could translate to either changement ('transformation') or monnaie ('pocket
money'). However, most translation systems do not use a separate WSD module. The lexicon
is often pre-disambiguated for a given domain, or hand-crafted rules are devised, or WSD is
folded into a statistical translation model, where words are translated within phrases which
thereby provide context.
Information retrieval
Ambiguity has to be resolved in some queries. For instance, given the query "depression"
should the system return documents about illness, weather systems, or economics? Current
IR systems (such as Web search engines), like MT, do not use a WSD module; they rely on
the user typing enough context in the query to only retrieve documents relevant to the
intended sense (e.g., "tropical depression"). In a process called mutual disambiguation,
reminiscent of the Lesk method (below), all the ambiguous words are disambiguated by
virtue of the intended senses co-occurring in the same document.
Information extraction and knowledge acquisition
In information extraction and text mining, WSD is required for the accurate analysis of text
in many applications. For instance, an intelligence gathering system might need to flag up
references to, say, illegal drugs, rather than medical drugs. Bioinformatics research requires
the relationships between genes and gene products to be catalogued from the vast scientific
literature; however, genes and their proteins often have the same name. More generally, the

Semantic Web requires automatic annotation of documents according to a reference
ontology. WSD is only beginning to be applied in these areas.

The Lesk method is the seminal dictionary-based method, introduced by Michael Lesk in
1986. The Lesk definition, on which the Lesk algorithm is based, is "measure overlap
between sense definitions for all words in context". A simple approach is the Lesk algorithm
(1986). The algorithm computes the intersection among the glosses associated with the
different meanings of the words in the sentence. The combination yielding the maximum
overall intersection is selected (the complexity is combinatorial in the number of senses).

Explain in detail.
Kilgarriff and Rosensweig gave the simplified Lesk definition as “measure overlap between
sense definitions of word and current context”, which further means identify the correct
sense for one word at a time. Here the current context is the set of words in surrounding
sentence or paragraph or document.
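NLTK ships an implementation along the lines of this simplified variant; a minimal Python sketch for the "bass" example above (assuming the 'wordnet' and 'punkt' data are available):

# Simplified Lesk with NLTK's built-in implementation.
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent1 = word_tokenize("I can hear bass sound")
sent2 = word_tokenize("He likes to eat grilled bass")

# lesk() returns the synset whose gloss overlaps most with the context;
# because glosses are short, the chosen sense is not always the intuitive one.
sense1 = lesk(sent1, "bass")
sense2 = lesk(sent2, "bass", pos="n")
print(sense1, "-", sense1.definition())
print(sense2, "-", sense2.definition())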
Limitations of the Lesk algorithm
The Lesk algorithm yields a 50-70% accuracy
The main limitation is its dependence on the quality of the glosses/examples provided for
each sense in the dictionary since they are usually short and do not carry enough information
to train a classifier. The words in the context and their definitions should share a significant
intersection (they should share the maximum number of terms). Coverage can be improved
by adding words related to the target but not already contained in the glosses, for example
the words of the definitions containing the target word, but only when the actual sense of the
target word is clear in that context. In the computation of the intersection/similarity with
the context, more flexible measures can be exploited, such as correlation with
Term Frequency-Inverse Document Frequency (TF-IDF) weights, in order to reduce the importance of the
most common words.
b) What is Coreference Resolution? Explain Hobbs' Algorithm for Coreference Resolution. (10 Marks)
Coreference
It occurs when two or more expressions in a text refer to the same person or thing; they have
the same referent. e.g. Bill said he would come; The proper noun Bill and the pronoun he
refer to the same person, namely to Bill. Coreference occurs when one or more expressions
in a document refer back to an entity that came before it/them. When two expressions are

coreferential, the one is usually a full form (the antecedent) and the other is an abbreviated
form (a proform or anaphor)
Coreference resolution : Coreference resolution is the task of clustering mentions in text that
refer to the same underlying real world entities. Coreference resolution, is the task of finding
all expressions that are coreferent with any of the entities found in a given text. Coreference
resolution is the task of resolving noun phrases to the entities that they refer to. For example,
in the sentence, “Andrew said he would buy a car” the pronoun “he” refers to the same
person, namely to “Andrew”.

In a sentence like "“I voted for Obama because he was most aligned with my values,” she said",
“I”, “my”, and “she” belong to the same cluster, and “Obama” and “he” belong to the same
cluster.
A classic problem for coreference resolution in English is the pronoun it, which has many
uses.
It can refer much like he and she, except that it generally refers to inanimate objects (the
rules are actually more complex: animals may be any of it, he, or she; ships are traditionally
she; hurricanes are usually it despite having gendered names).
It can also refer to abstractions rather than beings: "He was paid minimum wage, but didn't
seem to mind it."
Approach to coreference resolution
Coreference Resolution in Two Steps :
1. Detect the mentions (easy)
“[I] voted for [Nader] because [he] was most aligned with [[my] values],” [she] said

2. Cluster the mentions (hard)


“[I] voted for [Nader] because [he] was most aligned with [[my] values],” [she] said
Coreference distinctions
When exploring coreference, there are numerous distinctions that can be made, e.g.
anaphora, cataphora, split antecedents, coreferring noun phrases
When dealing with proforms (pronouns, pro-verbs, pro-adjectives, etc.), one distinguishes
between anaphora and cataphora.

When the proform follows the expression to which it refers, anaphora is present (the proform
is an anaphor), and when it precedes the expression to which it refers, cataphora is present
(the proform is a cataphor).
Anaphora
a. The music_i was so loud that it_i couldn't be enjoyed. – The anaphor "it" follows the
expression to which it refers (its antecedent).
b. Our neighbors_i dislike the music. If they_i are angry, the cops will show up soon. – The
anaphor "they" follows the expression to which it refers (its antecedent).
Cataphora
a. If they_i are angry about the music, the neighbors_i will call the cops. – The cataphor "they"
precedes the expression to which it refers (its postcedent).
b. Despite her_i difficulty, Wilma_i came to understand the point. – The cataphor "her" precedes
the expression to which it refers (its postcedent).
Split antecedents
a. Carol_i told Bob_j to attend the party. They_i+j arrived together. – The anaphor "they" has a split
antecedent, referring to both Carol and Bob.
b. When Carol_i helps Bob_j and Bob_j helps Carol_i, they_i+j can accomplish any task. – The
anaphor "they" has a split antecedent, referring to both Carol and Bob.
Coreferring noun phrases
a. The project leader_i is refusing to help. The jerk_i thinks only of himself. – Coreferring
noun phrases, whereby the second noun phrase is a predication over the first.
b. Some of our colleagues_i are going to be supportive. These kinds of people_i will earn
our gratitude. – Coreferring noun phrases, whereby the second noun phrase is a predication
over the first.
Hobbs' algorithm was one of the earliest approaches to pronoun resolution. The algorithm is
mainly based on the syntactic parse tree of the sentences. It makes use of syntactic
constraints when resolving pronouns. First, intra-sentential antecedents are proposed: the
syntactic tree of the current sentence is searched in a breadth-first, left-to-right fashion to find
antecedents. The contra-indexing constraint is taken care of inside the algorithm, by making
sure that the path from the NP to the S node of the syntactic tree has at least one other NP
on the way. If there are higher-level nodes in the current sentence, then antecedents resulting
from a breadth-first, left-to-right search of each subtree are proposed. Then, parse trees of
previous sentences, in reverse chronological order, are searched in the same fashion to
propose antecedents.

1. Start with the target pronoun.
2. Climb the parse tree to the S root.
3. For each NP or S node encountered:
   a. do a breadth-first, left-to-right search of its children,
   b. restricted to the left of the target,
   c. and for each NP found, check agreement with the target.
4. Repeat on earlier sentences until a matching NP is found.

Explain the steps for the example
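For illustration, a highly simplified Python sketch related to Hobbs' algorithm: it only performs the breadth-first, left-to-right search for candidate NP antecedents in the parse trees of previous sentences (the parse trees and sentences are only examples; the intra-sentential climb, the "left of the target" restriction, the contra-indexing constraint and agreement checks are omitted):

# Breadth-first, left-to-right search for NP candidates in earlier sentences.
from collections import deque
from nltk import Tree

def np_candidates(parse):
    """Yield NP subtrees of a parse tree in breadth-first, left-to-right order."""
    queue = deque([parse])
    while queue:
        node = queue.popleft()
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node
            queue.extend(node)   # children, left to right

def propose_antecedents(previous_parses):
    """Search earlier sentences in reverse chronological order."""
    for parse in reversed(previous_parses):
        for np in np_candidates(parse):
            yield " ".join(np.leaves())

s1 = Tree.fromstring("(S (NP (NNP John)) (VP (VBD bought) (NP (DT a) (NN car))))")
s2 = Tree.fromstring("(S (NP (PRP It)) (VP (VBZ is) (ADJP (JJ red))))")
# Candidate antecedents for the pronoun "It" in s2, drawn from s1:
print(list(propose_antecedents([s1])))   # ['John', 'a car']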

Q.4. Attempt All


a) Explain with suitable examples the following relationships between word meanings: Homonymy, Polysemy, Synonymy, Antonymy, Hypernymy, Hyponymy. (10 Marks)
Homonym: In linguistics, homonyms, broadly defined, are words which are homographs
(words that share the same spelling, regardless of pronunciation) or homophones (words that
share the same pronunciation, regardless of spelling), or both. For example, according to
this definition, the words row (propel with oars), row (argument) and row (a linear
arrangement) are homonyms, as are the words see (vision) and sea (body of water). A
classic example of homonymy is bank (river bank vs. financial institution), or bat (wooden
stick-like thing) vs. bat (flying scary mammal thing). A more restrictive or technical
definition sees homonyms as words that are simultaneously homographs and homophones –
that is to say they have identical spelling and pronunciation, whilst maintaining different
meanings. Examples are the pair stalk (part of a plant) and stalk (follow/harass a person) and
the pair left (past tense of leave) and left (opposite of right). The relationship between a set
of homonyms is called homonymy, and the associated adjective is homonymous or
homonymic.

The adjective "homonymous" can additionally be used wherever two items share the same
name, independent of how closely they are or are not related in terms of their meaning or
etymology. For example, the name Ōkami is homonymous with the Japanese term for "wolf"
(Ōkami).
Polysemy: Many words have more than one meaning or sense. Unlike homonyms,
polysemes are words with related meanings. This linguistic phenomenon is called polysemy
or lexical ambiguity. Words that have several senses are ambiguous and called polysemous.
For example, the word “chair” can refer to a piece of furniture, a person, the act of presiding
over a discussion etc. The word “employ” is a polysemy as its two meaning- to hire (employ
a person) and to accept (employ an idea) are related. In a particular use, only one of these
meanings is correct.
When two senses are related semantically, we call it polysemy (rather than homonymy).
Magazine: something you read.
Magazine: a cartridge to store bullets for a gun.
How are these senses polysemous? Check the historical relation of both senses:
both are related to storage.
Specific types of polysemy : Metaphor and Metonymy
Metaphor:
where there is a resemblance between senses (from "swallowing (a pill)" to "swallowing (an
argument)").
Germany will pull Slovenia out of its economic slump.
I spent 2 hours on that homework. I put money into Google stock.
Metonymy: the use of one aspect of a concept or entity to refer to other aspects of the entity or
to the entity itself; one sense "stands for" another (from "hands (body
part)" to "hands (manual labourers)").
The Pentagon
Synonym: The word synonym defines the relationship between different words that have a
similar meaning. A simple way to decide whether two words are synonymous is to check for
substitutability. Two words are synonyms in a context if they can be substituted for each
other without changing the meaning of the sentence. These relationships are useful in
organising words in lexical databases. This sense relation holds between a word which has the
same sense, or nearly the same sense, as another word.
For example
i. The student speaks with a broad British accent.
ii. The student speaks with a wide British accent.
Here "broad" and "wide" have the same or nearly the same sense, but they cannot be substituted
in all other contexts to mean the same. This kind of sense relation means "word of the same meaning".

Synonymy is a condition in which two lexemes or words have more or less the same lexical
meaning. This condition results from the contiguity or sameness in meaning between two
lexemes or words. Examples: small – little, big – large, politician – statesman.

Antonymy is a semantic relationship that holds between words that express opposite
meanings. The word good is an antonym of bad, and white is an antonym of black. This is
the sense relation whereby words are related by having opposite meanings. Antonymy is the
standard technical term for opposite meaning between lexemes.
There are three types of antonyms
i. Gradable antonyms: A gradable antonym is one of a pair of words with opposite meanings
where the two meanings lie on a continuous spectrum.
Temperature is such a continuous spectrum so hot and cold, two meanings on opposite ends
of the spectrum, are gradable antonyms.
Other examples include: heavy : light, fat : skinny, dark : light, young : old, early : late,
empty : full, dull : interesting.

ii. Complementary antonyms


A complementary antonym, sometimes called a binary or contradictory antonym , is one of a
pair of words with opposite meanings, where the two meanings do not lie on a continuous
spectrum.
There is no continuous spectrum between odd and even but they are opposite in meaning and
are therefore complementary antonyms.
Other examples include: mortal : immortal, exit : entrance, exhale : inhale, occupied : vacant.

iii. Relational antonyms


A relational antonym is one of a pair of words that refer to a relationship from opposite
points of view.
There is no lexical opposite of teacher, but teacher and pupil are opposite within the context
of their relationship.
This makes them relational antonyms. Other examples include: husband : wife, doctor :
patient, predator : prey, teach : learn, servant : master, come : go, parent : child.

A hypernym is a word with the more general sense. The word automobile is a hypernym for
car and truck. A hyponym is a word with the more specific meaning. In the relationship
between car and automobile, car is a hyponym of automobile. Mango is a hyponym of fruit.
Dog is a hyponym of animal. Animal is a hypernym of dog. Fruit is a hypernym of mango.
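These relations can be inspected directly in WordNet through NLTK; a minimal Python sketch (assuming the 'wordnet' data has been downloaded; the chosen synsets are only examples):

# Looking up synonymy, antonymy, hypernymy and hyponymy in WordNet.
from nltk.corpus import wordnet as wn

# Synonymy: lemmas that share a synset.
print(wn.synset("small.a.01").lemma_names())   # e.g. ['small', 'little']

# Antonymy: an opposite lemma, e.g. good vs. bad.
good = wn.synset("good.a.01").lemmas()[0]
print(good.antonyms())                         # e.g. [Lemma('bad.a.01.bad')]

# Hypernymy / hyponymy: more general vs. more specific concepts.
print(wn.synset("dog.n.01").hypernyms())       # e.g. canine, domestic_animal
print(wn.synset("fruit.n.01").hyponyms()[:3])  # specific kinds of fruit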

b) What are different methods of Part of Speech tagging? Explain in detail the Stochastic Tagging approach for Part of Speech Tagging. (10 Marks)
Part of speech tagging is the process of assigning a corresponding part of speech, like noun,
verb, adverb or adjective, to each word in a sentence. It is a process of converting a
sentence to forms – a list of words, or a list of tuples (where each tuple has the form (word,
tag)). The tag, in this case, is a part-of-speech tag, and signifies whether the word is a noun,
adjective, verb, and so on. Automatic assignment of descriptors to the given tokens is called
tagging. The descriptor is called a tag. Book/VB that/DT flight/NN. Each word has a
part-of-speech tag to describe its category.
Part-of-speech tag of a word is one of major word groups (or its subgroups).
open classes -- noun, verb, adjective, adverb
closed classes -- prepositions, determiners, conjunctions, pronouns, participles

POS Taggers try to find POS tags for the words.


Is duck a verb or a noun? (A morphological analyzer cannot make this decision.)
A POS tagger may make that decision by looking at the surrounding words.

Duck! (verb)
Duck is delicious for dinner. (noun)
Different methods of Part of Speech tagging
Rule-based POS tagging: The rule-based POS tagging models apply a set of handwritten
rules and use contextual information to assign POS tags to words. These rules are often
known as context frame rules.
Transformation Based Tagging: The transformation-based approaches use a pre-defined set
of handcrafted rules as well as automatically induced rules that are generated during training.
Transformation based tagging is also called Brill tagging. It is a rule-based algorithm for
automatic tagging of POS to the given text. TBL, allows us to have linguistic knowledge in a
readable form, transforms one state to another state by using transformation rules.
It is a combination of rule-based and stochastic tagging: like the rule-based approach because rules
are used to specify tags in a certain environment, and like the stochastic approach because machine
learning is used, with a tagged corpus as input.
Input: a tagged corpus and a dictionary (with most frequent tags), usually constructed from the tagged corpus.

steps to understand the working of TBL −


Start with the solution − The TBL usually starts with some solution to the problem and
works in cycles.
Most beneficial transformation chosen − In each cycle, TBL will choose the most beneficial
transformation.
Apply to the problem − The transformation chosen in the last step will be applied to the
problem.
The algorithm stops when the transformation selected in step 2 no longer adds
value or there are no more transformations to be selected.
Step 1: Label every word with most likely tag (from dictionary)
Step 2: Check every possible transformation & select one which most improves tagging
Step 3: Re-tag corpus applying the rules
Repeat 2-3 until some criterion is reached, e.g., X% correct with respect to training corpus
RESULT: Sequence of transformation rules
Deep learning models: Various Deep learning models have been used for POS tagging such
as Meta-BiLSTM which have shown an impressive accuracy of around 97 percent.
Stochastic Tagging
The use of probabilities in tags is quite old. Stochastic taggers use probabilistic and statistical
information to assign tags to words. The model that includes frequency or probability
(statistics) can be called stochastic. These taggers might use ‘tag sequence probabilities’,
‘word frequency measurements’ or a combination of both. The tag encountered most
frequently in the training set is the one assigned to an ambiguous instance of that word (word
frequency measurements). A stochastic approach includes frequency, probability or statistics.
The simplest stochastic approach finds out the most frequently used tag for a specific word
in the annotated training data and uses this information to tag that word in the unannotated
text. But sometimes this approach comes up with sequences of tags for sentences that are not
acceptable according to the grammar rules of a language. One such approach is to calculate
the probabilities of various tag sequences that are possible for a sentence and assign the POS
tags from the sequence with the highest probability. Hidden Markov Models (HMMs) are
probabilistic approaches to assigning a POS tag. The best tag for a given word is determined by
the probability that it occurs with the n previous tags (tag sequence probabilities). This resolves
the ambiguity by computing the probability of a given word (or tag) sequence. The problem with
this approach is that it can come up with sequences of tags for sentences that are not
acceptable according to the grammar rules of a language. Stochastic tagger applies the
following approaches for POS tagging −
1. Word Frequency Approach

In this approach, the stochastic taggers disambiguate the words based on the probability that
a word occurs with a particular tag.
We can also say that the tag encountered most frequently with the word in the training set is
the one assigned to an ambiguous instance of that word.
The main issue with this approach is that it may yield inadmissible sequence of tags.
2. Tag Sequence Probabilities
Here the tagger calculates the probability of a given sequence of tags occurring.
It is also called n-gram approach. It is called so because the best tag for a given word is
determined by the probability at which it occurs with the n previous tags.
The unigram approach assigns each word its most common tag, considering one word at a
time:
P(ti | wi) = freq(wi, ti) / freq(wi)
Here the probability of a tag given the word is computed as the frequency count of the word with that
tag divided by the frequency count of that particular word.
The bigram approach is based on the preceding tag, i.e. it takes two tags into account, the preceding
tag and the current tag:
P(ti | wi) ∝ P(wi | ti) · P(ti | ti-1)
Here P(wi | ti) is the probability of the current word given the current tag and P(ti | ti-1) is the
probability of the current tag given the previous tag.
The trigram approach is based on the previous two tags:
P(ti | wi) ∝ P(wi | ti) · P(ti | ti-2, ti-1)
where ti denotes the tag sequence and wi denotes the word sequence.
P(wi | ti) is the probability of the current word given the current tag, and
P(ti | ti-2, ti-1) is the probability of the current tag given the previous two tags.

Stochastic POS taggers possess the following properties −


This POS tagging is based on the probability of tag occurring.
It requires training corpus
There would be no probability for the words that do not exist in the corpus.
It uses different testing corpus (other than training corpus).
It is the simplest POS tagging approach because it chooses the most frequent tag associated with a word
in the training corpus.
Intuition: Pick the most likely tag based on context
Maximize the formula using a HMM : P(word|tag) × P(tag|previous n tags)
Observe: W = w1, w2, …, wn
Hidden: T = t1,t2,…,tn
Goal: Find POS tags that generate a sequence of words, i.e., look for most probable
sequence of tags T underlying the observed words W

We cannot directly observe the sequence of tags that generated the words; instead we calculate
t = argmax P(W, T), based on the Markovian assumption that the current tag depends only
on the previous n tags.
Use transition probabilities (i.e. forward and backward tags):
P(ti | wi) ∝ P(ti | ti-1) · P(ti+1 | ti) · P(wi | ti)
P(ti | ti-1) is the probability of the current tag given the previous tag,
P(ti+1 | ti) is the probability of the future tag given the current tag, and
P(wi | ti) is the probability of the word given the current tag.
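For illustration, a minimal Python sketch of the word-frequency and tag-sequence ideas using NLTK's unigram and bigram taggers trained on the Penn Treebank sample (assuming the 'treebank' corpus has been downloaded; the training split and the test sentence are only examples):

# Unigram (word-frequency) and bigram (tag-sequence) tagging with NLTK.
import nltk
from nltk.corpus import treebank

tagged_sents = treebank.tagged_sents()
train = tagged_sents[:3000]

# Unigram: each word gets the tag it occurs with most frequently in training.
unigram = nltk.UnigramTagger(train)

# Bigram: the previous tag is also taken into account; unseen contexts
# fall back to the unigram tagger.
bigram = nltk.BigramTagger(train, backoff=unigram)

print(bigram.tag("Book that flight".split()))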
Q.5 Attempt All
a) Write a note on Text Summarization. (5 Marks)
Summarization means to reduce the size of the document without changing its meaning. Text
summarization is the process of creating a short, coherent, and fluent summary of a longer
text document and involves the outlining of the text’s major points. Automatic text
summarization is the task of producing a concise and fluent summary while preserving key
information content and overall meaning. A good summary should cover the most vital
information of the original document or a cluster of documents, while being coherent,
non-redundant and grammatically readable. Text identification, interpretation and summary
generation, and analysis of the generated summary are some of the key challenges faced in
the process of text summarization.
The extractive text summarization technique involves pulling keyphrases from the source
document and combining them to make a summary. The extraction is made according to the
defined metric without making any changes to the texts. For example, when we want to
summarize our text on the basis of the frequency method, we store all the important words
and frequency of all those words in the dictionary. On the basis of high frequency words, we
store the sentences containing those words in our final summary. This means that the words which
are in our summary are confirmed to be part of the given text.
Source text: Joseph and Mary rode on a Tesla to attend the annual event in New York.
In the city, Mary got a job in Google. Mary was assigned as an ML developer.
Extractive summary: Joseph and Mary rode on a Tesla to attend the annual event in
New York. Mary was assigned as an ML developer.
The abstraction technique entails paraphrasing and shortening parts of the source
document. The abstractive text summarization algorithms create new phrases and sentences
that relay the most useful information from the original text — just like humans do.
Therefore, abstraction performs better than extraction.
Source text: Joseph and Mary rode on a Tesla to attend the annual event in New York. In the
city, Mary got a job in Google.
Abstractive summary: Joseph and Mary came to New York, where Mary got a job as an ML
developer in Google.
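For illustration, a minimal Python sketch of the frequency-based extractive approach described above (assuming the NLTK 'punkt' and 'stopwords' data are available; the text and the number of summary sentences are only examples):

# Frequency-based extractive summarization: score each sentence by the
# frequencies of its content words and keep the top-scoring sentences.
from collections import Counter
import nltk
from nltk.corpus import stopwords

def extractive_summary(text, n_sentences=1):
    sentences = nltk.sent_tokenize(text)
    words = [w.lower() for w in nltk.word_tokenize(text) if w.isalpha()]
    stop = set(stopwords.words("english"))
    freq = Counter(w for w in words if w not in stop)
    # Score each sentence by the summed frequency of its content words.
    scores = {s: sum(freq.get(w.lower(), 0) for w in nltk.word_tokenize(s))
              for s in sentences}
    top = sorted(sentences, key=scores.get, reverse=True)[:n_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in top)

text = ("Joseph and Mary rode on a Tesla to attend the annual event in New York. "
        "In the city, Mary got a job in Google. Mary was assigned as an ML developer.")
print(extractive_summary(text, n_sentences=2))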

b) Write a note on the Question Answering system. (5 Marks)
A question answering system helps the user to find the precise answer to a question
articulated in natural language. A question answering system provides explicit, concise and
accurate answers to user questions, rather than providing a set of relevant documents or web
pages as answers, as most information retrieval systems do.
A question answering system basically consists of three parts:
Question Processing
Document Processing
Answer Processing
The question processing phase analyzes the structure of the user's question, identifies the
question type to get information relevant to the user's question, and finally transforms the
question into a meaningful question formula compatible with the QA system's domain.
Document processing has as its main feature the selection of a set of relevant documents and
the extraction of a set of paragraphs, depending on the focus of the question, through natural
language text understanding. This task can generate a dataset or a neural
model which will provide the source for answer extraction. The retrieved data can be
ranked according to its relevance to the question.
Answer processing is the most challenging task in a question answering system. This
module uses extraction techniques on the result of the document processing module to
present an answer. The answer must be a simple answer to the question, but it might require
merging information from different sources, summarization, or dealing with uncertainty or
contradiction.
Closed-domain QAS: the scope of the user's question is limited to a particular domain like medicine,
movies, history and others.
Open-domain QAS: works much like search engines such as Google, providing explicit answers to
questions belonging to any domain.
Question types in QAS: Factoid question: here the answer is a simple fact about the entity in
question. Descriptive question: here the answer provides full detail about a person,
place or event. Simple yes/no question: here the answer is simply yes or no.
The goal of IR-based factoid question answering is to answer a user's question by finding
short text segments on the Web or in some other collection of documents.
Knowledge-based question answering is the idea of answering a natural language question
by mapping it to a query over a structured database. The logical form of the question is thus
expressed as a query. The database can be a full relational database (as in natural language
database interfaces, NLDBI). Systems for mapping from a text string to a logical form are called
semantic parsers. Semantic parsers for question answering usually map to a query language like SQL or
SPARQL. Using multiple information sources: IBM's Watson system is an
example of a system that relies on a wide variety of resources to answer questions.

c) What is Sentiment Analysis? What are different types of Sentiment Analysis? Also explain different Sentiment Classification Techniques. (10 Marks)
Sentiment Analysis is a natural language processing task that deals with finding orientation
of opinion in a piece of text with respect to a topic. It deals with analyzing emotions,
feelings, and the attitude of a speaker or a writer from a given piece of text. Sentiment
Analysis involves capturing of user’s behavior, likes and dislikes of an individual from the
text. The target of SA is to find opinions, identify the sentiments they express, and then
classify their polarity. Usually, besides identifying the opinion, these systems extract
attributes of the expression e.g.:
– Polarity: whether the speaker expresses a positive or negative opinion
– Subject: the thing that is being talked about
– Opinion holder: the person or entity that expresses the opinion
Classification levels in Sentiment Analysis
• Document-level: Document-level SA aims to classify an opinion of the whole document as
expressing a positive or negative sentiment.
• Sentence-level: Sentence-level SA aims to classify sentiment expressed in each sentence
which involves identifying whether sentence is subjective or objective
• Aspect-level :Aspect-level SA aims to classify the sentiment with respect to the specific
aspects of entities which is done by identifying the entities and their aspects. "The battery
life of this camera is too short."
Types of Sentiment Analysis

Standard Sentiment Analysis: I love how Zapier takes different apps and ties them together'
→ Positive, “I still need to further test Zapier to say if its useful for me or not' → Neutral,
“Zapier is sooooo confusing to me' → Negative
Fine-grained Sentiment Analysis: Very positive , Positive , Neutral , Negative , Very negative
, For example, imagine having the following survey responses:'The older interface was much
simpler' → Negative, 'Awful experience. I would never buy this product again!' → Very
Negative, 'I don't think there is anything I really dislike about the product' → Neutral
Emotion Detection
'Hubspot makes my day a lot easier :)' → Happiness. 'Your customer service is a nightmare!
Totally useless!!' → Anger
Aspect-based Sentiment Analysis: This type of sentiment analysis focuses on understanding
the aspects or features that are being discussed in a given opinion. Product reviews, for
example, are often composed of different opinions about different characteristics of a
product, like Price,UX-UI, Integrations, Mobile Version, etc

Intent Detection: This type of sentiment analysis tries to find an action behind a given
opinion, something that the user wants to do. Identifying user intents allows you to detect
valuable opportunities to help customers, such as solving an issue, making improvements on
a product or routing complaints to the corresponding areas: “Very frustrated right now.
Instagram keeps closing when I log in. Can you help?” → Request for Assistance

There are many methods and algorithms to implement sentiment analysis systems, which can
be classified as:

Rule-based systems that perform sentiment analysis based on a set of manually crafted rules.
Automatic systems that rely on machine learning techniques to learn from data.
Hybrid systems that combine both rule based and automatic approaches.
Explain any one in detail
Rule-based Sentiment Analysis: “Lexicons”, or lists of positive and negative words, are
created. These are words that are used to describe sentiment. For example, positive lexicons
might include “fast”, “affordable”, and “user-friendly“. Negative lexicons could include
“slow”, “pricey”, and “complicated”. Before text can be analyzed it needs to be prepared.
Several processes are used, like tokenization, lemmatization and stopword removal (removing words
that have little or no semantic value in the sentence). The system then counts the number of positive or
negative words in a particular text. A special rule can make sure that negated words, e.g. “not
easy”, are counted as the opposite. The final step is to calculate the overall sentiment score for
the text on a scale of -100 to 100. In this case a score of 100 would be the highest score
possible for positive sentiment. A score of 0 would indicate neutral sentiment. The score can
also be expressed as a percentage, ranging from 0% as negative to 100% as positive.
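For illustration, a minimal Python sketch of this lexicon-and-rules approach (the lexicons, the negation rule and the scoring scale below are illustrative assumptions, not any specific tool's implementation):

# Rule-based sentiment scoring with tiny illustrative lexicons and a
# simple negation rule; real systems use much larger lexicons and
# richer preprocessing (tokenization, lemmatization, stopword removal).
POSITIVE = {"fast", "affordable", "user-friendly", "easy", "love", "good"}
NEGATIVE = {"slow", "pricey", "complicated", "awful", "bad"}

def sentiment_score(text):
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in {"not", "never"}
        if tok in POSITIVE:
            score += -1 if negated else 1
        elif tok in NEGATIVE:
            score += 1 if negated else -1
    # Map the raw count to a -100..100 scale (0 means neutral).
    total = sum(tok in POSITIVE or tok in NEGATIVE for tok in tokens)
    return int(100 * score / total) if total else 0

print(sentiment_score("the app is fast and affordable"))    # 100 (positive)
print(sentiment_score("setup was not easy and very slow"))  # -100 (negative)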

