
Unit 3 NLP

Computational Lexicons in Natural Language Processing


A computational lexicon is a digital, machine-readable resource that contains a
comprehensive collection of lexical and semantic information about words,
phrases, and other linguistic elements in a particular language. Computational
lexicons serve as valuable references for NLP systems and researchers,
providing detailed information about the words and their properties, which
facilitates various language processing tasks.
Well-known computational lexicons in NLP include WordNet for English, which
organizes words into synsets (sets of synonymous words with shared
meanings), and ConceptNet, which focuses on representing word relationships
and common-sense knowledge. These lexicons are essential resources for
building NLP systems, aiding in tasks like information retrieval, machine
translation, sentiment analysis, and more by providing a structured foundation
for understanding and processing text in a computational manner.
Key features and components of a computational lexicon include:
1. Word Entries: Lexicons provide detailed entries for individual words,
multi-word phrases, or other linguistic units, each with associated
information.
2. Part-of-Speech (POS) Information: Lexicons specify the grammatical
category of each word, such as nouns, verbs, adjectives, adverbs, etc.,
which is crucial for syntactic analysis.
3. Morphological Information: Lexicons often include details about a word's
morphological properties, such as inflected forms, roots, prefixes, and
suffixes.
4. Pronunciation and Phonetics: Some lexicons provide information about
word pronunciation, including phonetic transcriptions, stress patterns,
and phonological features.
5. Semantic Information: Lexicons offer information about word meanings,
often through word senses or semantic relationships with other words,
enabling semantic analysis.
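
The components listed above can be pictured as fields of a single record. The following is a minimal sketch, not the schema of any real lexicon: the class name `LexiconEntry`, its field names, the `lookup` helper, and the sample data for "run" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LexiconEntry:
    """One entry in a toy computational lexicon (hypothetical schema)."""
    lemma: str                                             # base form of the word
    pos: str                                               # part of speech
    inflections: List[str] = field(default_factory=list)  # morphological forms
    pronunciation: str = ""                                # phonetic transcription
    senses: List[str] = field(default_factory=list)       # short sense glosses


# A tiny lexicon keyed by (lemma, part of speech); the data is illustrative.
lexicon: Dict[Tuple[str, str], LexiconEntry] = {
    ("run", "verb"): LexiconEntry(
        lemma="run",
        pos="verb",
        inflections=["runs", "ran", "running"],
        pronunciation="/rʌn/",
        senses=["move fast on foot", "operate or manage something"],
    ),
    ("run", "noun"): LexiconEntry(
        lemma="run",
        pos="noun",
        inflections=["runs"],
        pronunciation="/rʌn/",
        senses=["an act of running"],
    ),
}


def lookup(lemma: str, pos: str) -> Optional[LexiconEntry]:
    """Return the entry for a lemma and part of speech, or None if absent."""
    return lexicon.get((lemma, pos))
```

Keying entries by both lemma and part of speech keeps the noun and verb readings of the same spelling apart, which is one simple way to reflect the POS information described in item 2.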

English WordNet
English WordNet is a comprehensive lexical database for the English
language developed at Princeton University. It categorizes English
words into sets of synonyms, known as "synsets," and defines
semantic relationships between these synsets. These relationships
include hypernymy/hyponymy (is-a relationships), meronymy (part-
whole relationships), antonymy (opposite meanings), and more.
English WordNet organizes synsets into a hierarchical structure,
providing a semantic hierarchy of concepts. It also supports word sense
disambiguation by listing separate senses for polysemous
words. Each word in WordNet is associated with a lemma (the base
form of the word) and a part-of-speech tag.
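
To make the synset hierarchy concrete, here is a small self-contained sketch. It does not use the real WordNet data or API: the synset identifiers, the `synsets` dictionary, and the helper names are hypothetical, and each toy synset has a single hypernym, which is a simplification (WordNet itself allows a synset to have more than one).

```python
# Toy synsets keyed by an identifier; the data below is illustrative only.
synsets = {
    "entity.n": {"lemmas": ["entity"], "hypernym": None},
    "vehicle.n": {"lemmas": ["vehicle"], "hypernym": "entity.n"},
    "car.n": {"lemmas": ["car", "auto", "automobile"], "hypernym": "vehicle.n"},
    "bank.n.1": {"lemmas": ["bank", "banking company"], "hypernym": "entity.n"},
    "bank.n.2": {"lemmas": ["bank", "riverbank"], "hypernym": "entity.n"},
}


def hypernym_path(synset_id):
    """Follow is-a links from a synset up to the root of the hierarchy."""
    path = []
    current = synset_id
    while current is not None:
        path.append(current)
        current = synsets[current]["hypernym"]
    return path


def senses_of(word):
    """Return the ids of every synset that lists the word as a lemma."""
    return [sid for sid, data in synsets.items() if word in data["lemmas"]]
```

In this sketch `hypernym_path("car.n")` walks car, vehicle, entity, and `senses_of("bank")` returns two synsets, which is the situation a word sense disambiguation step has to resolve.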
Hindi WordNet
Hindi WordNet is a system that brings together different lexical and semantic
relations between Hindi words. It organizes lexical information in terms of
word meanings and can be described as a lexicon based on psycholinguistic
principles. Hindi WordNet is widely used in NLP applications. Each word in it
belongs to a synset that represents one lexical concept, and synsets are the
basic building blocks of Hindi WordNet. The lexicon covers content words, that
is, the open-class categories: nouns, verbs, adjectives, and adverbs.
Each entry in Hindi WordNet consists of a synset, a gloss (description of the
concept), and a position in the ontology.
The main obstacle to high-performance NLP applications is the knowledge
acquisition bottleneck.
Some of the prominent causes are described next:
i. Lack of expressiveness: Approximate relations need to be represented for
better understanding. For example, word pairs like "vidyalaya" and
"samsthana" are approximately similar but are not synonyms.
ii. Missing composition of semantic relations: Hindi WordNet does not define
how relations compose. For example, "vahana" and "kara" are linked by a
hypernymy-hyponymy relation, and "kara" and "pahiya" by a
meronymy-holonymy relation, but no relation between "vahana" and "pahiya"
is defined in Hindi WordNet.
Fuzzy Hindi WordNet
Fuzzy Hindi WordNet is a word sense network. A word sense node in this
network is a synset that is regarded as a basic object in Fuzzy Hindi WordNet.
Each synset in Fuzzy Hindi WordNet is linked to other synsets through well-
known lexical and semantic relations such as fuzzy hypernymy, fuzzy hyponymy,
fuzzy meronymy, fuzzy troponymy, fuzzy antonymy, and fuzzy entailment.
Semantic relations are between synsets, and lexical relations are between
words. These relations serve to organize the lexical knowledge base.
We render Fuzzy Hindi WordNet as a fuzzy graph where nodes represent
concepts (synsets) and edges represent fuzzy relations between concepts. The
weight of an edge represents the strength of the relation between two
concepts/synsets. The strength ranges from 0 to 1.
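
A fuzzy graph of this kind can be sketched as a dictionary of weighted, labelled edges. This is an illustration only: the class `FuzzyGraph` and its methods are hypothetical names, the relation types follow the "vahana", "kara", "pahiya" example given earlier, and the numeric strengths are placeholders rather than values from Fuzzy Hindi WordNet. Treating a missing edge as strength 0.0 is also an assumption of this sketch.

```python
class FuzzyGraph:
    """Directed graph whose edges carry a relation name and a strength in [0, 1]."""

    def __init__(self):
        # edges[source][target] = (relation, strength)
        self.edges = {}

    def add_edge(self, source, target, relation, strength):
        """Store one fuzzy relation, rejecting strengths outside [0, 1]."""
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must lie in [0, 1]")
        self.edges.setdefault(source, {})[target] = (relation, strength)

    def strength(self, source, target):
        """Return the edge strength, or 0.0 when no edge is stored."""
        entry = self.edges.get(source, {}).get(target)
        return entry[1] if entry else 0.0


graph = FuzzyGraph()
# Relation types follow the text; the numeric strengths are placeholders.
graph.add_edge("vahana", "kara", "fuzzy hypernymy", 0.9)
graph.add_edge("kara", "pahiya", "fuzzy meronymy", 0.8)
```

Here `graph.strength("vahana", "pahiya")` is 0.0 because no direct edge is stored between those two concepts.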

Synsets and Relationships in These Computational Lexicons


Synsets (synonym sets): Synsets are the basic building blocks of WordNet. A
synset is a collection of words or phrases that share a common meaning or are
closely related in meaning within a specific context. Synsets are used to
organize and represent the vocabulary of a language in a structured and
semantically meaningful way.

Relations in Fuzzy Hindi WordNet


Relations between Same Parts of Speech

1. Fuzzy Association
In Fuzzy Hindi WordNet, words often have approximately similar meanings,
which can be represented through a fuzzy association relation. This semantic
relation between two synsets signifies partially similar meanings between
concepts. The relation is denoted as (w1, w2, μas), where w1 and w2 are
approximate synonyms and μas is the strength of the relation.
Examples: (vidyalaya, patshala, skula, pathalaya) → (samsthana, adhishtana,
pratishtana, istitayutaka) with a strength of 0.8.
2. Fuzzy Hypernymy and Fuzzy Hyponymy
These relations exist between synsets that capture superset/subset
relationships. Fuzzy hypernymy indicates that one synset is an approximate
superset of another (a more general concept), while fuzzy hyponymy signifies
the reverse relationship. They are denoted as (w1, w2, μhr) and (w1, w2, μhp),
where μhr and μhp represent the degree of the relationship.
Examples: (mattha) → (dahi) with a strength of 0.8.

3. Fuzzy Meronymy and Fuzzy Holonymy


Fuzzy meronymy and fuzzy holonymy capture the partial "part-whole"
relationship between synsets. This relationship is represented as (w1, w2,
μme/μho), with μme/μho indicating the degree of the relationship.
Examples: (gatta, kuta, daftari, vasali) → (kitaba, pustaka) with a strength of
0.7.

4. Fuzzy Antonymy
Fuzzy antonymy represents the relation between two words expressing
approximately opposite meanings. This relation is denoted as (w1, w2, μan),
where μan signifies the strength of the relation.
Examples: (pareshani) → (khushi) with a strength of 0.8.

5. Fuzzy Entailment
Fuzzy entailment denotes the logical relationship between two verb synsets
where the truth of one follows logically from the other. It is a one-way relation
and is represented as (v1 μe → v2), where μe represents the strength of the
entailment.
Examples: (sona) → (letana) with a strength of 0.9.
6. Fuzzy Troponymy
Fuzzy troponymy captures the relation between synsets of verbs, where one
verb denotes an elaboration of another in a specific manner. This relation is
represented as (v1, v2, μt), where μt indicates the strength of the relationship.
Examples: (padhana, parhai karana) → (sikhaana) with a strength of 0.8.

7. Fuzzy Gradation
Fuzzy gradation represents intermediate concepts between fuzzy antonyms.
The relation between synsets is represented as (w1, w2, μg), where μg signifies
the strength of the gradation.
Example: (hesana, samanya manodasa, karahana) with a strength of 0.8.

8. Fuzzy Causative
The fuzzy causative relation links causative verbs and signifies the
interdependency between different morphological forms of a verb. It is a lexical
relation with a unity strength value.
Example: (khana, khilana) with a strength of 1.0.
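
The (w1, w2, μ) notation used in this list maps naturally onto small records. The sketch below stores a few of the examples above and looks them up by direction. The names `FuzzyRelation`, `relations`, and `find` are hypothetical, and storing every relation as a directed pair is an assumption of the sketch (the text states only that entailment is one-way).

```python
from typing import NamedTuple, Optional, Tuple


class FuzzyRelation(NamedTuple):
    source: Tuple[str, ...]   # words of the first synset
    target: Tuple[str, ...]   # words of the second synset
    kind: str                 # relation name
    strength: float           # membership value in [0, 1]


# Examples copied from the list above.
relations = [
    FuzzyRelation(("pareshani",), ("khushi",), "fuzzy antonymy", 0.8),
    FuzzyRelation(("sona",), ("letana",), "fuzzy entailment", 0.9),
    FuzzyRelation(("padhana", "parhai karana"), ("sikhaana",), "fuzzy troponymy", 0.8),
    FuzzyRelation(("khana",), ("khilana",), "fuzzy causative", 1.0),
]


def find(source_word: str, target_word: str) -> Optional[FuzzyRelation]:
    """Return the stored relation from a synset containing source_word
    to a synset containing target_word, respecting direction."""
    for rel in relations:
        if source_word in rel.source and target_word in rel.target:
            return rel
    return None
```

With this data, `find("sona", "letana")` returns the entailment with strength 0.9, while `find("letana", "sona")` returns `None`, reflecting that the entailment holds in one direction only.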

Relations between Cross Parts of Speech


Linkage between Nominal and Verbal Concepts

9. Fuzzy Ability Link


This relation denotes the ability of a nominal concept and assigns a weight μab.
E.g.: ( (magaramaccha), (tairana), 0.7)
10. Fuzzy Capability Link
It specifies the acquired features of a nominal concept and is associated with a
weight μcp.
E.g.: ( (admi), (vahana calana), 0.9) and ( (ladaka), (vahana calana), 0.6)
11. Fuzzy Function Link
The function link denotes the specific functions of a nominal concept and can
vary in strength based on the particular function's significance.

Linkage between Nominal and Adjectival Concepts

12. Fuzzy Attribute
This relation links a noun to an adjective that partially describes one of the
noun's attributes. Its strength is μat.
E.g.: ( (jamina), (upajau), μat), ( (candi), (camakadara), μat)

13. Fuzzy Modifies Noun


This relation specifies adjectives that can only be used to describe certain
nouns. The membership value of this relation is always 1, representing a
classical relation.

Example: (सुपत्र, सत्पात्र, अच्छा पात्र) modifies (व्यक्ति, मानव, साक्षर, सख्स, जन, बंदा,
बंदा).

Linkage between Verbal and Adverbial Concepts

14. Fuzzy Modifies Verb


This relation shows the connection between an adverb and a specific verb that
the adverb modifies. The membership value of this relation is always 1,
representing a classical relation.

Example: (तेज, तेज़, तेजी से, तेजी, रफ्तार से, रफ़तार से, तेज गति से) modifies
(दौड़ना, भागना, धनना).

15. Fuzzy Derived From


Fuzzy Derived From specifies the root form from which a particular word is
derived. The membership value of this relation is always 1.

Example: (क्रमसह), derived from (क्रम).


Composition of Relations in Fuzzy Hindi WordNet
The composition of relations in Fuzzy Hindi WordNet makes it possible to derive
connections between concepts even when they are not explicitly defined.
Composition exploits transitivity: two relations that share an intermediate
concept are combined to determine the type and strength of the relationship
between the two end concepts. Semantic connections between words can thus be
found by traversing one or more links and taking into account the type and
strength of each link.

Here's how it works:


Matrix M (Table I) holds the type of each composed relation, and Matrix S
(Table II) holds the strength of each composed relation.
• Matrix M (an 8×8 matrix) combines relations between concepts:
o (a, b) ∈ Xi ∧ (b, c) ∈ Yj → (a, c) ∈ M(i, j), for all concepts a, b, c

• Here, a, b, and c represent concepts, and Xi, Yj, and M(i, j) represent
relations between concepts. If there is an Xi relation between concepts a
and b and a Yj relation between concepts b and c, then a relation M(i, j)
between concepts a and c exists.
• The strength is read from the corresponding entry of Table II, i.e.,
S(i, j).
Example:

(वाहन) shares fuzzy hypernymy relation X2 with (कार).

(कार) shares fuzzy meronymy relation Y4 with (रेडियो).

Using composition, the relation between (वाहन) and (रेडियो) is fuzzy meronymy
(M(2,4)) with a moderate strength.

This shows that (वाहन) and (रेडियो) are moderately related, meaning that a
"रेडियो" (radio) is part of a "वाहन" (vehicle) in a moderate number of cases.

The composition of fuzzy relations can help process sentences that are
challenging for standard WordNet, as it considers indirect connections between
concepts.
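
The lookup in M and S can be sketched as two dictionaries indexed by the pair (i, j). Tables I and II are not reproduced here, so only the single cell stated in the example, M(2, 4), is filled in and the other cells are left out rather than guessed. The names `direct` and `compose` are hypothetical, and the concepts are the "vahana", "kara", "pahiya" trio mentioned earlier as a gap in Hindi WordNet, which involves the same two relation types.

```python
# Relation indices taken from the worked example: 2 = fuzzy hypernymy,
# 4 = fuzzy meronymy. Other cells of Tables I and II are not available
# in this document, so they are deliberately omitted.
M = {(2, 4): "fuzzy meronymy"}   # type of the composed relation
S = {(2, 4): "moderate"}         # strength label of the composed relation

# Known direct relations as (source, target) -> relation index.
direct = {
    ("vahana", "kara"): 2,    # fuzzy hypernymy
    ("kara", "pahiya"): 4,    # fuzzy meronymy
}


def compose(a, b, c):
    """Derive the relation between a and c through the intermediate concept b.

    Returns (relation type, strength label), or None when either direct
    relation is missing or the table cell is not filled in.
    """
    i = direct.get((a, b))
    j = direct.get((b, c))
    if i is None or j is None or (i, j) not in M:
        return None
    return M[(i, j)], S[(i, j)]
```

Calling `compose("vahana", "kara", "pahiya")` yields a fuzzy meronymy of moderate strength, supplying the link that plain Hindi WordNet leaves undefined.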
To compute the strength of composed fuzzy relations, t-norms are used. These
are functions that take two values in the range [0, 1] and return a value in the
same range. Three t-norms are proposed:
T1(x, y) = max(0, x + y - 1)
T2(x, y) = xy
T3(x, y) = min(x, y)
Depending on the t-norm used, you can compute the strength of the composed
fuzzy relation. For example, given x = 0.8 and y = 0.5, T1 results in 0.3, T2
results in 0.4, and T3 results in 0.5. T1 is pessimistic, T2 is moderate, and T3 is
optimistic. The choice of t-norm can influence the strength of the composed
fuzzy relation.
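
The three t-norms and the worked values above can be checked with a few lines of code. The function names are chosen for this sketch; T1, T2, and T3 are commonly known as the Łukasiewicz, product, and minimum t-norms. Applying a t-norm repeatedly along a path of more than two edges, as `path_strength` does, is an assumption of the sketch rather than something stated in the text.

```python
from functools import reduce


def t_lukasiewicz(x, y):
    """T1: bounded difference, the most pessimistic of the three."""
    return max(0.0, x + y - 1.0)


def t_product(x, y):
    """T2: algebraic product."""
    return x * y


def t_minimum(x, y):
    """T3: minimum, the most optimistic of the three."""
    return min(x, y)


def path_strength(strengths, t_norm):
    """Combine the edge strengths along a path with the chosen t-norm."""
    return reduce(t_norm, strengths)


# Reproduce the example with x = 0.8 and y = 0.5 (rounded for display).
for name, t in [("T1", t_lukasiewicz), ("T2", t_product), ("T3", t_minimum)]:
    print(name, round(t(0.8, 0.5), 2))
```

For any pair of inputs in [0, 1] the results satisfy T1 ≤ T2 ≤ T3, which is why the text calls T1 pessimistic and T3 optimistic.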
