
Summary for:

https://www.researchgate.net/publication/325411209_Semantic_Similarity-_A_Review_of_Approaches_and_Metrics

Akila, D. (2018). Semantic Similarity - A Review of Approaches and Metrics. International Journal of Applied Engineering Research, 9.

Introduction:

Semantic similarity computes the conceptual similarity between words or terms.

Semantic similarity consists of two types: relational similarity (which indicates the similarity between the relations holding within pairs of words) and attributional similarity (which denotes the similarity between the attributes of the words themselves).

Two words, X and Y, are attributionally similar when the attributes of X are similar to the attributes of Y. Two pairs, A:B and C:D, are relationally similar when the relations between A and B are similar to the relations between C and D.

Classification of Semantic Similarity Approaches


Semantic similarity approaches are divided into the following five categories:

 Metrics based semantic similarity approaches

 Corpus based approaches

 Ontology based Approaches

 Relational based Approaches

 Hybrid based Approaches


 Metrics based semantic similarity approaches

Metrics based semantic similarity approaches measure the similarity between words based on metrics such as path length, page count, and features. Metrics based semantic similarity methods can be categorized into the following:

 Edge based method

The edge based method measures the similarity between two terms based on the length of the path that connects the terms and the location of the terms in the taxonomy.

The various edge based methods are the Path Length Approach, Depth Relative Scaling, Conceptual Similarity, and Normalized Path Length. Further, the path length approach is classified into shortest path length and weighted shortest path length.

Path Length Approach:

The most basic approach, divided into the two variants below (a short code sketch follows the list):

 Shortest Path Length: a straightforward method that measures semantic similarity by the length of the shortest path between two terms. Merits: simple. Demerits: the link variance problem (every edge counts equally, although edges can span different semantic distances).

 Weighted Shortest Path Length: assigns a weight to every edge and calculates the semantic similarity based on those weights; performs better than the plain shortest path.
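
As a concrete illustration, here is a minimal Python sketch of the shortest path idea using NLTK's WordNet interface (an assumption of mine; the paper does not prescribe a library). NLTK's path_similarity scores 1 / (1 + shortest path length), so a shorter path in the taxonomy yields a score closer to 1:

import nltk
nltk.download("wordnet", quiet=True)  # fetch WordNet on first use

from nltk.corpus import wordnet as wn

car, bus, tree = wn.synset("car.n.01"), wn.synset("bus.n.01"), wn.synset("tree.n.01")
print(car.path_similarity(bus))   # higher: short taxonomy path between two vehicles
print(car.path_similarity(tree))  # lower: the connecting path is much longer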

Depth Relative Scaling

The distance between two adjacent nodes is the average of the edge weight in each direction, scaled with respect to the depth of the nodes, so that edges deeper in the taxonomy contribute less. The semantic distance is then the summation of the distances between neighboring nodes over all links in a path.
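
A toy sketch of this idea on a hand-built taxonomy (the 1/depth weighting below is an illustrative assumption, not the paper's exact formula; it only captures that deeper edges count for less):

DEPTH = {"entity": 0, "animal": 1, "plant": 1, "dog": 2, "oak": 2}  # toy taxonomy depths

def edge_distance(a, b):
    # deeper edge -> smaller contribution to the semantic distance
    return 1.0 / max(DEPTH[a], DEPTH[b])

def path_distance(path):
    # semantic distance = sum of per-edge distances along the path
    return sum(edge_distance(path[i], path[i + 1]) for i in range(len(path) - 1))

print(path_distance(["dog", "animal", "entity", "plant", "oak"]))  # 0.5 + 1 + 1 + 0.5 = 3.0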

Conceptual Similarity

Used with translated words: it measures the similarity among the semantic representations of verbs and provides a solution to the lexical selection problem in machine translation. Moreover, it finds the conceptual similarity among pairs of concepts.

A shared drawback: these methods do not support the is-a relation and depend only on the path length.
 Information Content based Method

Using the is-a relation, it defines the similarity between two concepts by examining the maximum information they share; probability estimates add this ability on top of edge counting.

It uses WordNet as the taxonomy and calculates the information content from the Brown corpus. However, it suffers from the word ambiguity problem.

Page count is sometimes considered as an additional factor.

Merits: conceptually quite simple and not sensitive to the problem of varying link distances. Demerits: suffers due to inappropriate word senses.
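
A minimal sketch of this family of measures, using NLTK's implementation of Resnik similarity: the score is the information content of the most informative ancestor the two concepts share, with IC counts taken from the Brown corpus as described above (using NLTK here is my assumption, not the paper's prescription):

import nltk
nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content from the Brown corpus
car, bus = wn.synset("car.n.01"), wn.synset("bus.n.01")
print(car.res_similarity(bus, brown_ic))  # higher = more shared information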

 Feature based method

This method measures the similarity between two terms based on their properties or the relationships among the terms in the taxonomy. Common features between the terms boost the similarity, and non-common features reduce the similarity of the two concepts. It mostly uses a novel model to find all kinds of relations (see the sketch below).

Merits: takes concept features into consideration. Demerits: computational complexity, and it cannot work well without a complete feature set.
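
A small sketch of the common/distinctive feature idea, in the spirit of Tversky's ratio model (the feature sets and the alpha/beta weights are illustrative assumptions):

def feature_similarity(a, b, alpha=0.5, beta=0.5):
    # shared features raise the score, distinctive features lower it
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))

bird = {"wings", "feathers", "lays_eggs", "flies"}
bat = {"wings", "fur", "flies", "nocturnal"}
print(feature_similarity(bird, bat))  # 2 shared vs. 2 + 2 distinctive -> 0.5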

 Corpus based approaches


These approaches find the similarity between terms based on a corpus (a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject). The approaches used are:

 Latent Semantic Analysis (LSA)

A statistical method that leverages word co-occurrence in a large unlabeled corpus of texts. LSA assumes that words with similar meanings occur in similar pieces of text.

Merits: reduced redundancy. Demerits: cannot measure the degree of similarity between two relations.
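
A minimal LSA sketch using scikit-learn (my choice of library; the toy corpus is illustrative): build a TF-IDF term-document matrix, reduce it with truncated SVD to a low-dimensional latent space, and compare documents by cosine similarity there:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]
X = TfidfVectorizer().fit_transform(docs)          # term-document matrix
Z = TruncatedSVD(n_components=2).fit_transform(X)  # latent semantic space
print(cosine_similarity(Z))  # the two cat documents pair up, as do the two market documents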

 Generalized Latent Semantic Analysis (GLSA)

It is an extension of the LSA approach that focuses on term vectors; it requires a dimensionality reduction method and a measure of semantic association.

Merits: efficiently captures semantic relations between terms. Demerits: noise.

 Explicit Semantic Analysis (ESA)

Explicit Semantic Analysis (ESA) computes the semantic relatedness of texts with the help of huge knowledge-base repositories such as Wikipedia and the Open Directory Project (ODP): each text is represented as a weighted vector of knowledge-base concepts.

Merits: can use Wikipedia. Demerits: does not exploit the link structure.
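
A toy ESA sketch (the mini "articles" below stand in for Wikipedia pages and are purely illustrative): represent a word as a TF-IDF weighted vector over concept articles and compare words by the cosine of their concept vectors:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

concepts = {                      # pretend each entry is a Wikipedia article
    "Finance": "bank money stock market interest investor loan",
    "Rivers": "river bank water stream flood shore fishing",
    "Food": "bread butter cheese kitchen recipe meal",
}
vec = TfidfVectorizer().fit(concepts.values())
M = vec.transform(concepts.values())      # concept-by-term matrix

def concept_vector(word):
    # the word's TF-IDF weight in each concept article
    return (M @ vec.transform([word]).T).toarray().ravel()

a, b = concept_vector("money"), concept_vector("investor")
print(cosine_similarity([a], [b]))  # related: both load on the "Finance" concept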

There are more techniques that can be used with the corpus based approach.

 Ontology based Approaches


Ontology based approaches determine concept pairs using a Resource Description Framework (RDF). They cover a wide range of algorithms and approaches, such as:

 Lexical Resource based Approaches:

This approach uses lexical resources such as WordNet and Wikipedia to compute the semantic similarity.

Techniques we can use:

1) Directed Acyclic Graph (DAG): uses WordNet and DAG theory.

Merits: combining WordNet and DAG theory provides better results. Demerits: in WordNet, words have more than one sense.

2) Spreading activation strategy: uses Wikipedia.

Merits: Wikipedia increases accuracy. Demerits: hard to compute.

 Ontology Based Approach

Computer programs map the various domain-specific ontologies through different similarity measures and corpora. Similarity measures used in this technique are cosine similarity, the Jaccard coefficient, and a novel market-based model, as sketched below.
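
Minimal sketches of the two standard measures just named (the vectors and sets are illustrative; the novel market-based model is specific to the surveyed work and not reproduced here):

import math

def cosine(u, v):
    # cosine similarity: dot product over the product of vector magnitudes
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def jaccard(a, b):
    # Jaccard coefficient: shared elements over all distinct elements
    return len(a & b) / len(a | b)

print(cosine([1, 2, 0], [2, 3, 1]))                             # vector-based similarity
print(jaccard({"dog", "cat", "pet"}, {"dog", "pet", "leash"}))  # 2 shared / 4 total = 0.5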

Note: In conventional methods, accurate information gathering suffers because the web user profile is obtained automatically. The novel technique solves this complexity; its principal intention is to construct an absolute concept model that discovers ontologies automatically from data sets. Moreover, it offers a method for developing patterns to process the discovered ontologies, and it achieves this intention efficiently.

Techniques:

1) Compact Concept Ontology (CCO): uses WordNet.

Merits: better performance. Demerits: cannot be applied to context-based video retrieval.

2) New ontology-based measure (or new feature-based measure): uses taxonomical features.

Merits: simple. Demerits: slow performance.

3) Ontology mining: uses an association set.

Merits: effectively uses the discovered knowledge. Demerits: difficult to write adequate descriptions and narratives.

4) Market basket model: uses a document corpus.

Merits: prediction error is reduced using the corpus structural information. Demerits: scalability problems and inefficiency.

 Relational based Approaches

These approaches consider the relatedness among words. The main models are:

Vector Space Model (VSM):

The VSM defines the relationship between a word pair using vectors of frequencies of predefined patterns drawn from a large corpus.

Merits: reduces the number of nodes and the computation cost. Demerits: does not consider lexical similarity.
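
A toy sketch of the pattern-frequency idea (the patterns and counts are invented for illustration): each word pair becomes a vector of how often it appears in each pattern, and pairs with similar vectors stand in similar relations:

import math

patterns = ["X cuts Y", "X works with Y", "X eats Y"]
pair_vectors = {
    ("mason", "stone"): [12, 30, 0],     # illustrative corpus counts per pattern
    ("carpenter", "wood"): [15, 25, 0],
    ("cat", "mouse"): [0, 1, 40],
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

print(cosine(pair_vectors[("mason", "stone")], pair_vectors[("carpenter", "wood")]))  # high
print(cosine(pair_vectors[("mason", "stone")], pair_vectors[("cat", "mouse")]))       # low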

Latent Relational Analysis:

Latent Relational Analysis (LRA) extends the VSM to overcome its drawbacks in measuring the semantic relations between two pairs of words. LRA enhances the VSM approach by deriving patterns automatically from the corpus, smoothing the frequency data using singular value decomposition, and reformulating the word pairs based on synonyms. In terms of performance, LRA achieves a substantial improvement over VSM.

Merits: can be applied to many applications. Demerits: high error rate.

Lexical Pattern

Combines two distinctive approaches, statistical methods and lexico-syntactic patterns, to extract semantic relations from text.

Merits: extracts the generic and associative relations between words. Demerits: low accuracy.
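
A minimal lexico-syntactic pattern sketch in the style of Hearst patterns, where "X such as Y" is read as Y is-a X (the sentence and the single pattern are illustrative; real systems combine many patterns with corpus statistics):

import re

text = "They studied vehicles such as cars, buses, and trucks."
pattern = re.compile(r"(\w+) such as ((?:\w+(?:, )?(?:and )?)+)")

for hypernym, tail in pattern.findall(text):
    for hyponym in re.findall(r"\w+", tail):
        if hyponym != "and":
            print(f"{hyponym} is-a {hypernym}")  # cars is-a vehicles, ...
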
 Hybrid Approaches:

The hybrid approach combines the semantic, corpus, ontology, and relational based approaches.

It improves on simple models, for example by combining the lexical taxonomy structure with corpus statistical information. This simplifies the semantic distance calculation between nodes in the semantic space and upgrades the edge based approach to the better, more detailed node based approach; another improved model is the novel hybrid approach.

Techniques we can use (see the sketch after this list):

1) A corpus-based approach that combines the lexical taxonomy structure with corpus statistical information: depends on occurrence counts.

Merits: very high correlation value. Demerits: consumes a lot of resources.

2) A hybrid approach with WordNet and the Internet as a corpus.

Merits: using the Internet yields higher accuracy. Demerits: data sparseness of the corpus.
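
As an illustration of technique 1, a minimal sketch using NLTK's Lin similarity, a well-known hybrid measure that combines WordNet's taxonomy structure with information content estimated from the Brown corpus (that this matches the exact measure surveyed in the paper is an assumption):

import nltk
nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # corpus statistics
car, bus = wn.synset("car.n.01"), wn.synset("bus.n.01")
print(car.lin_similarity(bus, brown_ic))  # in [0, 1]; blends taxonomy + corpus IC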

Check Table 2 in the paper for more methods to research; it is very good.
