https://www.researchgate.net/publication/325411209_Semantic_Similarity-_A_Review_of_Approaches_and_Metrics
Akila, D. (2018). Semantic Similarity: A Review of Approaches and Metrics. International Journal of Applied Engineering Research, 9.
Introduction:
Semantic similarity computes the conceptual similarity between words or terms.
Semantic similarity consists of two types, namely relational similarity (which indicates the correspondence between the relations of word pairs) and attributional similarity (which denotes the correspondence between the attributes of words).
Two words, X and Y, are attributionally similar when the attributes of X are similar to the attributes of Y. Two pairs, A:B and C:D, are relationally similar when the relations between A and B are similar to the relations between C and D.
Semantic similarity approaches measure the similarity between words based on metrics such as path length, page count, and features. Metric-based semantic similarity methods can be categorized into the following:
The edge-based method measures the similarity between two terms based on the length of the path that connects the terms and the location of the terms in the taxonomy.
The various edge-based methods are the Path Length Approach, Depth Relative Scaling, Conceptual Similarity, and Normalized Path Length. Further, the path length approach is classified into shortest path length and weighted shortest path length.
The distance between two adjacent nodes is equal to the average of the edge distance in each direction with respect to the depth of the nodes. The semantic distance is the sum of the distances between neighboring nodes over all links in a path.
Conceptual Similarity
It is used with translated words: it measures the similarity among the semantic representations of verbs and provides solutions to lexical selection problems in machine translation. Moreover, it finds the conceptual similarity among pairs of concepts.
Demerits: these methods support only the is-a relation and depend solely on the path length.
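A minimal sketch of these edge-based measures, assuming NLTK's WordNet interface (the review does not prescribe any particular toolkit): path_similarity scores two concepts by the shortest path between them in the taxonomy, and wup_similarity is the depth-scaled conceptual-similarity variant.

```python
# Edge-based similarity sketch using NLTK's WordNet (assumed toolkit).
# Requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
bicycle = wn.synset('bicycle.n.01')

# Shortest path length approach: score in (0, 1], 1.0 for identical concepts.
print('path:', car.path_similarity(bicycle))

# Conceptual similarity (Wu & Palmer style): scales by the depth of the two
# concepts and of their least common subsumer in the taxonomy.
print('wup: ', car.wup_similarity(bicycle))
```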
Information Content based Method
Using the is-a relation, it defines the similarity between two concepts by examining the maximum information they share; probabilities estimated from a corpus are used to augment plain edge counting.
It uses WordNet as the taxonomy and calculates the information content using the Brown corpus. However, it suffers from the word ambiguity problem.
Merits: Conceptually quite simple and not sensitive to the problem of varying link
distances
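A small illustration of this information-content idea, again assuming NLTK (which ships precomputed information-content counts derived from the Brown corpus): the Resnik-style score is the information content of the most specific concept that subsumes both terms.

```python
# Information-content similarity sketch (Resnik-style), assuming NLTK.
# Requires: nltk.download('wordnet'); nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # IC values estimated from the Brown corpus

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Similarity = information content of the most specific concept that
# subsumes both terms in the is-a hierarchy (their maximum shared information).
print(dog.res_similarity(cat, brown_ic))
```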
The feature-based method measures the similarity between two terms based on their properties or on the relationships among the terms in the taxonomy. Common features among the terms boost the similarity, and non-common features reduce the similarity of two concepts. It mostly uses a novel model to find all kinds of relations.
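The feature-based intuition can be sketched with Tversky's ratio model over feature sets; treating the words of WordNet glosses as features, and the equal weights, are illustrative assumptions rather than part of the reviewed method.

```python
# Feature-based similarity sketch: common features raise the score,
# distinctive features lower it (Tversky's ratio model). Using gloss words
# as "features" is an assumption made only for illustration.
from nltk.corpus import wordnet as wn

def features(name):
    """Use the words of a synset's definition as a crude feature set."""
    return set(wn.synset(name).definition().lower().split())

def tversky(a, b, alpha=0.5, beta=0.5):
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))

print(tversky(features('car.n.01'), features('bicycle.n.01')))
```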
LSA is a statistical system that leverages word co-occurrence in a large unlabeled corpus of text; it assumes that words with similar meanings occur in similar pieces of text.
Merits: Reduced redundancy, Demerits: Cannot measure the degree of similarity between two relations
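A toy LSA sketch, using scikit-learn as an assumed toolkit: a document-term matrix built from a tiny illustrative corpus is reduced with truncated SVD, and words that occur in similar pieces of text end up with similar latent vectors.

```python
# Latent Semantic Analysis sketch: co-occurrence statistics from a small,
# purely illustrative corpus, compressed with SVD.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the doctor examined the patient in the hospital",
    "the nurse helped the doctor treat the patient",
    "the driver parked the car near the garage",
    "the mechanic repaired the car in the garage",
]

vec = CountVectorizer()
X = vec.fit_transform(corpus)                # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)              # documents in the latent space
term_vecs = svd.components_.T                # terms in the latent space

vocab = list(vec.get_feature_names_out())
doctor, nurse, car = (term_vecs[vocab.index(w)] for w in ("doctor", "nurse", "car"))
print(cosine_similarity([doctor], [nurse]))  # typically high: similar contexts
print(cosine_similarity([doctor], [car]))    # typically low: different contexts
```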
An extension of the LSA approach focuses on term vectors; it requires dimensionality reduction methods and a measure of semantic association.
A novel approach called Explicit Semantic Analysis (ESA) computes the semantic relatedness of texts with the help of huge knowledge-base repositories such as Wikipedia and the Open Directory Project (ODP).
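A hedged sketch of the ESA idea, with three toy "concept articles" standing in for a real repository such as Wikipedia or ODP: each text is mapped to a vector of similarities against the concept articles, and relatedness is the cosine between those concept vectors.

```python
# Explicit Semantic Analysis sketch. The three "concept articles" below are
# toy stand-ins for a real knowledge base such as Wikipedia or ODP.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

concepts = {
    "Finance": "bank money credit loan interest account deposit",
    "River":   "bank river water stream flood shore fishing",
    "Sport":   "football team goal match player league score",
}

vec = TfidfVectorizer().fit(concepts.values())
concept_matrix = vec.transform(concepts.values())      # one row per concept article

def concept_vector(text):
    """Represent a text by its similarity to every concept article."""
    return cosine_similarity(vec.transform([text]), concept_matrix)

money = concept_vector("loan interest payment")
fishing = concept_vector("fishing on the river shore")
print(cosine_similarity(money, fishing))                # relatedness of the two texts
```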
This approach uses lexical resources such as WordNet and Wikipedia to compute the semantic similarity.
1) Directed Acyclic Graph (DAG): uses WordNet and DAG theory
Merits: Combines WordNet and DAG theory and provides better results, Demerits: In WordNet, words can have multiple senses
Computer programs map the various domain-specific ontologies using different similarity measures and corpora. Similarity measures used in this technique are cosine similarity, the Jaccard coefficient, and a novel market-based model.
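The two generic measures named above can be written down directly; the market-based model is not specified in these notes, so only cosine similarity and the Jaccard coefficient are sketched.

```python
# Cosine similarity and Jaccard coefficient, the two generic measures the
# notes mention for mapping domain-specific ontologies.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(A, B):
    A, B = set(A), set(B)
    return len(A & B) / len(A | B) if A | B else 0.0

print(cosine([1, 2, 0, 1], [2, 1, 1, 0]))
print(jaccard({"gene", "protein", "cell"}, {"protein", "cell", "tissue"}))
```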
Note: In conventional methods, accurate information gathering suffers because the web user profile is obtained automatically. The novel technique resolves this complexity; its principal intention is to construct an absolute concept model that discovers ontologies automatically from data sets. Moreover, it offers a method for developing patterns to process the discovered ontologies, and it achieves this intention efficiently.
Techniques:
Merits: better performance, Demerits: Cannot be applied to context-based video retrieval
This approach considers the relatedness among words; its main models are:
The VSM defines the relationship between a word pair using a vector of frequencies of predefined patterns in a large corpus.
Merits: Reduces the number of nodes and computation cost, Demerits: Does not
consider the lexical similarity
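A minimal sketch of that idea: each word pair is represented by the frequencies of a small, predefined set of joining patterns in a corpus, and relational similarity is the cosine of those vectors. Both the corpus and the pattern list below are toy assumptions.

```python
# Vector Space Model sketch for relational similarity between word pairs.
# In the real method the pattern frequencies come from a very large corpus.
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the mason cut the stone with care",
    "stone is shaped by the mason",
    "the carpenter cut the wood to size",
    "wood is shaped by the carpenter",
]
patterns = ["{x} cut the {y}", "{y} is shaped by the {x}"]

def pair_vector(x, y):
    """Frequency of each joining pattern for the pair (x, y), by substring match."""
    return [sum(p.format(x=x, y=y) in s for s in corpus) for p in patterns]

mason_stone = pair_vector("mason", "stone")
carpenter_wood = pair_vector("carpenter", "wood")
print(cosine_similarity([mason_stone], [carpenter_wood]))  # relationally similar pairs
```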
Latent Relational Analysis (LRA) is an extension of the VSM that overcomes its drawbacks in measuring the semantic relations between two pairs of words. LRA enhances the VSM approach by adding patterns derived automatically from the corpus, smoothing the frequency data using singular value decomposition, and reformulating the word pairs based on synonyms. In terms of performance, LRA achieves a substantial improvement over the VSM.
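The SVD smoothing step in LRA can be illustrated on a small pair-pattern frequency matrix; the pairs, patterns, and counts below are invented purely for the sketch.

```python
# LRA-style sketch: a pair-pattern frequency matrix smoothed with SVD
# before comparing word pairs. All counts are made up for illustration.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# rows: word pairs, columns: joining patterns ("X cut Y", "Y made of X", ...)
pair_pattern_counts = np.array([
    [12, 7, 0, 1],    # mason : stone
    [10, 9, 1, 0],    # carpenter : wood
    [0, 1, 14, 8],    # teacher : student
], dtype=float)

svd = TruncatedSVD(n_components=2, random_state=0)
smoothed = svd.fit_transform(pair_pattern_counts)        # denoised pair vectors

print(cosine_similarity([smoothed[0]], [smoothed[1]]))   # mason:stone vs carpenter:wood
print(cosine_similarity([smoothed[0]], [smoothed[2]]))   # mason:stone vs teacher:student
```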
Lexical Pattern
This approach combines two distinct techniques, statistical methods and lexico-syntactic patterns, to extract semantic relations from text.
Merits: Extracts the generic and associative relations between words, Demerits: low
accuracy
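The lexico-syntactic half can be illustrated with Hearst-style patterns matched by regular expressions; the pattern set and the example sentences are assumptions for the sketch.

```python
# Lexico-syntactic pattern sketch: extract generic relations with
# Hearst-style patterns. Only two toy patterns are shown.
import re

patterns = [
    (r"(\w+)s? such as (\w+)", "is-a"),        # "vehicles such as cars"
    (r"(\w+)s?, including (\w+)", "is-a"),     # "metals, including copper"
]

text = "Vehicles such as cars need fuel. Metals, including copper, conduct electricity."

for regex, relation in patterns:
    for hypernym, hyponym in re.findall(regex, text, flags=re.IGNORECASE):
        print(f"{hyponym} {relation} {hypernym}")
```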
Hybrid Approaches:
The hybrid approach combines the semantic, corpus, ontology, and relational based approaches.
It provides improvements over simple models, for example by combining the lexical taxonomy structure with corpus statistical information. This simplifies the calculation of the semantic distance between nodes in the semantic space and refines the edge-based approach into a better, more detailed node-based approach; another improved model is the novel hybrid approach.
1) Corpus-based approach that combines the lexical taxonomy structure with corpus statistical information: it depends on word occurrence
Merits: Uses the Internet and thus achieves higher accuracy, Demerits: Data sparseness of the corpus
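As one concrete instance of mixing taxonomy structure with corpus statistics, NLTK exposes the Jiang-Conrath and Lin measures, which combine WordNet's is-a hierarchy with Brown-corpus information content; choosing these particular measures here is an illustrative assumption, not a prescription from the review.

```python
# Hybrid similarity sketch: WordNet taxonomy structure combined with
# corpus-derived information content (Jiang-Conrath and Lin measures).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

coffee = wn.synset('coffee.n.01')
tea = wn.synset('tea.n.01')

print('jcn:', coffee.jcn_similarity(tea, brown_ic))   # inverse of the Jiang-Conrath distance
print('lin:', coffee.lin_similarity(tea, brown_ic))   # Lin's ratio, normalized to [0, 1]
```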