
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 8, AUGUST 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 111

Computation of Semantic Similarity among Cross Ontological Concepts for Biomedical Domain
K.Saruladha, Dr.G.Aghila, and A.Bhuvaneswary

Abstract— Based on Amos Tversky's psychological contrast model, this paper proposes a corpus independent, information content based similarity computation method to assess similarity between biomedical concepts belonging to multiple ontologies. Ontologies have been widely used in many domains including database integration, bioinformatics, and the Semantic Web to facilitate the sharing of heterogeneous information. Semantic similarity techniques are becoming important components in most intelligent knowledge-based and semantic information retrieval (SIR) systems. This paper discusses the limitations of existing semantic similarity methods for computing similarity between concepts of a single ontology and concepts belonging to different ontologies. The proposed approach exploits the informativeness of concepts as a factor for computing the amount of specific and shared features between the concepts. Identifying the Most Specific Common Abstraction (MSCA) between concepts belonging to different ontologies is a challenge, and we propose a methodology to identify the MSCA by forming a virtual root which connects the root concepts of the considered ontologies. The proposed idea is tested using the MeSH and SNOMED-CT biomedical ontologies.

Index Terms — Biomedical domain, Information retrieval, Ontology, Similarity Methods, UMLS.

——————————  ——————————

1 INTRODUCTION
Assessing semantic similarity between concepts is a main issue in many research areas such as Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence. Semantic similarity techniques are becoming important components in most intelligent knowledge-based and semantic information retrieval (SIR) systems [1], [2]. With the growing access to heterogeneous and independent data repositories, the differences in the structure and semantics of the data stored in those repositories play a major role in information systems. Semantic similarity relates to computing the similarity between conceptually similar but not necessarily lexically similar terms. Typically, semantic similarity is computed by mapping terms to an ontology and by examining their relationships (hyponymy, hypernymy, meronymy and homonymy) in that ontology. Semantic similarity approaches fall under four different categories: ontology based approach, information content based approach, feature based approach and hybrid based approach. The basic qualitative properties that a semantic similarity measure should consider are the commonality, difference, identity, symmetric and depth properties.

COMMONALITY PROPERTY - The similarity between A and B is related to their commonality. The more commonality they share, the more similar they are.

DIFFERENCE PROPERTY - The similarity between A and B is related to the differences between them. The less difference they have, the more similar they are.

IDENTITY PROPERTY - The maximum similarity between A and B is reached when A and B are identical, no matter how much commonality they share.

SYMMETRIC PROPERTY - The similarity between concepts (A, B) is equal to the similarity between the concepts (B, A).

DEPTH PROPERTY - The distance between A and B is represented by an edge of the concepts and is influenced by the depth of the location of the edge in the ontology.

This paper discusses the proposed method to compute semantic similarity among cross ontological concepts. Section 2 discusses the classification of various semantic similarity methods based on a single ontology; Section 3 discusses the classification of similarity methods based on cross ontology. Section 4 discusses the architectural design and algorithm of the similarity method for cross ontological concepts in the biomedical domain.

----------------
• Mrs. K. Saruladha is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India.
• Dr. G. Aghila is with the Computer Science Department, Pondicherry University, Puducherry, Pin 605014, India.
• Ms. A. Bhuvaneswary is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India.

2 SEMANTIC SIMILARITY METHODS FOR SINGLE ONTOLOGY
Semantic similarity methods are broadly classified into single ontology similarity methods and cross ontological similarity methods. Various approaches could be used to find similarity between two similar concepts in an ontology. Similarity methods for a single ontology could be broadly classified into four main approaches:
• Ontology Based Approaches
• Information Content (Corpus) Based Approaches
• Hybrid Based Approaches
• Feature Based Approach

Fig. 1. Semantic Similarity Methods for Single Ontology

2.1 Ontology Based Approach
Ontology means "Specification of a Conceptualization". It is a description of the concepts and relationships that can exist for a domain. The ontology based approach [4] requires consistent and rich ontologies to assess semantic similarity between two concepts. The ontology based approaches are classified under two categories. The path length approach [5] computes similarity by counting the number of nodes/edges between two concepts in terms of the shortest path in the taxonomy. The depth relative approach [6], [7], [8] takes into account the depth of the taxonomy by calculating the depth from the root to the target concept. It adheres to the basic properties such as difference and identity.

2.2 Information Based Approach (Corpus)
Information theoretic approaches [9], [10], [11] usually employ the notion of Information Content (IC), which can be considered as a measure quantifying the amount of information a concept expresses in the taxonomy. The IC values are obtained by calculating the probability of occurrence of the words associated with each concept in a given corpus. These probabilities are cumulative as we go up the taxonomy from specific concepts to more abstract ones. The IC value is obtained by considering the negative log likelihood of the probability of the concept in a given corpus and is given by

$$IC(c) = -\log p(c) \tag{1}$$

where c is a concept in the considered ontology and p(c) is the probability of encountering c in a given corpus. The IC value of each concept is monotonically decreasing as we move from the leaves of the taxonomy to its root. The root node of the IS-A hierarchy has the maximum frequency count, since it includes the frequency counts of every other concept in the hierarchy. This approach adheres to the basic properties such as commonality, symmetry and difference.

2.3 Hybrid Based Approach
The hybrid approach [12], [13] combines different information sources such as the information content of the concept, depth and shortest path to assess the similarity or distance between concepts. This approach adheres to the basic properties such as commonality, symmetry and difference.

2.4 Feature Based Approach
In the feature based approach [3], [14], [16] the similarity considers the features that are common to two concepts and also the differentiating features specific to each. The similarity of a concept C1 to a concept C2 is a function of the features that are common to both C1 and C2, those in C1 but not in C2 and those in C2 but not in C1. According to Tversky [3] the similarity function is

$$Sim_{tvr}(C_1, C_2) = \alpha \cdot F(\Psi(C_1) \cap \Psi(C_2)) - \beta \cdot F(\Psi(C_1) / \Psi(C_2)) - \gamma \cdot F(\Psi(C_2) / \Psi(C_1)) \tag{2}$$

where Ψ(C) denotes the feature set of concept C, F is a function that reflects the salience of a set of features, and α, β and γ are parameters that allow for differences in focus on the different components. Ψ(C1) ∩ Ψ(C2) represents the set of features that the two concepts have in common. Ψ(C1) / Ψ(C2) and Ψ(C2) / Ψ(C1) represent the differentiating features specific to each concept. Similarity is not symmetric (i.e. Sim(C1, C2) != Sim(C2, C1)). This approach adheres to the basic properties such as commonality and difference.

3 SEMANTIC SIMILARITY METHODS FOR CROSS ONTOLOGY
Semantic similarity among multiple ontologies is classified under two categories: 1) Ontology based approach, 2) Feature based approach.
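Before turning to the cross-ontology methods, the two single-ontology quantities of Section 2, the corpus based IC of (1) and the Tversky contrast model of (2), can be illustrated with a minimal Python sketch. The toy taxonomy, the corpus counts, the default weights and the choice of set cardinality for F are all assumptions made for illustration; none of them come from the experiments reported in this paper.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "disease": "entity",
    "infection": "disease",
    "anemia": "disease",
}

# Hypothetical corpus counts of the words mapped to each concept.
RAW_COUNT = {"entity": 2, "disease": 8, "infection": 6, "anemia": 4}


def cumulative_counts(parent, raw):
    """Add every concept's own count to all of its ancestors."""
    total = dict(raw)
    for concept, count in raw.items():
        ancestor = parent[concept]
        while ancestor is not None:
            total[ancestor] += count
            ancestor = parent[ancestor]
    return total


def corpus_ic(concept, parent=PARENT, raw=RAW_COUNT):
    """Eq. (1): IC(c) = -log p(c), with p(c) taken from cumulative counts."""
    total = cumulative_counts(parent, raw)
    root = next(c for c, p in parent.items() if p is None)
    return -math.log(total[concept] / total[root])


def tversky_contrast(f1, f2, alpha=1.0, beta=0.5, gamma=0.5):
    """Eq. (2), with F assumed to be set cardinality."""
    return (alpha * len(f1 & f2)
            - beta * len(f1 - f2)
            - gamma * len(f2 - f1))
```

In this sketch the root receives IC = 0 and the leaves receive the largest values, which matches the monotonic behaviour described for (1). Choosing β different from γ makes `tversky_contrast(f1, f2)` differ from `tversky_contrast(f2, f1)` whenever the two feature sets have different numbers of distinguishing features, which is the asymmetry noted for (2).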

Fig. 2. Semantic Similarity Methods for Cross Ontology.

3.1 Path Length Approach for Cross Ontology
The ontology based approach for multiple ontologies differs from the one used for a single ontology in that one ontology is considered as the primary ontology and the other as the secondary ontology. The semantic similarity between cross ontological concepts is measured by joining the two ontologies at a common node that belongs to both of them; this node is considered as the bridge node.

According to the method of Al-Mubaid et al. [2], the semantic similarity between concepts in a single ontology and in multiple ontologies is measured by an ontology-structure-based technique for the biomedical domain (MeSH). Al-Mubaid et al. have proposed that semantic similarity can be measured in three different cases: 1) similarity method for a single primary ontology, 2) similarity method for cross ontology and 3) similarity method within secondary ontologies.

The semantic similarity measure for cross ontology is based on three features:
• A common specificity of concepts in the ontology
• Cross modified path length between two concepts
• A local granularity of both ontologies.

For cross-ontology semantic similarity, the common specificity feature between two concepts C1 and C2 takes into account the depth of the least common subsumer (LCS) of the two concepts and the depth D of the ontology.

$$CSpec(C_1, C_2) = D - Depth(LCS(C_1, C_2)) \tag{3}$$

The smaller the CSpec value, the more information the two concepts share. In this case, the two concepts belong to two different ontologies, one identified as the primary ontology and the other, with the smaller number of concepts, as the secondary ontology. Using the bridge node, the least common subsumer of the two concepts (C1, C2) is obtained by considering the LCS of the first node C1 in the primary ontology and the bridge node,

$$LCS(C_1, C_2) = LCS(C_1, bridge_n) \tag{4}$$

Thus the path length is calculated by adding d1 = d(C1, bridge) and d2 = d(C2, bridge). In order to scale the path length and CSpec features in the secondary ontology to the primary ontology, the path rate is given by

$$PathRate = \frac{2 D_1 - 1}{2 D_2 - 1} \tag{5}$$

where D1 and D2 represent the depths of the primary and secondary ontology. According to the path feature scale of the primary ontology, the cross modified path length between the two concept nodes is calculated as given in (6)

$$Path(C_1, C_2) = d_1 + PathRate \times d_2 - 1 \tag{6}$$

Since there may be many bridge nodes between two concepts, there can be more than one path length, i.e. {path_i}, and the semantic distance, SemDist, between two concept nodes is given as follows

$$CSpec_i(C_1, C_2) = D_1 - Depth(LCS(C_1, Bridge_i)) \tag{7}$$

$$SemDist(C_1, C_2) = \log\big((path_i - 1) \times CSpec_i\big) + K \tag{8}$$

3.2 Feature Based Approach for Cross Ontology
According to Rodriguez & Egenhofer, the semantic similarity among multiple ontologies is measured by considering three important features: 1) matching process, 2) semantic neighborhoods, 3) distinguishing features.

In [16], each concept is considered as an entity class. The similarity between entity classes is given as

$$S(a^p, b^q) = W_w \cdot S_w(a^p, b^q) + W_u \cdot S_u(a^p, b^q) + W_n \cdot S_n(a^p, b^q) \tag{9}$$

where W_w, W_u, W_n are the respective weights of the similarity of each component, each with a value greater than 0. The functions Sw, Su, and Sn are the similarities between synonym sets, features, and semantic neighborhoods. The entity class a belongs to ontology p and b belongs to ontology q. The similarity between entity classes is calculated using synonym sets, features, and semantic neighborhoods and is given by

$$S(a, b) = \frac{|A \cap B|}{|A \cap B| + \alpha(a, b)\,|A / B| + (1 - \alpha(a, b))\,|B / A|} \tag{10}$$

where α is the function representing the depth of the ontology and its value ranges from 0 to 1. The function α is given in (11), (12).

When depth(C1^O1) ≤ depth(C2^O2)

$$\alpha(C_1, C_2) = \frac{depth(C_1^{O_1})}{depth(C_1^{O_1}) + depth(C_2^{O_2})} \tag{11}$$

When depth(C1^O1) > depth(C2^O2)

$$\alpha(C_1, C_2) = 1 - \frac{depth(C_1^{O_1})}{depth(C_1^{O_1}) + depth(C_2^{O_2})} \tag{12}$$

Word matching (Sw) is determined by contemplating the set of common words and different words in the synonym sets that denote the entity classes [14]. Feature matching (Su) applies a matching process which classifies features into parts (Sp), functions (Sf), and attributes (Sa). The feature similarity using word matching is given by

$$S_u(a^p, b^q) = W_p \cdot S_p(a^p, b^q) + W_f \cdot S_f(a^p, b^q) + W_a \cdot S_a(a^p, b^q) \tag{13}$$

for W_p, W_f, W_a ≥ 0. Semantic-neighborhood matching (Sn) compares entity classes a^p and b^q of ontologies p and q, respectively, with radius r. The semantic neighborhood similarity is given by

$$S_n(a^p, b^q, r) = \frac{a^p \cap_n b^q}{a^p \cap_n b^q + \alpha(a^p, b^q)\,\delta(a^p, a^p \cap_n b^q, r) + (1 - \alpha(a^p, b^q))\,\delta(b^q, a^p \cap_n b^q, r)} \tag{14}$$

The intersection over semantic neighborhoods is approximated by the similarity of entity classes across neighborhoods, where S is the semantic similarity of entity classes; n and m are the numbers of entity classes in the corresponding semantic neighborhoods [16].

TABLE 1
LIMITATIONS OF EXISTING SIMILARITY METHODS

APPROACH        MEASURE                            LIMITATIONS
Ontology based  Rada et al. [4]                    Requires a consistent ontology.
                Leacock and Chodorow [8]           Only for a specific information source.
IC based        Resnik [9], Lin [10], J&C [11]     Consider only the most specific common abstraction, and IC depends on corpora. Time-consuming analysis of the corpus. Consider only IS-A relations.
Hybrid based    Li et al. [12]                     Requires parameters to be settled.
                OSS [13]                           Tuning is required.
Feature based   Pirró and Seco [14]                Considered the WordNet ontology and the MeSH ontology separately. Issues related to finding cross ontological similarity are not addressed.
                Rodriguez & Egenhofer method [16]  Considers only hypernymy / holonymy relations among concepts.

4 THE PROPOSED SIMILARITY METHOD FOR CROSS ONTOLOGY
Pirró [14], [15] has mentioned in his work that the proposed method could be extended to compute semantic similarity between concepts belonging to different ontologies if the problem of finding the most specific common abstraction (MSCA) is solved. He has also mentioned that the method used by Rodriguez [16] may be adopted to find the MSCA. The main challenge of extending the P&S [14] metric to the computation of cross ontology similarity rests on the following two issues: 1) finding the most specific common abstraction between concepts Ci and Cj, where Ci belongs to ontology O1 and Cj belongs to ontology O2; 2) in the P&S [14], [15] metric, the IC value is calculated for a single ontology. But when we extend the P&S [14], [15] metric to cross ontologies, the IC value of a concept should be computed based on both the ontologies. The following principles were kept in mind for designing the new similarity measure.
• The proposed measure should be based on human psychological models, as all of the existing semantic similarity methods are evaluated against human judgments.
• The proposed method should be based on the information content method, because most of the IC based methods achieve the highest correlations against human judgements.
• The information content calculation should be corpus independent, as corpus dependent IC calculations are time-consuming and require tagged corpora.
• The depth property of the semantic similarity should not be ignored, as the deeper a concept is in a hierarchy, the more specific it is.

The proposed method computes semantic similarity among cross ontological biomedical concepts using feature and information content based methods. The feature matching approach uses common and different characteristics between concepts to compute semantic similarity. This work is motivated by the need for new tools that can improve the retrieval, integration and mapping of information. For this work we thought the UMLS framework [17] could be used, as it is populated with many biomedical ontologies. The proposed idea is to be tested using the MeSH [18] and SNOMED-CT [19] biomedical ontologies.

Fig. 3. Architecture for Computation of Semantic Similarity of Cross Ontological Concepts
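For comparison with the information content based method developed in this section, the structural quantities of the path based approach of Section 3.1 can be sketched in a few lines of Python. The sketch covers only (3), (5) and (6); the function names and the depths used in the usage note are hypothetical and are not taken from MeSH or SNOMED-CT.

```python
def cspec(ontology_depth, lcs_depth):
    """Eq. (3): common specificity, D - Depth(LCS(C1, C2))."""
    return ontology_depth - lcs_depth


def path_rate(d_primary, d_secondary):
    """Eq. (5): factor that scales secondary-ontology paths to the primary ontology."""
    return (2 * d_primary - 1) / (2 * d_secondary - 1)


def cross_path(d1, d2, d_primary, d_secondary):
    """Eq. (6): Path(C1, C2) = d1 + PathRate * d2 - 1.

    d1 is the distance from C1 to the bridge node in the primary ontology,
    d2 the distance from C2 to the bridge node in the secondary ontology.
    """
    return d1 + path_rate(d_primary, d_secondary) * d2 - 1
```

With two ontologies of equal depth the path rate is 1 and `cross_path` reduces to d1 + d2 - 1. When the primary ontology is deeper than the secondary one, each edge of the secondary ontology counts for more than one edge of the primary ontology.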
 

The existing approach for computation of semantic similarity among cross ontological concepts proposed by Al-Mubaid et al. [2] is a path based approach, whereas our proposed method is an information content based approach. Based on the Tversky formula, the intersection of the features quantifies the amount of commonality that exists between the compared concepts. This quantification is conceived as IC(MSCA). Pirró [14] has calculated commonality using IC(MSCA) on a single ontology (MeSH). He has proposed the ratio based formulation of the Tversky [3] model, but he has not taken the depth property into consideration. We have considered the depth property (α) for computation of semantic similarity of concepts belonging to different ontologies (SNOMED-CT and MeSH).

4.1 Proposed Similarity Method
The similarity value can be calculated by assuming a virtual root that connects the subcategories of both the biomedical ontologies of the concepts. This measure is computed based on information content, shared features and depth of the taxonomy. The similarity measure is defined by

$$Sim(C_1, C_2) = \frac{IC(MSCA(c))}{IC(MSCA(c)) + \alpha(C_1, C_2) \cdot IC(C_1) + (1 - \alpha(C_1, C_2)) \cdot IC(C_2)} \tag{15}$$

where IC(MSCA(c)) is the information content of the most specific common abstraction of both the concepts. It can be calculated by considering a virtual root that connects the subcategories of the two different ontologies. From the virtual root, the synonym set of the hypernym concepts of the primary ontology is matched with the synonym set of the hypernym concepts of the secondary ontology using the word matching feature, and the MSCA can be calculated from the matched set.

$$IC(MSCA(c)) = 1 - \frac{\log(\min(hypo(O_1(C_1), O_2(C_2))) + 1)}{\log(max_{min\,con})} \tag{16}$$

where the function min(hypo(O1(C1), O2(C2))) represents the taxonomy in which the concept has the minimum number of hyponyms. The measure also considers the depth (α) of both the concepts in the hierarchy, which can be calculated by (11), (12). IC(C1) and IC(C2) are the specific information content values for each concept in their corresponding hierarchy. The IC value is defined [14] as

$$IC(c) = 1 - \frac{\log(hypo(c) + 1)}{\log(max_{con})} \tag{17}$$

where the function hypo represents the number of hyponyms of a given concept c and max_con represents the total number of concepts in the considered taxonomy. If a concept has many hyponyms, then it has more of a chance of appearing in the taxonomy; hence it conveys less information content compared to the concepts that are leaves.

4.2 Refined Information Content Approaches
The semantic similarity measures such as Resnik's [9] measure, Lin's [10] measure and Jiang & Conrath's [11] measure use the information content (IC) value to compute similarity between two terms in the ontology. The main drawbacks of this approach are
• It is corpus dependent
• Time consuming analysis of the corpus
• The information content for cross ontology is not addressed.

Thus the refined formula for the information content based approach overcomes the drawbacks by considering the corpus independent information content for cross ontological concepts. Seco [15] has proposed a method to compute information content which is corpus independent. The information content based approaches have been refined for computing cross ontological concept similarity. All of these approaches calculate the information content value using Seco's [15] formula.

• Refined Resnik's Measure
Semantic similarity between concepts (C1, C2) belonging to two different ontologies (O1 and O2) is given by (18). Information content for the concept IC(C) is given by (17). The refined Resnik [9] measure is given by

$$Sim_{res}(C_1, C_2) = \max_{CS(O_1(C_1),\, O_2(C_2))} [IC(C)] \tag{18}$$

where max CS(O1(C1), O2(C2)) represents the ancestor concept which has the maximum information content among the two ontologies O1 and O2.

• Refined Jiang & Conrath's Measure
The refined semantic distance between any two concepts C1 and C2 belonging to two different ontologies is given by

$$Sim_{J\&C}(C_1, C_2) = \frac{IC(C_1) + IC(C_2) - 2 \times \max_{CS(O_1(C_1),\, O_2(C_2))}(IC(C))}{2} \tag{19}$$

The information content value should consider the specific concepts belonging to ontology O1 and ontology O2 and also the IC value of the concept that maximally subsumes both the concepts.

• Refined Lin's Measure
The refined Lin similarity method for cross ontological concepts (C1, C2) is given by

$$Sim_{Lin}(C_1, C_2) = 2 \times \frac{MSCA(O_1(C_1), O_2(C_2))}{IC(C_1) + IC(C_2)} \tag{20}$$

Lin's measure is therefore the ratio of the information shared in common (i.e. MSCA(O1(C1), O2(C2))) to the total amount of information possessed by the two concepts in two different ontologies.

4.3 Proposed Algorithm
Let (O1, O2, …, On) be multiple ontologies available in the UMLS framework. Among the available ontologies, designate one ontology as primary and the other as secondary based on the granularity of the concepts they possess, and then identify the concepts for which the semantic similarity is to be calculated. Let O1(Ci) and O2(Cj) be the concepts belonging to the corresponding ontologies and r1 and r2 be the root nodes of the selected ontologies. Create a virtual root (VR) which connects the root nodes r1 and r2. For our experiments we have considered the datasets (36 concept pairs) used by [2] and [14]. For the biomedical concepts of the datasets, XML files are generated using Clinclue and the Dragon toolkit. Each XML input file contains the hypernymy and hyponymy relations of each concept. It also contains the depth and synonym set of each concept. The created XML files of the biomedical concepts serve as input to the algorithm, and the semantic similarity is calculated.

SS_Score Algorithm (Cross Sim(XML file1, XML file2))
// SS_Score represents Semantic Similarity Score.

Step 1: Get the input XML file for the concepts from the repository.

Step 2: Compute the ancestor list and corresponding hypo number for each concept C1 and C2 // hypo - number of hyponyms
    Search ontology O1 until concept C1 is found
    {Return (ancestor list (S1) found for the concept C1 and their corresponding hyponym numbers in the ontology O1)}
    Search ontology O2 until concept C2 is found
    {Return (ancestor list (S2) found for the concept C2 and their corresponding hyponym numbers in the ontology O2)}

Step 3: Compare (S1 and S2) until a common ancestor is found
    If one or more common ancestors are found, create a list of common ancestors.
        mscalist = conceptlist [(c1,h1), (c2,h2), (c3,h3) .. (cn,hn)]
        go to Step 4
    else
        {Return ("There is no most specific common ancestor for the concept pair (C1,C2). The similarity value cannot be calculated")}

Step 4: Calculate the Most Specific Common Abstraction of both concepts (MSCA(C1,C2))
    // From the mscalist, the concept which has the higher level in the taxonomy is considered as the MSCA(C1,C2), and the information content of the MSCA concept is calculated by
    $$IC(MSCA(c)) = 1 - \frac{\log(\min(hypo(O_1(C_1), O_2(C_2))) + 1)}{\log(max_{min\,con})}$$
    go to Step 5

Step 5: Calculate the information content for the specific concepts, i.e. IC(C1) and IC(C2)
    $$IC(c) = 1 - \frac{\log(hypo(c) + 1)}{\log(max_{con})}$$
    go to Step 6

Step 6: Calculate the depth of the concepts in both ontologies by (11), (12)
    go to Step 7

Step 7: Calculate the semantic similarity between the concept pair (C1,C2) by
    $$Sim(C_1, C_2) = \frac{IC(MSCA(c))}{IC(MSCA(c)) + \alpha(C_1, C_2) \cdot IC(C_1) + (1 - \alpha(C_1, C_2)) \cdot IC(C_2)}$$
    go to Step 8

Step 8: Calculate the semantic similarity for cross ontology using the refined Information Content Approaches (Resnik using (18), J&C using (19), Lin using (20)).

Step 9: Collect the human judgements against which the similarity rating is to be compared.

Step 10: Check user integrity by a rating coefficient (i.e., Rc) defined as
    $$R_c = \sum_{i=0}^{n} (C_i - avg_i)$$
    where n represents the number of concept pairs.
    Eliminate the human judgements which are incorrect using Rc.

Step 11: Calculate the correlation coefficient using the Pearson correlation coefficient.

Step 12: Compare the performance of the proposed approach.

Step 13: End.

4.4 Sample Computation among Biomedical Concepts

Fig. 4. Connecting two ontology fragments
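A minimal Python sketch of such a computation, following Steps 2 to 7 of the SS_Score algorithm, is given here for two small ontology fragments joined under a virtual root. The fragments, the concept names, the `MATCHED` set (a stand-in for synonym-set word matching) and all function names are hypothetical. Two readings are also assumptions of this sketch: the deepest matched ancestor pair is taken as the MSCA, and the denominator of (16) is taken as the number of concepts of the ontology in which the MSCA has fewer hyponyms.

```python
import math

VR = "VR"  # virtual root joining the two fragments
# Hypothetical fragments, stored as child -> parent.
O1 = {"a1": VR, "a2": "a1", "a3": "a2", "a4": "a3", "a5": "a3"}
O2 = {"b1": VR, "b2": "b1", "b3": "b2"}
# Hypothetical result of synonym-set word matching between hypernyms.
MATCHED = {("a2", "b2")}


def ancestors(onto, c):
    """Step 2: ancestors of c, nearest first, excluding the virtual root."""
    out, p = [], onto[c]
    while p != VR:
        out.append(p)
        p = onto[p]
    return out


def depth(onto, c):
    """Number of edges from the virtual root down to c."""
    return len(ancestors(onto, c)) + 1


def hypo(onto, c):
    """Number of hyponyms (descendants) of c in its own ontology."""
    return sum(1 for x in onto if c in ancestors(onto, x))


def seco_ic(onto, c):
    """Eq. (17): corpus independent information content."""
    return 1 - math.log(hypo(onto, c) + 1) / math.log(len(onto))


def alpha(d1, d2):
    """Eqs. (11) and (12): depth based weight."""
    ratio = d1 / (d1 + d2)
    return ratio if d1 <= d2 else 1 - ratio


def cross_similarity(c1, c2):
    """Steps 3 to 7: returns None when no common ancestor is matched."""
    pairs = [(a, b) for a in ancestors(O1, c1)
             for b in ancestors(O2, c2) if (a, b) in MATCHED]
    if not pairs:
        return None
    # Assumption: the deepest matched pair is the MSCA.
    a, b = max(pairs, key=lambda p: depth(O1, p[0]))
    # Eq. (16): use the ontology in which the MSCA has fewer hyponyms.
    onto, node = min(((O1, a), (O2, b)), key=lambda t: hypo(*t))
    ic_msca = seco_ic(onto, node)
    w = alpha(depth(O1, c1), depth(O2, c2))
    # Eq. (15)
    return ic_msca / (ic_msca + w * seco_ic(O1, c1) + (1 - w) * seco_ic(O2, c2))
```

Calling `cross_similarity("a4", "b3")` matches the ancestors a2 and b2, takes the IC of the MSCA from O2 (where it has one hyponym instead of three), and combines it with the IC values of the two leaf concepts through (15). A pair with no matched ancestor, such as `cross_similarity("a1", "b3")`, returns `None`, which corresponds to the else branch of Step 3.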

The similarity between concepts a4 and b3, which belong to two different ontologies, is measured by connecting the sub-roots (a1 and b1) of the concepts to the virtual root (VR). The common ancestors that exist between a4 and b3 among the different ontologies are a3 in O1 and b3 in O2. Thus MSCA(a4, b3) is calculated by choosing the ontology in which the MSCA concept has the minimum number of hyponyms, and the information content value can be calculated using (16). The information content for the specific concept is measured by using (17). The depth of the MSCA concept from the virtual root is calculated using formulas (11) and (12). Thus the similarity value among cross ontological concepts is calculated using (15).

TABLE 2
SIMILARITY RATING FOR BIOMEDICAL CONCEPTS

Concept 1                Concept 2            Similarity rating
Anemia                   Appendicitis         0
Antibiotics              Antibacterial agent  0.736
Urinary tract infection  Pyelonephritis       0.373
Migraine                 Headache             0.433

5 CONCLUSION
This paper has discussed the various semantic similarity approaches that could be used for finding similar concepts of a single ontology and concepts belonging to different ontologies. It also describes a new semantic similarity computation method between biomedical concepts belonging to multiple ontologies based on corpus independent information content. We are also investigating how this measure influences retrieval effectiveness in information retrieval applications and studying the influence of relations on the computation of the semantic similarity score.

REFERENCES
[1] A. Budanitsky and G. Hirst, "Evaluating WordNet-based measures of semantic distance," Comput. Linguistics, vol. 32, no. 1, pp. 13–47, 2006.
[2] H. A. Nguyen and H. Al-Mubaid, "Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies," IEEE Trans. on Systems, Man, and Cybernetics, vol. 39, no. 4, pp. 339–398, 2009.
[3] A. Tversky, "Features of similarity," Psychological Review, vol. 84, no. 2, pp. 327–352, 1977.
[4] R. Rada, H. Mili, M. Bicknell, E. Blettner, "Development and application of a metric on semantic nets," IEEE Trans. on Systems, Man, and Cybernetics, vol. 19, pp. 17–30, 1989.
[5] G. Hirst, D. St-Onge, "Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms," in WordNet: An Electronic Lexical Database, MIT Press, 1998.
[6] Z. Wu and M. Palmer, "Verb semantics and lexical selection," Proc. 32nd Ann. Meeting Assoc. Comput. Linguistics, pp. 133–138, 1994.
[7] Michael Sussna, "Word sense disambiguation for free-text indexing using a massive semantic network," Proc. Second International Conference on Information and Knowledge Management, pp. 67–74, 1993.
[8] Claudia Leacock and Martin Chodorow, "Combining local context and WordNet similarity for word sense identification," in Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pp. 265–283, 1998.
[9] P. Resnik, "Information content to evaluate semantic similarity in taxonomy," Proc. of IJCAI, pp. 448–453, 1995.
[10] D. Lin, "An information-theoretic definition of similarity," in Proc. of Conference on Machine Learning, pp. 296–304, 1998.
[11] J. Jiang, D. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," Proc. of ROCLING X, 1997.
[12] Y. Li, D. McLean, Z. Bandar, J. O'Shea, K. Crockett, "Sentence similarity based on semantic nets and corpus statistics," IEEE Trans. on Knowledge and Data Engineering, vol. 18, no. 8, pp. 1138–1150, 2006.
[13] V. S. Zuber, B. Faltings, "OSS: A semantic similarity function based on hierarchical ontologies," Proc. of IJCAI, pp. 551–556, 2007.
[14] G. Pirró, N. Seco, "A new semantic similarity metric combining features and intrinsic information content," in ODBASE, pp. 1271–1288, 2009.
[15] N. Seco, T. Veale, J. Hayes, "An intrinsic information content metric for semantic similarity in WordNet," in Proc. of ECAI, pp. 1089–1090, 2004.
[16] M. Rodriguez, M. Egenhofer, "Determining semantic similarity among entity classes from different ontologies," IEEE Trans. on Knowledge and Data Engineering, vol. 15, no. 2, pp. 442–456, 2003.
[17] UMLS (2010). [Online]. Available: Http://www.nlm.nih.gov/research/umls/
[18] MeSH Browser (2010). Available: http://www.nlm.nih.gov/mesh/MBrowser.html
[19] SNOMED-CT (2010). Available: http://www.snomed.org/index.html
[20] Angelos Hliaoutakis, "Semantic Similarity Measure in MeSH Ontology and their application to Information Retrieval on Medline," 2005.

[21] Giuseppe Pirro and Jerome Euzenat, "A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness," 2010.

K. Saruladha is working as an Assistant Professor at Pondicherry Engineering College, India. She has a total of 20 years of teaching experience. She graduated from Pondicherry University, Puducherry, India. She is a member of the Indian Society of Technical Education, India. She has published nearly 20 research papers in distributed computing, information security and ontology based information retrieval. She is currently pursuing her Ph.D. in ontology based information retrieval systems.

Dr. G. Aghila is working as a Professor at Pondicherry University, India, and has a total of 20 years of teaching experience. She graduated from Anna University, Chennai, India. She has published nearly 40 research papers in web crawlers and ontology based information retrieval. She is currently a supervisor guiding 8 Ph.D. scholars. She was in receipt of the schrneiger award. She is an expert in ontology development. Her areas of interest include artificial intelligence, text mining and semantic web technologies.

A. Bhuvaneswary is a postgraduate student pursuing her M.Tech in distributed computing systems.
