Taxonomy Matching Using Background Knowledge

Linked Data, Semantic Web and Heterogeneous Repositories

Heiko Angermann, Naeem Ramzan

Taxonomies, also called directories and, in e-commerce, e-catalogs, are a subcategory of
ontologies that use hierarchically ordered concepts to model a field of interest in a
formal way. This hierarchical representation of a domain has its merits for navigation and for
exploring similar items. Examples include categorizing customers according to their
branch inside a Customer Relationship Management (CRM) system, categorizing goods
according to categories inside a Product Information Management (PIM) system, classifying
assets inside a Media Asset Management (MAM) system, helping customers find the desired
products in e-commerce systems, and structuring product master and lifecycle data in
Enterprise Resource Planning (ERP) systems.
p. 4
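
The idea of hierarchically ordered concepts can be made concrete with a small tree structure. The following is a minimal sketch, not a structure taken from the book: the class name `Concept`, its methods, and the example labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Concept:
    """A taxonomy node; the only relation stored is is-a (parent/child)."""
    label: str
    parent: Optional["Concept"] = None
    children: list = field(default_factory=list)

    def add_child(self, label: str) -> "Concept":
        """Create a sub concept (hyponym) below this concept."""
        child = Concept(label, parent=self)
        self.children.append(child)
        return child

    def hypernyms(self) -> list:
        """Labels of all super concepts, nearest first."""
        labels, node = [], self.parent
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels

    def siblings(self) -> list:
        """Labels of concepts that share this concept's super concept."""
        if self.parent is None:
            return []
        return [c.label for c in self.parent.children if c is not self]


# Hypothetical example labels, not taken from any real e-catalog.
root = Concept("Products")
auto = root.add_child("Automotive")
care = auto.add_child("Car Care")
tools = auto.add_child("Tools")

print(care.hypernyms())  # ['Automotive', 'Products']
print(care.siblings())   # ['Tools']
```

Walking up the tree supports navigation, while the sibling lookup supports exploring similar items.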

However, because most enterprises use their own taxonomy to model hundreds of
interrelated concepts, manually comparing two data repositories is a time-intensive
and error-prone task. To detect matches between two taxonomies (semi-)automatically,
a broad research community is working on Taxonomy Matching and Ontology
Matching.
p. 4
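
As a rough illustration of what automatic match detection can look like at its simplest, the sketch below compares concept labels by string similarity using Python's standard `difflib`. This is only a naive baseline under stated assumptions, not the matching approach of the book: the function names, the threshold of 0.75, and the example labels are hypothetical.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_labels(source, target, threshold=0.75):
    """Map each source label to its best target label if the score
    reaches the (assumed) threshold; unmatched labels are left out."""
    matches = {}
    for s in source:
        best = max(target, key=lambda t: label_similarity(s, t))
        score = label_similarity(s, best)
        if score >= threshold:
            matches[s] = (best, score)
    return matches


# Hypothetical labels from two taxonomies.
source_labels = ["Tyres", "Car Care"]
target_labels = ["Tires", "Car care products", "Garden"]

matches = match_labels(source_labels, target_labels)
print(matches)
```

With these labels, the spelling variants "Tyres" and "Tires" are matched, while "Car Care" stays below the threshold against "Car care products". Purely string-based comparison misses such cases, and it cannot relate labels in different languages at all.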

According to the literature, four types of heterogeneity exist, and one or several of
them can occur between two taxonomies: terminological heterogeneity (different
labels/languages), conceptual heterogeneity (contradictory structures), syntactical
heterogeneity (varying data models), and semiotic heterogeneity (disparate cognitive
interpretations).
p. 4

Contrary to ontologies, a taxonomy describes only hierarchical relationships (hypernym,
hyponym), not arbitrary complex relationships (meronyms, antonyms, synonyms).
p. 5

To construct taxonomies, the authors in [87] defined three tasks:

1. Building the Taxonomy. This is done either bottom-up, i.e., by combining sub
concepts, or top-down, i.e., by splitting super concepts.

2. Grouping the Concepts. The grouping of sibling concepts is predominantly achieved by
referring to background knowledge resource(s), for example a standard taxonomy or the
semantic lexicon WordNet, which is the most widely used resource [58].

3. Assigning the Super Concepts. The last task aims to determine super concepts for the grouped
sub concepts, i.e., the is-a relationships.
p. 6
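
The three tasks above can be sketched for the bottom-up case as follows. This is a minimal illustration under stated assumptions, not the procedure from [87]: the dictionary `BACKGROUND` is a hypothetical stand-in for a background knowledge resource such as WordNet or a standard taxonomy, and the function name and labels are invented for the example.

```python
from collections import defaultdict

# Hypothetical background knowledge: label -> super concept (hypernym).
# Assumed to be acyclic; a real resource such as WordNet is far richer.
BACKGROUND = {
    "car wax": "car care",
    "car shampoo": "car care",
    "wrench": "tools",
    "car care": "automotive",
    "tools": "automotive",
}


def build_bottom_up(leaves, background):
    """Build is-a relations bottom-up.

    Concepts sharing a hypernym in the background knowledge are grouped
    as siblings (task 2), and that hypernym is assigned as their super
    concept (task 3). Repeating this level by level builds the
    taxonomy (task 1).
    Returns a dict: super concept -> sorted list of sub concepts.
    """
    is_a = defaultdict(set)
    frontier = set(leaves)
    while frontier:
        next_frontier = set()
        for concept in frontier:
            parent = background.get(concept)
            if parent is not None:
                is_a[parent].add(concept)
                next_frontier.add(parent)
        frontier = next_frontier
    return {parent: sorted(children) for parent, children in is_a.items()}


taxonomy = build_bottom_up(["car wax", "car shampoo", "wrench"], BACKGROUND)
for parent in sorted(taxonomy):
    print(parent, "->", taxonomy[parent])
```

A top-down variant would instead start from the most general concept and repeatedly split it into sub concepts.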

During the last decade, taxonomies have become an essential part of various information
management applications. For example, taxonomies are implemented as e-catalogs in all
recent e-commerce applications, as well as in the included
Product Information Management (PIM) components for managing products and in the included
Customer Relationship Management (CRM) components for managing customers [8, 133]. The
e-catalogs and their concepts, which in e-commerce are called categories, support the user
in navigating the marketplace, making selections in it, and placing orders;
see Fig. 1.2. The labels of e-catalogs mainly consist of a combination of different word
sequences, i.e., Multi Word Expressions (MWEs), which differ according to the language used
(see Fig. 1.2a and b) and according to the marketplace (see Fig. 1.2a and c).
p. 6

• Terminological Heterogeneity appears when the labels of concepts differ [54], either
because different languages are in use (e.g., T_A in English and T_B in German), because
different technical sublanguages are used, or because synonyms are used.

• Conceptual Heterogeneity arises if two taxonomies use different models [55]. The
taxonomies represent the domain with different axioms, i.e., true statements, or different
concepts [54]. For example, the concept “Auto Detailing & Car Care” has five sub concepts,
whereas the concept “Auto & Motorrad” has only four sub concepts, but some of the concepts are
semantically similar.

• Syntactical Heterogeneity occurs when different data languages/models are used to store the
taxonomies [54]. For example, T_A is stored in OWL, but T_B is stored in RDF. Because OWL has a
precise mapping to RDF, an OWL taxonomy is comparable with arbitrary RDF graphs. If the
languages use a different knowledge representation formalism, matching approaches are
required to translate between the schemata.

• Semiotic Heterogeneity emerges when persons misinterpret concepts or the is-a
relationships between the concepts. For example, a customer does not expect “Pressure Workers”
to be a sub concept of “Auto Detailing & Car Care”, as this tool can also be used for
cleaning houses and gardens.
p. 9
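
To illustrate how terminological and conceptual heterogeneity interact, the sketch below first normalizes labels through a lookup table and then measures how strongly the sub concepts of two concepts overlap (Jaccard index). It is a simplified illustration only: `TRANSLATIONS` is a hypothetical stand-in for a background knowledge resource, and the sub concept labels are invented placeholders, not the actual categories of any marketplace.

```python
# Hypothetical German -> English lookup standing in for background knowledge.
TRANSLATIONS = {
    "autopflege": "car care",
    "reifen": "tires",
    "werkzeug": "tools",
    "motorradteile": "motorcycle parts",
}


def normalize(label, translations):
    """Lower-case a label and translate it if the lookup knows it."""
    key = label.strip().lower()
    return translations.get(key, key)


def sub_concept_overlap(children_a, children_b, translations):
    """Return the Jaccard index of two sub concept sets after
    normalization, together with the sorted shared labels."""
    a = {normalize(c, translations) for c in children_a}
    b = {normalize(c, translations) for c in children_b}
    union = a | b
    shared = a & b
    score = len(shared) / len(union) if union else 0.0
    return score, sorted(shared)


# Placeholder sub concepts: five on one side, four on the other.
children_a = ["Car Care", "Tires", "Tools", "Paint", "Polish"]
children_b = ["Autopflege", "Reifen", "Werkzeug", "Motorradteile"]

score, shared = sub_concept_overlap(children_a, children_b, TRANSLATIONS)
print(score, shared)  # 0.5 ['car care', 'tires', 'tools']
```

Without the translation step the two sets would share no label at all, which shows why terminological differences usually have to be resolved before structures can be compared.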

Taxonomic heterogeneity is the cognitive and methodical disparity between two taxonomies
describing the same domain of interest. According to existing literature [157], it falls
into four types: terminological heterogeneity (different languages and
language usage), conceptual heterogeneity (contradictory structures and levels), syntactical
heterogeneity (varying data models), and semiotic heterogeneity (disparate cognitive
interpretations).
p. 15
