
SOCNET

DOI 10.1007/s13278-010-0008-2

ORIGINAL ARTICLE

Visual knowledge representation of conceptual semantic networks


Leyla Zhuhadar • Olfa Nasraoui • Robert Wyatt • Rong Yang

Received: 23 March 2010 / Accepted: 28 July 2010


© Springer-Verlag 2010

Abstract This article presents methods of using visual analysis to represent large amounts of massive, dynamic, ambiguous data held in a repository of learning objects. These methods are based on the semantic representation of these resources. We use a graphical model represented as a semantic graph. The formalization of the semantic graph has been intuitively built to solve a real problem, which is browsing and searching for lectures in a vast repository of colleges/courses located at Western Kentucky University (http://HyperManyMedia.wku.edu). This study combines Formal Concept Analysis (FCA) with Semantic Factoring to decompose complex, vast concepts into their primitives in order to develop knowledge representation for the HyperManyMedia [we proposed this term to refer to any educational material on the web (hyper) in a format that could be a multimedia format (image, audio, video, podcast, vodcast) or a text format (HTML webpages, PHP webpages, PDF, PowerPoint)] platform. Also, we argue that the most important factor in building the semantic representation is defining the hierarchical structure and the relationships among concepts and subconcepts. In addition, we investigate the association between concepts using Concept Analysis to generate a lattice graph. Our domain is considered as a graph, which represents the integrated ontology of the HyperManyMedia platform. This approach has been implemented and used by online students at WKU (http://www.wku.edu).

L. Zhuhadar (✉) • O. Nasraoui
Knowledge Discovery and Web Mining Lab, Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA
e-mail: leyla.zhuhadar@wku.edu

O. Nasraoui
e-mail: olfa.nasraoui@louisville.edu

R. Wyatt
The Office of Distance Learning, Division of Extended Learning and Outreach, Western Kentucky University, Bowling Green, KY 42101, USA
e-mail: robert.wyatt@wku.edu

R. Yang
Department of Mathematics and Computer Science, Western Kentucky University, Bowling Green, KY 42101, USA
e-mail: rong.yang@wku.edu

1 Introduction

This study combines Formal Concept Analysis (FCA) with Semantic Factoring to construct and develop a multilingual ontology. In a nutshell, it answers the following question: "How is it possible to visualize an ontology graph which represents knowledge and reasoning of a massive, ambiguous, and vast set of documents using minimum vocabulary?" The model is built upon a variety of principles that we adopt. First, we use Zipf's law, "The Principle of Least Effort" (Zipf 1972). Zipf found a clear-cut correlation between the number of words and the frequency of their usage, presented as rf = c, where r is the word's rank in a document, f its frequency of occurrence, and c a constant. We rely on this finding to minimize the effort needed to create the user ontology: the most frequent vocabulary that represents the corpus in our domain (E-learning) is used. To this end, we observe the most frequent keywords searched by users; this information is obtained from the users' logs. Our assumption is the following: "if we capture the most frequent words used by an online user, then adding these words to the user ontology, the information retrieval model


would provide the user with the most relevant documents in both languages." Second, we use the concept of "Collocation", which has proved to be important in areas such as machine translation and information retrieval (Manning and Schütze 1999). Manning and Schütze (1999) divided the "Collocation Concept" into three categories: (1) compounds, such as "semantic web", (2) phrasal verbs, such as "turn on", and (3) stock phrases, such as "Introduction to Literature". The third type is what we used in constructing our ontology. Our users (students) spent 80% of their time searching for topics related to the following categories: (1) course name, (2) lecture name, and (3) professor name. Therefore, constructing an ontology that consists of collocations (e.g., "Game Theory for Managers") would increase the precision. Third, we used personalization to decrease the ambiguity of semantic search. Each user's activity on the system defines his/her area of interest (college/courses); therefore, a unique ontology is generated for each user. As a consequence, the search terms used by a user are governed by his/her domain of interest. For example, if a user searches for the keyword "History" and is enrolled in Mathematics, the system should retrieve the course "History of Mathematics", but if the user is enrolled in the college of History, the same keyword search will retrieve the course "History of Civilization". The effectiveness of our model comes from the synergy between all the previous principles.

Before diving into the theory and the methodology of implementing the system, let us begin with a descriptive definition of the system: HyperManyMedia is an information retrieval system that utilizes an ontology-based model and provides semantic information. This approach uses two different types of ontologies: a global ontology model that represents the whole E-learning domain (content-based ontology), and a learner-based ontology model that represents the learner's profile. The implementation of the ontology model is separate from the design of the information retrieval system. The architecture of the HyperManyMedia system can provide, manage, and collect data that permits high levels of adaptability and relevance to the learner's profile. To achieve this objective, an approach for personalized search is implemented that takes advantage of the Semantic Web standards (RDF and OWL) to represent the content and the user profiles.

The main focus of this paper is the visual representation of the ontology that allows learners to navigate the system visually. The main objective of this research was to provide the user (learner) with a visual search engine to summarize the entire domain (E-learning). This can be considered as a tool to help visualize concepts and subconcepts. This visual exploration of documents enables users to have an overall view of the entire repository, without even clicking on the resources and reading each document. When a user types a query on the visual search engine, the visual search engine dynamically matches the query with the whole visual ontology (concepts, subconcepts, etc.). The visual search engine presents all the sectors (concepts/subconcepts) that share the typed letters using different colors than the unmatched concepts. Therefore, the user can find what he/she is looking for immediately. As the user adds more letters to his/her query, the number of matched sectors narrows down to the most similar concepts in the ontology.

The primary contribution to the State of the Art made in this research is in the reuse of the domain ontology to build visual search facets, where the hierarchic ontology structure was converted into a lattice (graph) and presented as nodes and edges, and where the final representation of the graph is provided to users as sectors and subsectors.

The rest of this paper is divided into the following sections:

Section 2 (Background and related work): We give an overview of visual analytics, applications, and related work.
Section 3 (Methodology): This section presents the semantic domain structure and the representation of the semantic domain.
Section 4 (Implementation): This section presents the process of building the HyperManyMedia ontology, then adding the ontology to the search engine. It ends with designing a visual ontology search engine.
Section 5 (Evaluation): In this section, we test the usability of the visual search engine.
Section 6 (Conclusion): In this section, we present the novelty of our research and our contribution.

2 Background and related work

In this section, we introduce the definition of visual analytics, then we define several visual applications, and finally discuss related work and the significance of our visual analytic methods and techniques.

2.1 Visual analytics

Every day, data is produced at unprecedented rates in a variety of fields; examples include scientific data, internet information, data management systems, business and marketing data, etc. Visual analytics is the bridge between the human eyes and the machine. It facilitates the process of: (a) discovering hidden knowledge, (b) summarizing data, (c) representing data in a manner that the human cognitive system can perceive, (d) helping users find needed information as fast as possible, or (e) allowing users to interact with huge amounts of data easily and efficiently.
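The incremental matching described in Sect. 1, where sectors sharing the typed letters are highlighted and the matched set narrows as letters are added, is one concrete form of such interaction. The following is a minimal sketch only, not the HyperManyMedia implementation: it assumes that "sharing the typed letters" means a case-insensitive prefix match against a label or any word in it, and the function name `matched_sectors` and the label list are hypothetical, with labels borrowed from examples in the text.

```python
def matched_sectors(typed, labels):
    """Return the sector labels that share the typed letters.

    Assumption (not stated in the article): a label matches when the typed
    text is a case-insensitive prefix of the whole label or of any word in it.
    """
    query = typed.strip().lower()
    if not query:
        return []  # nothing typed yet, so no sector is highlighted
    hits = []
    for label in labels:
        lowered = label.lower()
        if lowered.startswith(query) or any(
            word.startswith(query) for word in lowered.split()
        ):
            hits.append(label)
    return hits


# Hypothetical sector labels, taken from examples mentioned in the text.
LABELS = [
    "English",
    "Engineering",
    "History of Mathematics",
    "History of Civilization",
    "Game Theory for Managers",
]

if __name__ == "__main__":
    # Each extra letter can only keep or shrink the set of matched sectors.
    for typed in ("e", "en", "eng", "engi"):
        print(f"{typed!r}: {matched_sectors(typed, LABELS)}")
```

Because every longer query is an extension of a shorter one, the matched set for the longer query is always a subset of the earlier set, which is the narrowing behavior the text describes. A visual front end would color the returned labels differently from the rest.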


Thomas and Cook define visual analytics as "the science of analytical reasoning facilitated by interactive visual interfaces" (Thomas and Cook 2005). Visual analytics differs from other analytics applications by its capability to simplify complex data and provide users with quick, focused representations in which they can interact with data, find the important features they are looking for, and translate the data into a visual form that their cognitive reasoning process can decipher quickly (Thomas and Cook 2005). However, visualization tools rely on methods derived from data mining, statistics, mathematics, etc. As a consequence, designing an effective visualization tool is not an easy process, since summarizing data involves filtering out part of the data, choosing some features at the expense of others, and zooming into specific aspects of the data. Choosing the right parameters for filtering data is a deceptive process that involves a variety of methods. Therefore, an efficient visualization tool should have a flexible, interactive, dynamic interface in which users can change those parameters and decide which features to filter out and which ones to keep.

2.2 Visual analytic applications

There are several visual analytic applications, each dedicated to a specific purpose. The following list is not exhaustive, but it provides an overview of the most recent areas of research where visualization has become essential:

(a) Topic summarization, e.g., understanding newspaper articles, stories, reporting events, investigating crime reports, finding patterns in blogs, following the development of political campaigns, or observing topic trends in the bibliography of research approaches (Bertini and Lalanne 2009; Choudhary et al. 2008; Subasic and Berendt 2008);
(b) Visual analysis of social networks, e.g., analyzing dynamic group memberships in temporal social networks by using graphical representations (Bourqui et al. 2009; Gloor and Zhao 2004; Kang et al. 2007; Lin et al. 2008; Yang et al. 2008);
(c) Visual clustering analysis, e.g., using data mining techniques to find patterns in data and generate groups of data based on (dis)similarity. Several visualization tools have been developed in this domain and gained great popularity, to mention some (Assent et al. 2007; Bourennani et al. 2009; Rasmussen and Karypis 2008; Vadapalli and Karlapalem 2009; Zhuhadar and Nasraoui 2008);
(d) Semantic visual analysis, e.g., visual analysis of webpages/documents based on the semantic representation of text in a "semantic graph" (Collins 2006; Dali et al. 2009; Rusu et al. 2009a, 2009b; Zhuhadar et al. 2009), or exploring data in folksonomy systems based on a hierarchical semantic representation, "semantic cloud or tags" (Bizer et al. 2009; Heymann et al. 2008; Kim et al. 2008; Kruk et al. 2005, 2007; Rusu et al. 2009a, 2009b; Stan and Maret 2009; Szomszor et al. 2007).

2.3 Related work

Our research focuses on categories (c) and (d), where each category assists in visually representing massive, dynamic, ambiguous data held in a repository of learning objects. We noticed that there was a high overlap between our work and several other related efforts, due to the fact that our research is built upon several areas of research, spanning knowledge extraction based on the hierarchical semantic representation, cluster analysis, and finally visual analysis.

Recently, there has been significant interest in using visual analytics in a variety of research fields. For example, Rusu et al. (2009a, 2009b) used visual analysis to present documents as a semantic directed graph. In this approach, the authors took advantage of natural language processing to define named entities/co-referenced entities, where triplets (subject, predicate, object) were extracted using the Penn Treebank parser for each sentence in the document and then associated to WordNet. Finally, a summarization of the documents was provided using machine learning techniques. Another work was introduced in Yang et al. (2008), in which a visual analytics tool was used to present data as an interactive graph. It provides the visualization of social networks to explore communities across time; an interesting feature of this tool is the capability to provide relations among communities, events, or the evolution of neighborhoods. The similarity with our work lies in the usage of a graph to represent documents. However, the major difference is that our approach is based on the semantic representation of a graph in real time, and we use the visual analytics tool not only to summarize the data, but also to allow the user to browse the data and retrieve documents. Dali et al. (2009) extended their previous work in Rusu et al. (2009a, 2009b) to question-answering based semantic graphs, where the sentences extracted from the documents using natural language processing techniques were saved and used to implement a question answering system that served as an interface for search. Aras et al. (2009) present a new approach of extracting semantics from popular folksonomy systems to visually explore the data using a hierarchical semantic representation.

Our approach starts with a similar concept to the work presented in Rusu et al. (2009a, 2009b) by converting


documents into a semantic directed graph; however, our approach is a web-based application that evolves dynamically in real time. In addition, we rely on the semantic relationship between entities more than on the representation of sentences in the documents. The idea is to present the hierarchical structure of concepts and subconcepts as a semantic graph. We also use information retrieval techniques to retrieve documents related to the users' interest; moreover, we use clustering analysis to add additional subconcepts to the directed graph.

3 Methodology

3.1 Semantic representation of HyperManyMedia

3.1.1 Formal context representation

This section is concerned with the representation of the semantic model (semantic set). Kavouras and Kokla (2007) define a formal context, SG (Simple Conceptual Graph), as a triple (D, d, a) where D is a set of objects, d is a set of attributes, and a defines the relationship between D and d. For example, let us build a model (D, d, a) satisfying G, which in our case represents a semantic representation of the HyperManyMedia domain (Fig. 1).

Figure 1 illustrates the scenario of representing a simple conceptual graph. The main objective of this section is to describe how we can present a semantic model as sets. The domain is constructed from concepts, subconcepts and the relationships between them. First, a model of vocabulary is defined. This model consists of a set of entities in a hierarchical structure representation. The highest level of this model is the college set, which usually in graph theory represents the Universe set. Under a Universe set (college), the concept types and the relation types are defined. The Universe set (college) consists of all the colleges in the HyperManyMedia domain. As a subset of the Universe, courses are defined as elements, and the relationship between the Universe set (college) and the subset course is presented as tuples of elements of the college. An individual is interpreted as an element of the Universe set (college), for example College = English, etc.

The domain provides some resources in multiple languages (English and Spanish). These resources are, basically, courses designed by WKU faculty augmented with courses from MIT OpenCourseWare (http://ocw.mit.edu/OcwWeb/web/home/home/index.htm). HyperManyMedia consists of the following colleges: English (Ingles), Social Work (Trabajo Social), History (Historia), Chemistry, Accounting, Math, Consumer and Family Sciences, Architect and Manufacturing Sciences, Engineering (Ingenieria) and Communication Disorders. A subset of the Universe set (college) is defined as the course set, which consists of all the courses. Under the concept course set, the lecture set is defined, which consists of all the lectures in the domain (a total of 7,264). Our entire domain D = HyperManyMedia can be defined as Lecture set ⊂ Course set ⊂ College set ⊂ D. The second section concerns the presentation of the semantic set as an ontology.

3.1.2 Semantic Factoring

This section defines Semantic Factoring, which is described by Kavouras and Kokla (2007) as follows: "Semantic Factoring is a conceptual analysis process that decomposes a complex concept into its definition, primitive concepts, called Semantic Factoring". Kavouras and Kokla (2007) emphasize the usefulness of using Semantic Factoring in constructing and developing knowledge representation of systems, especially, in the system that uses multilingual


corpora. As we mentioned in the above section, our corpora is bilingual (English and Spanish). Kavouras and Kokla (2007) argue that the most important factor in building the semantics is by defining the hierarchical structures in concepts, in addition to finding the association between concepts using Concept Analysis to generate a lattice graph, which represents the integrated ontology in HyperManyMedia.

Fig. 1 Illustrating the scenario of representing a simple conceptual graph

4 Implementation

The HyperManyMedia search engine is an extended version of Nutch (http://lucene.apache.org/nutch/) search engine, which is an open source information retrieval system. We modified Nutch by adding plugins to support a multi-model search interface, such as metadata search (Zhuhadar 2008a, b) and semantic search (Zhuhadar 2008, 2009) mechanisms. This paper is concerned with our visual search interface that recently has been added to HyperManyMedia: A Visual Ontology-based Interface. The following sections describe the implementation of this interface.

4.1 Building multilingual HyperManyMedia ontology

4.1.1 Introduction

The general research field of multi-language information retrieval (MLIR) can be categorized into four major areas introduced by Peters et al. (2003) as follows: (a) multilingual retrieval, (b) bilingual retrieval, (c) monolingual retrieval, and (d) domain specific retrieval. According to Oard and Dorr (1996), there are three different approaches to build a multi-language information retrieval system: (1) Text Translation Approach, (2) Thesaurus-based Approach, and (3) Corpus-based Approach. The approach that we followed is a synergistic approach between (1) The Thesaurus-based Approach and (2) The Corpus-based Approach:

1. The Thesaurus-based Approach
Thesaurus based text retrieval allows the learners to explore more information during the searching process. The information retrieval system is capable of bringing more insight about the system in a way similar to a multilingual dictionary, but with visualized hints which can be considered as a powerful tool. We


consider our thesaurus-based approach to be what is called a "controlled vocabulary" approach, since the semantic search is provided to the user/learner as a hierarchical structure. From the beginning, the search engine presents the concept of "college" as an upper-level concept, and the right-side interface shows the user the subclasses and the multilingual synonyms. We assume that the user is initially not aware of the semantic concept but, with time, will understand the relationship between entities and will be ready to formulate his/her own query terms. We consider this approach to be a kind of query expansion.

2. Corpus-based Approach
Our approach can also be considered as Term Vector Translation, which is defined by Oard and Dorr as follows: "statistical multilingual text retrieval techniques in which the goal is to map statistical information about term use between languages... techniques which map sets of tfidf term weights from one language to another (Oard and Dorr 1996)". We used a query translation method to retrieve multilingual documents with expansion techniques for phrasal translation. Our search engine uses the Vector Space Model to match the query term with the indexed documents.

This study uses Protégé (http://protege.stanford.edu/), an open source ontology editor and knowledge-based framework that supports two ways of modeling ontologies, (1) Protégé-Frames and (2) Protégé-OWL editors, to design and build the structure of the HyperManyMedia ontology. Our current ontology consists of ~32,000 lines of code (http://www.wku.edu/~leyla.zhuhadar/semanticowl.owl).

Fig. 2 HyperManyMedia ontology in Protégé

4.1.2 Multilingual ontology design

The platform consists of vast resources of Colleges/Courses/Lectures. Table 1 shows a summary of HyperManyMedia


resources. The main question is how to design an ontology that can summarize the whole domain. The two concepts that have been discussed in the previous sections, Formal Context Representation and Semantic Factoring, were considered in the design. Figure 2 represents the HyperManyMedia ontology in Protégé.

Table 1 Summary of HyperManyMedia Resources

Total # of colleges = 11           Total # of courses = 64
Total # of WKU courses = 27        Total # of MIT courses = 37
Total # of English courses = 45    Total # of Spanish courses = 19
Total # of lectures = 7,264

First, a vocabulary V is defined. This vocabulary consists of all concepts that are considered as part of the domain. This vocabulary is defined as a hierarchical tree, where the upper level (first-level) represents the College set and the instances represent all the colleges: English (Ingles), Social Work (Trabajo Social), History (Historia), Chemistry, Accounting, Math, Consumer and Family Sciences, Architect and Manufacturing Sciences, Engineering (Ingenieria) and Communication Disorders. The lower level (second-level) is considered as a SubConcept, which is the Course set, that consists of all the courses. Finally, the lowest level (third-level) is considered as SubSubConcept, which is the Lecture set, that consists of all the lectures.

4.1.3 Defining objects properties

Six types of objects properties were defined in Protégé to fit the design of the multilingual ontology, as shown in Table 2.

Table 2 Defining objects properties

Object property    Definition
Sub_class_of       This property is defined to generate the hierarchical structure of the domain (Concept, SubConcept, SubSubConcept)
has_Language       This property is defined to distinguish between English and Spanish resources
has_College        This property is defined to distinguish a College
has_Course         This property is defined to distinguish a Course
has_Lecture        This property is defined to distinguish a Lecture
has_Professor      This property is defined to distinguish a Professor

• Protégé reasoner (Pellet)
Pellet is an additional component added to Protégé which provides a web service composition to detect unsatisfiable concepts and to diagnose bugs, such as (1) root clash, or (2) propagating errors due to dependencies between classes, etc. Refer to this site (http://www.mindswap.org/2003/pellet/) for detailed information regarding the development of this reasoner. The architecture of Pellet is shown in Fig. 3 (source: http://www.mindswap.org/2003/pellet/architecture.png). We used Pellet to validate and repair our ontology; most of the generated errors in our design were related to having multilingual classes and multi-level subclasses.

• Multilingual ontology specification
In Sect. 4.1.1 we reviewed different techniques to build a multilingual information retrieval system, including those that, instead of using a thesaurus, exploit the statistical information about the corpora. Oard and Dorr's survey (Oard and Dorr 1996) distinguishes three techniques: (1) Automatic Thesaurus Construction, (2) Term Vector Translation and (3) Latent Semantic Indexing (LSI). Our approach is considered as a Term Vector Translation. Oard and Dorr (1996) define this approach as follows: "We consider statistical multilingual text retrieval techniques in which the goal is to map statistical information about term use between languages... techniques which map sets of tfidf term weights from one language to another (Oard and Dorr 1996)." We used a query translation method to retrieve multilingual documents with expansion techniques for phrasal translation. In the following section we discuss how our information retrieval system works.

4.1.4 Adding the ontology to the search engine

The HyperManyMedia search engine uses a combination of the Vector Space Model (VSM) and the Boolean Model to find the most relevant documents for a query submitted by a user. The score of query q for document d is related to the cosine similarity between the document and query vectors in VSM.

$$\cos(x, x') = \frac{x^{T} x'}{|x| \cdot |x'|} = \frac{x^{T} x'}{\sqrt{x^{T} x} \cdot \sqrt{x'^{T} x'}} \tag{1}$$

where $x \in \mathbb{R}^{|V|}$; $x$ and $x'$ are vector-space representations of two documents, $T$ is the 'transpose' operator, and $x^{T} x'$ indicates the dot product between two vectors. It uses several refinements on VSM by extending the boolean vector model and adding weights associated with terms and fields. HyperManyMedia's scoring is influenced by the sum of the score for each term of a query. For each field, the


score is the product of the following factors: its "tf", "idf", and index-time boosting (refer to Table 3). The score is computed as follows,

$$\mathrm{score}(q, d) = \mathrm{coord}(q, d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t, d) \right) \tag{2}$$

The semantic search engine in HyperManyMedia is governed by the RDF/OWL file that contains the complete ontology structure of the domain.

Fig. 3 Pellet reasoner

Table 3 Terms used for computing the relevance of a query to a document

Term                  Description
coord(q,d)            Score factor based on the number of query terms
norm(q)               Normalization factor for query q
tf(t in d)            Term frequency of term t in the document d
idf(t)                Inverse document frequency of term t over all documents
boost(t field in d)   Boosting factor for specific field
norm(t,d)             Normalization factor for term t in document d
score(q,d)            Relevance of query q to document d

4.2 Designing a visual ontology-based search engine

This phase represents the mechanism of adding a visual ontology search interface to the HyperManyMedia platform. The user navigates through the domain ontology by clicking on nodes. The complete graph represents the E-learning ontology, and each node represents a concept or subconcept. We used DocuBurst, which is built on the Prefuse (http://prefuse.org/) libraries and works at the document level.

4.2.1 Prefuse Visualization Toolkit

This section represents the mechanism of adding a visual ontology search interface to the HyperManyMedia platform. The user navigates through the domain ontology by clicking on nodes. The complete graph represents the E-learning ontology, and each node represents a concept or subconcept.

Definition The Prefuse Visualization Toolkit is a Java-based toolkit for building interactive information visualization applications. It supports a rich set of features for data modeling, visualization, and interaction. It provides optimized data structures for tables, graphs, and trees, a host of layout and visual encoding techniques, and support for animation, dynamic queries, integrated search, and database connectivity.

5 Evaluation

5.1 Evaluation methodology

The preceding sections provide a description of our methodology of designing and implementing a visual knowledge representation of a graphical model to solve a real problem of browsing and searching for lectures in a vast repository of colleges/courses. It combines Formal Concept Analysis (FCA) with Semantic Factoring to decompose complex, vast concepts into their primitives in order to develop a knowledge representation for the HyperManyMedia platform. The main objective of this section is to test the usability of the visual search engine.

5.2 Evaluation results

5.2.1 Usability test

The usability test consists of evaluating each concept and subconcept presented in the visual interface. The test covered three levels of testing: (1) based on the hierarchical level of the ontology domain, (2) based on the English resources in each level, and (3) based on the Spanish resources in each level (refer to Table 4 for more details).


Table 4 Usability Test for the Visual Search Engine

Test type            Hierarchical level                               English resources        Spanish resources
                                                                      (Concepts/SubConcepts)   (Concepts/SubConcepts)
Left button click    College (Concept)                                ✓                        ✓
                     Course (SubConcept)                              ✓                        ✓
                     Lecture (SubSubConcept)                          ✓                        ✓
                     Descriptive features from (SubSubSubConcept)     ✓                        ✓
Right button click   Course (SubConcept)                              ✓                        ✓
                     Lecture (SubSubConcept)                          ✓                        ✓
                     Descriptive features from (SubSubSubConcept)     ✓                        ✓
Double-click         Course (SubConcept)                              ✓                        ✓
                     Lecture (SubSubConcept)                          ✓                        ✓
                     Descriptive features from (SubSubSubConcept)     ✓                        ✓
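The structure of Table 4 can also be read as a small test matrix: each mouse action is crossed with the hierarchy levels it applies to and with the two languages. The sketch below only enumerates those cases as they appear in the table. It is an illustration, not the authors' test harness, and the names `LEVELS_BY_ACTION`, `LANGUAGES` and `usability_cases` are hypothetical.

```python
from itertools import product

# Hierarchy levels exercised per mouse action, transcribed from Table 4.
LEVELS_BY_ACTION = {
    "left click": ["College", "Course", "Lecture", "Descriptive features"],
    "right click": ["Course", "Lecture", "Descriptive features"],
    "double-click": ["Course", "Lecture", "Descriptive features"],
}

# The two resource languages covered by the test.
LANGUAGES = ["English", "Spanish"]


def usability_cases():
    """Enumerate (action, level, language) triples, one per cell of Table 4."""
    cases = []
    for action, levels in LEVELS_BY_ACTION.items():
        for level, language in product(levels, LANGUAGES):
            cases.append((action, level, language))
    return cases


if __name__ == "__main__":
    for case in usability_cases():
        print(case)
```

Enumerating the cases this way makes it easy to see that the College level is exercised only by the left button click, while the other three levels are exercised by all three actions.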

Fig. 4 One level filtering of the query "Engineering"

Fig. 5 Two level filtering of the query "Engineering"

• Functionality test
Testing the usability of the visual interface is related to the functions provided by the visual interface using the mouse. The following functionality is provided, and each function serves a different purpose. In Table 4, we distinguish each one of these functionalities and we run the test on each level separately.

1. Left Mouse Button Click on a Sector:
If the level of filtering is equal to 1, the user is able to move from concept to subconcepts (e.g., Engineering → Hydrology) and all the concepts underneath the specific concept "Engineering"; thus, all concepts under Engineering can be seen and retrieved visually (refer to Fig. 4).
(a) In each sector, the user can go to a deeper level of granularity until reaching the leaves of that level in the graph.
(b) If the level of filtering is higher than 1 (refer to Figs. 5 and 6), the user is able to see from the beginning an increased level of granularity equal to the level of filtering. However, by clicking on a specific concept, the level of granularity of that specific concept can be extended further. The process stops when it reaches the leaves in the graph.

2. Double-Click on a Sector:
(a) In this case, the order of the visualization changes (e.g., double clicking on Engineering will bring the Engineering to the high level of the graph and


it will be considered as the main concept the user would like to search underneath) (refer to Fig. 7).
(b) The user can navigate up and down through the graph (ontology); the upper hierarchy level represents an upper concept of the current node, and the lower level represents a subconcept of the current node.
Table 4 presents the test that we ran on each individual hierarchical level in the visual search interface.

3. Right Mouse Button Click on a Sector:
The retrieval system considers the concept/subconcept in this node as a query term and it retrieves all related concepts matching that query.
(a) The graph underneath that specific node becomes the root of the graph and all the concepts underneath this node are updated.
(b) This procedure is repeated until the user reaches the leaves of the tree under that specific concept.

Figure 8 illustrates the retrieved documents after the user clicks with the right mouse button on the "Engineering" sector.

Fig. 6 Three level filtering of the query "Engineering"

Fig. 7 Double-click on the "Engineering" sector

Fig. 8 Right clicking on the "Engineering" sector

6 Conclusion

This study presents a visual information retrieval system that uses the representation of the semantic model (semantic set). It takes advantage of the formal context concept to define a simple conceptual graph as triples. In addition, it uses Protégé as a knowledge-based framework to build triples by adding sets of objects, sets of attributes, and defining the relationships between them. The semantic model satisfies the representation of the HyperManyMedia ontology. An important concept considered in the design of the ontology is Semantic Factoring, which decomposes a complex, vast concept into its primitives to develop the knowledge representation. Also, we argued that the most important factor in building the semantic model is defining the hierarchical structure in concepts. Another important factor is discovering the association between concepts using Concept Analysis to generate a lattice graph, which represents the integrated ontology in the HyperManyMedia platform. Our approach has been implemented on the HyperManyMedia platform, and is

123
Visual knowledge representation

already being used by online students at WKU (http:// Lin YR, Sundaram H, Kelliher A (2008) Summarization of social
www.wku.edu). An extension of the visual ontology search activity over time: people, actions and concepts in dynamic
networks
will be considered as future work, where tag clouds will be Manning CD, Schütze H, MIT Press (1999) Foundations of statistical
added to HyperManyMedia platform. The meaning of these natural language processing. MIT Press, 1999
tags will be generated by the semantic search of the users. Oard DW, Dorr BJ (1996) A survey of multilingual text retrieval
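As an illustration only, the idea of a formal context built from sets of objects, sets of attributes, and the relationships between them can be sketched in a few lines of Python. The lecture and concept names below are hypothetical, and the brute-force enumeration is a generic textbook construction of formal concepts, not the Protégé-based procedure used in this study; it is exponential in the number of objects and is meant for toy inputs only.

```python
from itertools import combinations

# Hypothetical toy context: each object (lecture) maps to its attributes (concepts).
context = {
    "lecture_hydrology_1": {"Engineering", "Hydrology"},
    "lecture_hydrology_2": {"Engineering", "Hydrology"},
    "lecture_statics_1": {"Engineering", "Statics"},
    "lecture_grammar_1": {"English"},
}


def common_attributes(objects, ctx, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= ctx[obj]
    return frozenset(shared)


def objects_having(attributes, ctx):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(obj for obj, attrs in ctx.items() if attributes <= attrs)


def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    all_attributes = set().union(*ctx.values())
    concepts = set()
    objects = list(ctx)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, ctx, all_attributes)
            extent = objects_having(intent, ctx)
            concepts.add((extent, intent))
    return concepts


def is_subconcept(lower, upper):
    """Lattice order: a concept lies below another if its extent is contained in it."""
    return lower[0] <= upper[0]


concepts = formal_concepts(context)
```

Ordering the resulting pairs with `is_subconcept` gives the lattice: in this toy context, the concept whose intent is {Engineering, Hydrology} lies below the concept whose intent is {Engineering}, mirroring the concept/subconcept hierarchy that the visual interface navigates.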
Peters C, Braschler M, Gonzalo J (2003) Advances in cross-language
information retrieval: third workshop of the cross-language
evaluation forum, CLEF 2002, Rome, Italy, 19–20 September
2002: revised papers. Springer Verlag 2003
References Rasmussen M, Karypis G (2008) gcluto: an interactive clustering,
visualization, and analysis system. CSE/UMN Technical Report:
Aras H, Siegel S, Malaka R (2009) Semantic cloud: an enhanced TR# 04, 21, 2008
browsing interface for exploring resources in folksonomy Rusu D, Fortuna B, Grobelnik M, Mladenić D (2009) Semantic
systems graphs derived from triplets with application in document
Assent I, Krieger R, Müller E, Seidl T (2007) VISA: visual subspace summarization. Inf J
clustering analysis. ACM SIGKDD Explor Newslett 9(2):5–12 Rusu D, Fortuna B, Mladenic D, Grobelnik M, Sipos R (2009)
Bertini E, Lalanne D (2009) Surveying the complementary role of Document visualization based on semantic graphs. International
automatic data analysis and visualization in knowledge discov- conference on information visualisation, pp 292–297
ery. In: VAKD ’09: Proceedings of the ACM SIGKDD Stan J, Maret P (2009) Bridging the gap between semantic
workshop on visual analytics and knowledge discovery. ACM, technologies and social networks: semantic tagging networks
New York, NY, USA, pp 12–20 Subasic I, Berendt B (2008) Web mining for understanding stories
Bizer C, Heath T, Berners-Lee T (2009) Linked data—the story so through graph visualisation. In: Proceedings of the 2008 eighth
far. Int J Semant Web Info Syst IEEE international conference on data mining. IEEE Computer
Bourennani F, Pu KQ, Zhu Y (2009) Visual integration tool for Society, pp 570–579
heterogeneous data type by unified vectorization. In: Proceed- Szomszor M, Cattuto C, Alani H, Hara KO, Baldassarri A, Loreto V,
ings of the 10th IEEE international conference on information Servedio VDP (2007) Folksonomies, the semantic web, and
reuse & integration, Institute of Electrical and Electronics movie recommendation
Engineers Inc., pp 132–137 Thomas JJ, Cook KA (2005) Illuminating the path: the research and
Bourqui R, Gilbert F, Simonetto P, Zaidi F, Sharan U, Jourdan F development agenda for visual analytics. IEEE Computer
(2009) Detecting structural changes and command hierarchies in Society
dynamic social networks Vadapalli S, Karlapalem K (2009) Heidi matrix: nearest neighbor
Choudhary R, Mehta S, Bagchi A, Balakrishnan R (2008) Towards driven high dimensional data visualization. In: Proceedings of
characterization of actor evolution and interactions in news the ACM SIGKDD workshop on visual analytics and knowledge
corpora. Lect Notes Comput Sci 4956:422 discovery: integrating automated analysis with interactive
Collins C (2006) DocuBurst: document content visualization using exploration, ACM, pp 83–92
language structure. In: Proceedings of IEEE symposium on Yang X, Asur S, Parthasarathy S, Mehta S (2008) A visual-analytic
information visualization, poster session. Citeseer, Baltimore toolkit for dynamic interaction graphs, pp 1016–1024
Dali L, Rusu D, Fortuna B, Mladenić D, Grobelnik M (2009) Zhuhadar L, Nasraoui O (2008) Personalized cluster-based semanti-
Question answering based on semantic graphs. In: Proceedings cally enriched web search for e-learning
of semantic search at WWW2009, Madrid, Spain Zhuhadar L, Nasraoui O, Wyatt R (2009) Visual ontology-based
Gloor PA, Zhao Y (2004) Tecflow-a temporal communication flow information retrieval system. In: Proceedings of the 2009 13th
visualizer for social networks analysis. In: CSCW’04 workshop international conference on information visualisation, IEEE
on social networks. Citeseer Computer Society, pp 419–426
Heymann P, Ramage D, Garcia-Molina H (2008) Social tag Zhuhadar L, Nasraoui O (2008) Semantic information retrieval for
prediction. In: Proceedings of the 31st annual international personalized e-learning. In: 20th IEEE international conference
ACM SIGIR conference on research and development in on tools with artificial intelligence, ICTAI ’08, vol 1,
information retrieval, ACM, pp 531–538, 2008 pp 364–368, November 2008
Kang H, Getoor L, Singh L (2007) Visual analysis of dynamic group Zhuhadar L, Nasraoui O, Wyatt R (2008) A comparsion study
membership in temporal social networks. ACM SIGKDD Explor between generic and metadata search engines in an e-learning
Newslett 9(2):13–21 environment. In: IKE, pp 500–505
Kavouras M, Kokla M (2007) Theories of geographic concepts: Zhuhadar L, Nasraoui O, Wyatt R (2008) Metadata domain-knowl-
ontological approaches to semantic integration. CRC Press, Boca edge driven search engine in ‘‘hypermanymedia’’ e-learning
Raton resources. In: CSTST ’08: Proceedings of the 5th international
Kim HL, Breslin JG, Yang SK, Kim HG (2008) Social semantic cloud conference on soft computing as transdisciplinary science and
of tag: semantic model for social tagging. Lect Notes Comput technology, New York, NY, USA, ACM, pp 363–370
Sci 4953:83 Zhuhadar L, Nasraoui O, Wyatt R (2009) Dual representation of the
Kruk SR, Decker S, Zieborak L (2005) Jeromedl-adding semantic semantic user profile for personalized web search in an evolving
web technologies to digital libraries. Lect Notes Comput Sci domain. In: Proceedings of the AAAI 2009 spring symposium on
3588:716–725 social semantic web, Where Web 2.0 meets Web 3.0, pp 84–89
Kruk SR, Woroniecki T, Gzella A, Dabrowski M (2007) JeromeDL— Zipf GK (1972) Human behavior and the principle of least effort.
a semantic digital library. Semantic Web Challenge-ISWC/ Hafner, New York
ASWC, 2007
