Adam Bankó
Contents
2 RDF
2.1 Serialization formats
2.2 Resource identification
2.3 Statement reification and context
3 Fuzzy RDF
3.1 Serialization
3.1.1 Blank nodes
3.1.2 Unique predicates
3.1.3 Reification
4 Graph visualization
4.1 Background of Graph Drawing
4.2 Algorithms
4.2.1 Generally about tree layouts
4.2.2 Reingold and Tilford algorithm
4.2.3 Manual layout
4.2.4 Force directed methods
4.3 Dealing with Large Graphs
4.3.1 Pan and zoom
4.3.2 Fisheye
4.3.3 Clustering
4.3.4 Incremental exploration
4.4 Case studies
4.4.1 IsaViz
4.4.2 Fentwine
5.2 Architecture
5.2.1 RDF graph
5.2.2 Physical simulation
5.2.3 Canvas
5.2.4 User Interface
5.3 Implementation
5.3.1 Simulation forces
5.3.2 Natural language triples
5.3.3 Fuzzy reified edges
5.4 Evaluation
5.4.1 Performance
5.4.2 Fuzzy visualization
5.5 Conclusion and future work
1 Introduction
The World Wide Web (WWW) is an immensely complex system, and it holds an answer to
almost every question we can ask. Finding those answers, however, can be very hard.
In many cases we get lucky and find the answer on the first page of a Google search.
In other cases the information is hard to reach: the search engine knows the answer,
but we do not have the right question. Or the search engine does not know the answer
at all, and we have to look for other search systems . . . often by searching for them.
The problems can be categorized as follows [34]:
• general problems: the difficulties arising from the size and dynamic nature of the
web;
• deep web: the content that is not indexed by standard search engines;
• lack of semantics.
The size and information content of the Internet are many times larger than what
traditional information search systems were designed for. Information gathering is slow
due to the large number of pages. Even for the most advanced systems, re-scanning the
entire Internet can take weeks, so the index is never fully up to date. On top of that,
news sites, blogs, forums, etc. change rapidly, and this dynamic content should also be
accessible through web search. To date these general problems are more or less solved by
modern search engines using intelligent, distributed and focused crawling.
The deep web (also called the invisible Web, dark Web or hidden Web) is the content
that is not part of the surface Web, which is indexed by standard search engines.
"Searching on the Internet today can be compared to dragging a net across the surface
of the ocean; a great deal may be caught in the net, but there is a wealth of information
that is deep and therefore missed." [45] In 2000 the public information on the deep Web
was 400 to 550 times larger than the commonly defined World Wide Web. [16] Many
information sites are used by filling out query forms. This seems convenient for
the user, but a web search engine can't index and search that data.
Non-textual files account for a large chunk of the deep web content. Images, multimedia,
software and some document formats can't be understood and indexed. In 2008 Google
added OCR-based PDF search capability [33]; we are slowly exploring the deep web.
The main and still unsolved problem of Internet search is the lack of semantics.
Both the indexed web sites and the search query are treated only as lists of words that
carry no meaning for the machine. This causes linguistic problems, as the information
retrieval is based on the actual textual representation of the information. We can get
different answers if we search for "film X" and "movie X", but we can get the same result
if we search for "bow paint" (paint for the front of my ship) or "bow paint" (paint for my
arrow-shooting weapon). Here the problem arises from synonyms and homographs, but
what if the best answer to my question is in a different language? Finding the right
answer here seems hopeless with only syntactic methods.
I have a dream for the Web [in which computers] become capable of analyzing
all the data on the Web: the content, links, and transactions between people
and computers. A `Semantic Web', which should make this possible, has yet
to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy
and our daily lives will be handled by machines talking to machines. The
`intelligent agents' people have touted for ages will finally materialize.
The semantic web involves publishing in languages specifically designed for data:
Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible
Markup Language (XML). HTML describes documents and the links between them.
RDF, OWL, and XML, by contrast, can describe arbitrary things such as people,
meetings, or airplane parts. Tim Berners-Lee calls the resulting network of Linked Data
the Giant Global Graph (GGG), in contrast to the HTML-based World Wide Web. This
means the deep web and the surface web get linked together; the deep web disappears as
its content gets published as semantic data.
These technologies are combined in order to provide descriptions that supplement or
replace the content of Web documents. Thus, content may manifest as descriptive data
stored in Web-accessible databases, or as markup within documents (particularly, in Ex-
tensible HTML (XHTML) interspersed with XML, or, more often, purely in XML). The
machine-readable descriptions enable content managers to add meaning to the content,
i.e. to describe the structure of the knowledge we have about that content. In this way,
a machine can process knowledge itself, instead of text, using processes similar to hu-
man deductive reasoning and inference, thereby obtaining more meaningful results and
helping computers to perform automated information gathering and research. [9]
1.2 Uncertainty
1.2.1 Uncertainty types
To better understand uncertainty, let's look at the classifications published by the W3C
Uncertainty Reasoning for the World Wide Web Incubator Group (URW3-XG) [11].
Uncertainty Nature
This captures the information about the nature of the uncertainty, i.e., whether the
uncertainty is inherent in the phenomenon expressed by the sentence, or it is the result
of lack of knowledge of the agent.
Aleatory: the uncertainty comes from the world; uncertainty is an inherent property of
the world.
Uncertainty Derivation
Objective: derived in a formal way, repeatable derivation process.
Uncertainty Type
Ambiguity: the referents of terms in a sentence about the world are not clearly specified
and therefore it cannot be determined whether the sentence is satisfied, see also
http://en.wikipedia.org/wiki/Ambiguity.
Empirical: a sentence about a world (an event) is either satisfied or not satisfied in each
world, but it is not known in which worlds it is satisfied; this can be resolved by
obtaining additional information (e.g. an experiment).
Randomness: the sentence is an instance of a class for which there is a statistical law
governing whether instances are satisfied.
Vagueness: there is not a precise correspondence between terms in the sentence and
referents in the world, see also http://en.wikipedia.org/wiki/Vagueness.
Inconsistency: there is no world that would satisfy the statement.
Uncertainty Model
This class contains information on the mathematical theories for the uncertainty types.
The specific types of theories include, but are not limited to, the following:
• Probability
• Fuzzy Sets
• Belief Functions
• Random Sets
• Rough Sets
1.3 Visualization
RDF is meant to be read by computers, not by humans. It is nevertheless important that
humans can read and understand it too. People who publish or read semantic data need
tools that help them see and understand it, and this is where data visualization helps.
The serialized forms are mostly sequential, but the knowledge is inherently networked,
thus there is a need for random data access. Computers can load millions of statements
into RAM to efficiently navigate and reason on the knowledge network. Most humans
can't do this, but we can efficiently navigate a multidimensional space (2D, 3D), where
we control what information we acquire and what we discard simply by looking in
different directions.
A great example of the power of visualization is the Mind Mapping technique. A mind
map is a diagram used to represent words, ideas, tasks, or other items linked to and
arranged radially around a central key word or idea. Mind maps are used to generate,
visualize, structure, and classify ideas, and as an aid in study, organization, problem
solving, decision making, and writing.
The elements of a given mind map are arranged intuitively according to the importance
of the concepts, and are classified into groupings, branches, or areas, with the goal of
representing semantic or other connections between portions of information. Mind maps
may also aid recall of existing memories.
Mind mapping has been shown to have a beneficial effect on learning. [24] The
visualization of a semantic graph is somewhat similar to a mind map: both are non-linear
and graph based. If we could visualize the semantic network as efficiently as mind maps
work, that would mean a whole new paradigm in knowledge sharing, collaboration and
learning. For example, coursebooks could be replaced by interactive semantic graphs,
where students are not forced to follow the author's train of thought but are empowered
to discover knowledge their own way.
2 RDF
The Resource Description Framework (RDF) is a family of World Wide Web Consortium
(W3C) specifications originally designed as a metadata data model. It has come to be
used as a general method for the conceptual description or modeling of information that
is implemented in web resources, using a variety of syntax formats.
Basically speaking, the RDF data model [8] is not different from classic conceptual
modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon
the idea of making statements about resources, in particular Web resources, in the
form of subject-predicate-object expressions. These expressions are known as triples in
RDF terminology. The subject denotes the resource, and the predicate denotes traits or
aspects of the resource and expresses a relationship between the subject and the object.
For example, one way to represent the notion "The sky has the color blue" in RDF is
as the triple: a subject denoting "the sky", a predicate denoting "has the color", and
an object denoting "blue". RDF is an abstract model with several serialization formats
(i.e., file formats), and so the particular way in which a resource or triple is encoded
varies from format to format.
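To make the triple structure concrete, here is a minimal Python sketch of the sky example. The example.org URIs, the Triple type and the to_ntriples helper are illustrative assumptions and not part of any RDF library; the output merely follows the N-Triples style of one statement per line.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used for illustration only.
EX = "http://example.org/"

# "The sky has the color blue" as a single triple.
sky_is_blue = Triple(EX + "sky", EX + "hasColor", "blue")


def to_ntriples(t: Triple) -> str:
    """Write a triple with a URI subject and predicate and a plain literal object."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


print(to_ntriples(sky_is_blue))
```

The same three values could equally be written in RDF/XML or another serialization; only the encoding changes, not the statement.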
This mechanism for describing resources is a major component in what is proposed
by the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in
which automated software can store, exchange, and use machine-readable information
distributed throughout the Web, in turn enabling users to deal with the information
with greater efficiency and certainty. RDF's simple data model and ability to model
disparate, abstract concepts has also led to its increasing use in knowledge management
applications unrelated to Semantic Web activity.
A collection of RDF statements intrinsically represents a labeled, directed multi-graph.
As such, an RDF-based data model is more naturally suited to certain kinds of knowledge
representation than the relational model and other ontological models traditionally used
in computing today. However, in practice, RDF data is often persisted in relational
database or native representations also called Triple stores, or Quad stores if context
(i.e. the named graph) is also persisted for each RDF triple. As RDFS and OWL
demonstrate, additional ontology languages can be built upon RDF. [7]
edge that it is sometimes useful to group statements according to different criteria, called
situations, contexts, or scopes, as discussed in articles by RDF specification co-editor
Graham Klyne [32]. For example, a statement can be associated with a context, named by a
URI, in order to assert an "is true in" relationship. As another example, it is sometimes
convenient to group statements by their source, which can be identified by a URI, such
as the URI of a particular RDF/XML document. Then, when updates are made to the
source, corresponding statements can be changed in the model, as well.
In first-order logic, as facilitated by RDF, the only metalevel relation is negation, but
the ability to generally state propositions about nested contexts allows RDF to comprise
a metalanguage that can be used to define modal and higher-order logic.
3 Fuzzy RDF
Recently, fuzzy extensions to description logics have gained considerable attention,
especially for the purpose of handling vague information in many applications.
On the semantic web, the need to represent uncertainty in fuzzy form arises at multiple
levels. First, there is data from everyday life. There are many examples of vague
classifications: old people (by age), heavy commodities (by weight), and so on. In the
web context, another example is the characterization of multimedia pieces: classification
by genre, or valuation of similarity among them.
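A vague classification such as "old (by age)" can be pictured as a membership function that returns a degree between 0 and 1 instead of a yes/no answer. The following Python sketch assumes a simple linear ramp; the function name and the thresholds of 50 and 70 years are arbitrary illustrative choices, not values taken from the literature.

```python
def membership_old(age: float, young_until: float = 50.0, old_from: float = 70.0) -> float:
    """Degree (0.0 to 1.0) to which a person of the given age counts as 'old'.

    Assumed shape: 0 up to `young_until`, 1 from `old_from`, linear in between.
    """
    if age <= young_until:
        return 0.0
    if age >= old_from:
        return 1.0
    return (age - young_until) / (old_from - young_until)


for age in (40, 60, 80):
    print(age, membership_old(age))
```

A crisp classification would force a single cut-off age; the fuzzy version lets a 60-year-old be "old" to a partial degree.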
The second level is more closely connected to the very nature of the semantic web. The
practical management of semantic knowledge bases itself needs classifications that are
fuzzy by nature, such as:
Terms such as trustworthy are fuzzy. It means that they cannot be sharply defined.
However, as humans, we make sense out of this information, and use it in decision
making. [39]
Interestingly, fuzzy does not seem to me to be incompatible with the Semantic Web.
Terms in rdf can be pretty fuzzy. Just take the foaf:knows relation, defined as
...
That's a definition that has quite fuzzy borders. Now one could define a relation to
express this fuzziness, call it a fuzzy link, and that could be used like this
3.1 Serialization
In order to use existing RDF stores for fuzzy knowledge without requiring any
extensions, we have to provide a way of serializing fuzzy knowledge into
RDF triples. To this day three different approaches dominate.
where :paulmembPaul is a blank node used to represent the fuzzy assertion of paul with
the concept Tall.
Mapping fuzzy role assertions can't be done with this syntax since RDF does not allow
the use of blank nodes in the predicate position. Thus, we have to use new properties
for each assertion. This is the unique predicates technique.
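The contrast between the two techniques can be sketched in Python with plain tuples standing in for triples. Everything in the fuzzy namespace below (membership, degree), the way a fresh predicate name is minted, and the use of rdf:type to tie that predicate back to its role are assumptions made for illustration; they are not the exact vocabulary of the cited authors.

```python
# Hypothetical namespaces, for illustration only.
EX = "http://example.org/"
FZ = "http://example.org/fuzzy#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def fuzzy_concept_assertion(individual, concept, degree, blank_id):
    """Blank-node technique: a blank node stands for the membership itself."""
    node = "_:" + blank_id
    return [
        (individual, FZ + "membership", node),
        (node, RDF_TYPE, concept),
        (node, FZ + "degree", degree),
    ]


def fuzzy_role_assertion(subject, role, obj, degree, unique_id):
    """Unique-predicate technique: mint one fresh predicate per assertion."""
    fresh = f"{role}_{unique_id}"
    return [
        (subject, fresh, obj),       # the concrete relation instance
        (fresh, RDF_TYPE, role),     # assumed link back to the relation type
        (fresh, FZ + "degree", degree),
    ]


tall = fuzzy_concept_assertion(EX + "paul", EX + "Tall", 0.8, "paulmembPaul")
friend = fuzzy_role_assertion(EX + "paul", EX + "friendOf", EX + "frank", 0.6, "1")
```

In the first function the blank node appears only in subject or object position, which RDF allows. In the second, the degree has to hang off a predicate, so a new named predicate is created for every single assertion.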
The authors here used the ineqType predicate, to be able to describe both inequalities
and equalities. While this is not used in all serialization formats, it could be easily used
in or left out of any presented here.
While this method has some semantic benefits, it generates a lot of predicates used
only in one assertion and mixes the concepts of predicate and relation instance. The
distinction between a relation type and a concrete relation between two nodes is
blurred, which could make systems built on this format difficult to understand and use.
3.1.3 Reification
In [35] the authors use RDF reification in order to store membership degrees. The
resource representing the reification of the triple is related to the fuzzy value. A blank
node of type Statement is connected to the subject, the predicate and the object,
following the standard RDF reification scheme. This node can then be connected to a
float number using some pre-defined predicate expressing the level of fuzziness.
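As a rough sketch of this scheme, the following Python function builds the reification triples for one fuzzy statement. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms come from the RDF vocabulary; the degree predicate, its namespace and the function name are hypothetical stand-ins for whatever pre-defined predicate a concrete system would use.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FZ = "http://example.org/fuzzy#"  # hypothetical namespace for the degree predicate


def reify_fuzzy(subject, predicate, obj, degree, node_id):
    """Return the triples describing (subject, predicate, obj) with a membership degree."""
    node = "_:" + node_id  # blank node standing for the statement
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", subject),
        (node, RDF + "predicate", predicate),
        (node, RDF + "object", obj),
        (node, FZ + "degree", degree),  # assumed predicate carrying the fuzzy value
    ]


triples = reify_fuzzy(
    "http://example.org/paul",
    "http://xmlns.com/foaf/0.1/knows",
    "http://example.org/frank",
    0.7,
    "s1",
)
for t in triples:
    print(t)
```

Note that the predicate of the original statement appears here as an ordinary object value, so any relation can be annotated without minting new predicates.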
The introductory example (the fuzziness of foaf:knows) has a lot in common with
reification. The guys at Berkley use fuzz:link, fuzz:level, fuzz:rel and fuzz:to. The
fuzz:link property is similar in meaning to rdf:subject; the only difference is the
direction of the relation. rdf:object and fuzz:to, and fuzz:rel and rdf:predicate, are
virtually the same.
Using reification we can store any fuzzy relation (as opposed to the blank node
technique), and we don't need to generate one-time predicates that are both a relation
type and a concrete relation. Reification is defined in the RDF recommendation and is
a standard way to make statements about a relation. I'll use this as the fuzzy
RDF serialization format for my future work.
4 Graph visualization
In this part I'll present some graph drawing methods that are relevant on the road
to fuzzy semantic web visualization. This part is not about graph drawing in general;
for that, excellent bibliographic surveys [13, 22], books [14], and even on-line
tutorials [19] exist.
(GD 'XX conferences), which were initiated in 1992 in Rome. The conference proceedings
contain new layout algorithms, theoretical results on their efficiency or limitations, and
systems demonstrations.
The basic graph drawing problem can be put simply: Given a set of nodes with a set of
edges (relations), calculate the position of the nodes and the curve to be drawn for each
edge. Of course, this problem has always existed for the simple reason that a graph is
often defined by its drawing. Even Euler relied on a drawing to solve the Königsberger
Brückenproblem in his 1736 paper. [27]
4.2 Algorithms
A layout algorithm is used to arrange the nodes and the edges of the graph. It can
produce a vector or raster image that can be displayed.
The list of algorithms presented here is not complete. I've selected some instructive and
useful ones that seem interesting from the perspective of fuzzy semantic network
visualization.
The usefulness of an algorithm is a combination of many factors. The most important
ones are scalability, aesthetics and predictability.
If the algorithm works on relatively large graphs, then it can be used on larger parts of
the semantic web, providing a broader view to the user. This broader view gives users
a better understanding of the large-scale arrangement of information.
The aesthetic criteria for an algorithm are vaguely defined. I'd say that in our field of
research a graph layout is aesthetic if it helps the user to learn the information in the
graph to the greatest extent.
Predictability means that two different runs of the algorithm involving the same or
similar graphs should not lead to radically different visual representations. Predictability
¹ http://www.graphdrawing.org/
• file systems
• 3d scene graphs
• biological taxonomy
• etc.
Even in physics, the standard model of reasoning on the nature of the physical world
decomposes large bodies down to their smallest particle components.
It's important to see how tree drawing relates to RDF data sets. For a particular
domain it is often possible to select a meaningful spanning tree from the RDF graph.
This tree can be used for many things later, for example to lay out the nodes of the
graph in a nice hierarchical way.
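One possible way to pick such a tree is a breadth-first search from a chosen root that follows only the predicates considered meaningful for the domain. The Python sketch below assumes triples are plain (subject, predicate, object) tuples; the function name and the predicate filter are illustrative choices, and other selection strategies are equally valid.

```python
from collections import deque


def spanning_tree(triples, root, predicates=None):
    """Return {node: parent} for a breadth-first tree rooted at `root`.

    Only edges whose predicate is in `predicates` are followed
    (all edges if `predicates` is None). Edges run subject -> object.
    """
    adjacency = {}
    for s, p, o in triples:
        if predicates is None or p in predicates:
            adjacency.setdefault(s, []).append(o)

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in adjacency.get(node, []):
            if child not in parent:  # skip edges that would close a cycle
                parent[child] = node
                queue.append(child)
    return parent


triples = [("a", "p", "b"), ("a", "p", "c"), ("b", "p", "c"), ("c", "q", "d")]
print(spanning_tree(triples, "a", {"p"}))
```

The resulting parent map can be handed to any tree layout algorithm, while the edges left out of the tree are drawn afterwards on top of the hierarchical arrangement.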
Visualized trees are so common in the computer world that we don't even give them a
thought; we just use them. When did you last use a file system browser with a
hierarchical display? These hierarchical visualizations have become ubiquitous, so there
is much to learn from them when creating solutions for everyday life. The results of
all the experiments with graph drawing and semantic web visualization are mostly
useless if they remain exotic pieces; real change will only come if the tools we make
can be used without even noticing the technology behind them.
The large and uneven space between sibling nodes can be a problem on large graphs.
If we use zooming and panning (more about it later) to navigate such a graph, we might
need to zoom in and out at the same time. We need to zoom in to be able to see individual
nodes in crowded areas and zoom out so we can see more sibling nodes where there is a
huge space between them.
This method is good for a quick overview of a (maybe huge) tree, but it's problematic
when we need to see and interact with individual nodes of the graph.
more-or-less pretty layout. This is all good to this point. Even when I rearrange the
result a bit to better fit my needs it works perfectly well. However, it gets really messy
when I start to add some nodes to a part of my graph (adding the details) and try to
arrange those. I'd like to see applications that allow me to auto-arrange only a part of
my graph, but so far I haven't encountered even one. The problem is that after
doing an auto-layout I always tweak it, and this tweaking is undone by the
next auto-layout, which I need because I don't want to lay out the 20 new nodes entirely
by hand. So I need to tweak the layout again to be able to use it, for example to move
some related nodes close to each other so I can find them faster. This auto-layout and
tweak cycle is not only a waste of time but is also annoying. Sometimes it's slightly
better than the all-manual layout, but it's still a great annoyance.
² There are usually 2-4 auto layout methods; this applies to all of them.
Most automatic layout facilities take a purely combinatorial description of a graph and
produce a layout of the graph; these methods are called 'layout creation' methods. For
interactive systems, another kind of layout is needed: a facility which can adjust a layout
after a change is made by the user or by the application. Although layout adjustment is
essential in interactive systems, most existing layout algorithms are designed for layout
creation. The use of a layout creation method for layout adjustment may totally rearrange
the layout and thus destroy the user's 'mental map' of the diagram; thus a set of layout
adjustment methods, separate from layout creation methods, is needed. [37]
The manual layout approach worked great on 5-15 element charts for presentations
etc., where the layout of the graph was as important as the meaning of the edges and
nodes. This is not true for RDF and the semantic web: here all the information is
encoded in the nodes and edges, and the layout's purpose is no more than to help the
user learn or author this information. Using manually laid out graphs for the semantic
web has two major obstacles. Firstly, RDF graphs are often auto-generated from large data
sets, where the size of the data and its rapidly changing nature make manual layout
by humans extremely expensive. Secondly, the semantic web is all about connecting
knowledge. RDF graphs can be merged automatically without any problem, but the
aesthetics or important aspects of manually laid out graphs can't be kept by an
automated merge; only a human can merge two manually laid out graphs well, and again
this is very expensive and slow.
This layout is inherently interactive: the user has full control over element positions,
and it is really easy to rearrange the elements. This generates a feeling of control, which
is an important factor of a good user experience. In a good RDF editor the user should
feel in control, and this is a great example of such control. The lack of this feeling of
control leads to frustration, stress and bad performance.
The basic idea is as follows. To embed [lay out] a graph we replace the vertices
by steel rings and replace each edge with a spring to form a mechanical system
. . . The vertices are placed in some initial layout and let go so that the
spring forces on the rings move the system to a minimal energy state. [21]
Here the graph is modelled by a physical system of rings and springs. Firstly, this
real-world metaphor helps the user understand and interact with the interface. Springs and
masses are common, so everyone has a good mental model of them. The metaphor
connects the user interface to this pre-existing mental model, so the user can interact
immediately. First-time users don't need to develop a new mental model of the interface;
the learning curve doesn't start from zero, every user has a head start, and there's no
need for a user manual or help system to understand the basics. Even my young sister
understands how springs work, what inertia feels like and how friction affects the
movement of objects. If I showed her an interface with nodes and edges and told her those
are things and springs, she would be ready to use it right away. She would figure out in
seconds that things can be grabbed and moved around with the mouse and, bam, she's using
the interface! Good metaphors help users apply their existing experience to the new
system. Using the things and springs metaphor for graph user interfaces reduces the
learning time greatly.
Secondly, this metaphor also helps us design, understand and talk about the system.
It's easier to talk about this system than about others that don't have a good metaphor.
Let's take a simple example: gravity. You know how it integrates with the system and what
it looks like when it's turned on. Hyperbolic graph layout is a great counterexample: it's
surrounded by a sort of mystery, as few people really understand it. [27] It's preferable
to select a graph layout technique that has a strong metaphor, as that helps the developer
community improve it and communicate about it.
Force-based layouts share the same metaphor, and they form a broad class. We have the
freedom to choose the forces and the model elements. We are not bound to the real
world's laws: we choose which laws we implement, which we don't, and which we implement
differently than nature did. For example, we don't need to stick to Hooke's law³; we can
choose our own formula for the forces exerted by the springs, as Fruchterman and
Reingold did in [26].
The base algorithm is quite simple. It takes the current position of the nodes and
calculates the new ones. In the basic form there are two forces:
The springs keep the connected nodes relatively close to each other and the repulsive
forces keep the unrelated nodes apart. To keep things simple, the springs follow Hooke's
law, and the repulsive forces have an inverse-square falloff like gravity and Coulomb's
law. The algorithm implementing these two forces is:
1. For every edge, calculate the spring force from Hooke's law for each node and add
the impulse change (the force multiplied by the timestep) to the current impulse
of the node.⁴
2. For every node pair, calculate the inverse-square repulsive force and add its effect
to the nodes' velocities.
³ Hooke's law is a macroscopic approximation of the behaviour of springs.
⁴ Usually the node's velocity is used, so the impulse change is first converted to a speed change.
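The two steps above can be sketched as a single simulation step in Python. This is a minimal illustration under stated assumptions, not the implementation described later in this work: all nodes have unit mass (so an impulse change equals a velocity change), and the function name, the parameter defaults and the damping factor that stands in for friction are my own choices.

```python
import math


def layout_step(positions, velocities, edges, dt=0.05, spring_k=1.0,
                rest_length=1.0, repulsion=1.0, damping=0.9):
    """Advance the layout by one timestep; positions and velocities are updated in place."""
    forces = {node: [0.0, 0.0] for node in positions}

    # Step 1: Hooke's law spring force along every edge.
    for a, b in edges:
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        magnitude = spring_k * (dist - rest_length)  # positive means attraction
        fx, fy = magnitude * dx / dist, magnitude * dy / dist
        forces[a][0] += fx
        forces[a][1] += fy
        forces[b][0] -= fx
        forces[b][1] -= fy

    # Step 2: inverse-square repulsion between every pair of nodes.
    nodes = list(positions)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = positions[b][0] - positions[a][0]
            dy = positions[b][1] - positions[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            magnitude = repulsion / dist ** 2
            fx, fy = magnitude * dx / dist, magnitude * dy / dist
            forces[a][0] -= fx
            forces[a][1] -= fy
            forces[b][0] += fx
            forces[b][1] += fy

    # Integrate: unit mass assumed, damping is an assumed friction term.
    for node in nodes:
        for axis in (0, 1):
            velocities[node][axis] = (velocities[node][axis] + forces[node][axis] * dt) * damping
            positions[node][axis] += velocities[node][axis] * dt
    return positions


pos = {"a": [0.0, 0.0], "b": [3.0, 0.0]}
vel = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
layout_step(pos, vel, [("a", "b")])
print(pos)
```

Calling layout_step repeatedly, for example once per rendered frame, lets the system settle gradually, which is also what makes the layout interactive.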
Calculating the attractive forces between neighbours is Θ(E); the repulsive force
calculation is Θ(V²), a great weakness of this implementation. The repulsive force
calculation can be reduced to Θ(V) with some clever space partitioning when the
distribution of vertices is approximately uniform.
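The text does not say which space-partitioning scheme is meant. One simple option is a uniform grid in which each node only interacts with nodes in its own and the eight neighbouring cells; this is an approximation, since the repulsion of distant nodes is dropped entirely. The Python sketch below is written under that assumption, and the function name and cell_size parameter are illustrative.

```python
import math
from collections import defaultdict


def repulsion_with_grid(positions, cell_size, repulsion=1.0):
    """Approximate inverse-square repulsion using a uniform grid of square cells."""
    # Bucket every node by the grid cell containing it.
    grid = defaultdict(list)
    for node, (x, y) in positions.items():
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(node)

    forces = {node: [0.0, 0.0] for node in positions}
    for (cx, cy), members in grid.items():
        for a in members:
            ax, ay = positions[a]
            # Only the 3x3 block of cells around the node is inspected.
            for nx in (cx - 1, cx, cx + 1):
                for ny in (cy - 1, cy, cy + 1):
                    for b in grid.get((nx, ny), ()):
                        if a == b:
                            continue
                        dx, dy = ax - positions[b][0], ay - positions[b][1]
                        dist = math.hypot(dx, dy) or 1e-9
                        magnitude = repulsion / dist ** 2
                        forces[a][0] += magnitude * dx / dist
                        forces[a][1] += magnitude * dy / dist
    return forces


forces = repulsion_with_grid({"a": (0.1, 0.1), "b": (0.4, 0.1), "c": (10.0, 10.0)}, cell_size=1.0)
print(forces)
```

With roughly uniformly distributed vertices each cell holds a bounded number of nodes, so the work per node is constant and the total is linear in the number of vertices.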
Interactivity is a big strength of the force-based layout techniques. While they run
their iterative calculations, the user can move nodes around and see immediate feedback:
the graph keeps reorganizing itself to satisfy the goals mentioned above. This method can
even survive adding and removing nodes without a complete rearrangement of the
graph. The user can follow how new nodes or edges change the layout, without any
sudden or unexpected change. This makes it a good candidate for the
incremental exploration navigation method I'll present later.
In some cases, their output can even behave well with respect to edge-crossing
minimization without any supplementary efforts [25]. Bertault has developed a
force-directed model preserving edge crossings, turning it into a more predictable approach
[17]. For more information on the force directed methods and the recent improvements
see [20, 25, 30, 26].
difficulty in this case is not with the zooming operation itself, but rather with assigning
an appropriate level of detail, i.e., a sort of clustering, to subgraphs.
A well-known problem with zooming is that if one zooms on a focus, all contextual
information is lost.⁵ Such a loss of context can become a considerable usability obstacle.
A set of techniques that allow the user to focus on some detail without losing the context
can alleviate this problem. The term focus+context has been used to describe these
techniques. They do not replace zoom and pan, but rather complement them. The
complexity of the underlying data might make zoom an absolute necessity. However,
focus+context techniques are a good alternative, and full-blown application systems
often implement both. Graphical fisheye views are popular techniques for geometrical
focus+context. Fisheye views imitate the well-known fisheye lens effect by enlarging an
area of interest and showing other portions of the image with successively less detail.
Just as we could speak about semantic zoom, one could also refer to semantic
focus+context, meaning that, when the distortion becomes too extreme, in some sense,
nodes might disappear after all. Sarkar and Brown describe this technique in their
paper [41], but finer control over this facility might lead to new insights as well.
4.3.2 Fisheye
Graphical fisheye views are a popular technique for focus+context navigation. A fisheye
camera lens is a very wide-angle lens that magnifies nearby objects while shrinking
distant objects. This section describes the method for viewing and browsing graphs
using a mapping analog of a fisheye lens. [41]
This view requires a full layout and displays a distortion of it according to a focal
point defined by the user. The initial layout of the graph is called the normal view of the
graph, and its coordinates are called normal coordinates. The coordinates of the graph
in the fisheye view are called the fisheye coordinates. The viewer's point of interest is
called the focus; it is a point in the normal coordinates.
⁵ Unless a separate window, for example, keeps the context visible, which is done by several systems.
But this solution is not fully satisfactory either.
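The mapping from normal to fisheye coordinates can be sketched as follows. The distortion function G(x) = (d + 1)x / (dx + 1) is the one I understand Sarkar and Brown [41] to use for their Cartesian transformation, applied per axis to the distance from the focus, normalized by the distance from the focus to the view boundary; treat the exact form here as my reading of that paper. The function names, the bounds argument and the default distortion factor d are illustrative assumptions, and points are assumed to lie inside the bounds.

```python
def fisheye_1d(coord, focus, lo, hi, d=3.0):
    """Map one normal coordinate to its fisheye coordinate along a single axis."""
    d_norm = coord - focus
    if d_norm == 0:
        return focus
    # Distance from the focus to the boundary on the side where the point lies.
    d_max = (hi - focus) if d_norm > 0 else (lo - focus)
    x = d_norm / d_max                   # normalized distance in (0, 1]
    g = (d + 1) * x / (d * x + 1)        # distortion: magnifies near the focus
    return focus + g * d_max


def fisheye(point, focus, bounds, d=3.0):
    """Map a 2D point in normal coordinates to fisheye coordinates."""
    (x_lo, y_lo), (x_hi, y_hi) = bounds
    return (
        fisheye_1d(point[0], focus[0], x_lo, x_hi, d),
        fisheye_1d(point[1], focus[1], y_lo, y_hi, d),
    )


bounds = ((0.0, 0.0), (10.0, 10.0))
focus = (5.0, 5.0)
print(fisheye((6.0, 5.0), focus, bounds))
```

With d = 0 the mapping is the identity; larger values push points near the focus further apart while points on the boundary stay where they are, so the whole graph remains visible.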
Fisheye view is a valuable tool for seeing both local details of a graph and global context
simultaneously. It has been used successfully in many cases, in graph drawing and
outside it.⁶ Being able to see both details and global context at the same time greatly
helps the user develop and maintain a mental map of the graph. The user continually sees
the global map of the graph and the current position in it, so it's really hard to get
lost with all this help.
The major limitation of the fisheye view is that it has only one focal point: the user
can't view two distant parts of the graph with the same detail. To compare
two distant parts, for example, the user has to navigate back and forth, keeping parts of the graph in
memory, as the details of both parts can't be seen at the same time.
4.3.3 Clustering
It is often advantageous to reduce the number of visible elements. Limiting the
number of visual elements to display can improve clarity, and it also improves the performance
of layout and rendering by cutting the node count [31]. Various abstraction and
reduction techniques have been applied in order to reduce the complexity of the displayed
graph. One approach is to perform clustering.
Clustering is the process of discovering groupings in the data. Two main kinds of
clustering exist. Structure-based clustering uses purely structural information, looking
only at the nodes and edges, not at labels. The other method is called content-based
clustering. It incorporates graph labels into the reduction algorithm. The usefulness of a
certain content-based clustering method can depend greatly on the domain: a clustering
method that works well in one knowledge domain may not work at all in a different one.
Clustering can be used for navigation by letting the user interactively open and close
clusters. This can be especially useful with hierarchical clusters, where the user can have a
compact view of most of the graph and a detailed view of some chosen parts. Note the
difference from fisheye navigation, where there can be only one focal point.
A simple RDF clustering method can be based on the hierarchy of classes. The rdfs:subClassOf
property is used to state that all the instances of one class are instances of another. Used
together with rdf:type, which states that a resource is an instance of a class, it lets us create a hierarchical clustering where we can have
open and closed classes. The subclasses and instances of open classes are shown normally,
but the subclasses of closed classes and their instances are hidden. We need to decide
what to do when an object is an instance of both hidden and open classes, but this can be
done. Notice that we have built a more or less tree-like structure from the RDF graph
and used it as the basis for hierarchical clustering. This simple example can be used in RDF
graphs where nodes are distributed among classes; it is, however, completely
useless for scenarios without object classification.
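The class-based clustering described above can be sketched as follows. This is only an illustration under several assumptions: triples are plain (subject, predicate, object) tuples, properties are written as prefixed names, a closed class stays visible as a collapsed cluster, and an object with several classes is shown as soon as one of its classes is visible and open. The function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def ancestors(cls, parents):
    """All proper superclasses of cls, following rdfs:subClassOf transitively."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def visible_after_clustering(triples, closed_classes):
    """Return (visible classes, visible instances) for a set of closed classes."""
    closed = set(closed_classes)
    parents = {}   # class -> direct superclasses
    types = {}     # instance -> classes it belongs to
    for s, p, o in triples:
        if p == RDFS_SUBCLASS_OF:
            parents.setdefault(s, set()).add(o)
            parents.setdefault(o, set())
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
            parents.setdefault(o, set())

    # A class is hidden when any of its superclasses is closed.
    visible_classes = {c for c in parents if not (ancestors(c, parents) & closed)}
    open_classes = visible_classes - closed
    # Assumed tie-break: one visible, open class is enough to show an instance.
    visible_instances = {i for i, ts in types.items() if ts & open_classes}
    return visible_classes, visible_instances
```

Closing a class near the top of the hierarchy collapses the whole subtree below it, which gives the compact view of most of the graph, while the remaining open classes keep their detail.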
⁶ The animated, zooming Dock in Mac OS X is a fisheye list, and the concept has been adopted by a wide
variety of systems. [1]
4.3.4 Incremental exploration
There are cases when the size of the graph is so huge that it becomes impossible
to handle the full graph at any time; the World Wide Web is an obvious example [27].
Incremental exploration techniques are good candidates for such situations. The system
displays only a small portion of the full graph, and other parts of the graph are displayed
as needed. The advantage of such an incremental approach is that, at any given time,
the subgraph to be shown on the screen may be limited in size; hence, the layout and
interaction times may not be critical any more. This approach to graph exploration
is still relatively new, but interesting results in the area are already available; see, for
example, [18, 28, 29, 40, 46]. Incremental exploration means that the system places a
visible window on the graph, somewhat similar to what pan does. Exploration means
moving this window (also referred to as a logical frame by [28]) along some trajectory (see
Figure 4.2 on page 25). Implementation of such incremental exploration has essentially
two aspects: generating new logical frames and repositioning their content.
Generating new logical frames is always under the control of the user. In some cases, the
logical frame simply contains the nodes visited so far. Huang et al. [28] and North [40] included a control
for discarding part of the logical frame to avoid saturating the screen. As
far as the repositioning is concerned, the simplest solution is to use the same layout
algorithm for each logical frame. This is done, for example, by [28]. (Note that the latter
use a modified spring algorithm. This is one case where the relatively small graph on the
screen makes the use of a force-directed method perfectly feasible in graph visualization.)
North [40] and Brandes and Wagner [18] go further by providing dynamic control over
the parameters that direct the layout algorithms. This line of visual graph management
is still quite new, and according to [27] it will gain in importance in the years to come
and will complement the classical navigation and exploration methods.
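The idea of a logical frame that follows the user and throws away old content can be sketched as follows. This is a hypothetical illustration, not the method of any of the cited systems: the class name, the capacity limit and the least-recently-shown eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class LogicalFrame:
    """Bounded window of nodes that follows the user's exploration."""

    def __init__(self, neighbours, capacity):
        self.neighbours = neighbours   # callable: node -> iterable of adjacent nodes
        self.capacity = capacity       # maximum number of nodes kept on screen
        self.nodes = OrderedDict()     # ordered from least to most recently shown

    def visit(self, node):
        """Move the frame to `node` and return the nodes to lay out."""
        # The visited node is refreshed last so it is never evicted first.
        for n in [*self.neighbours(node), node]:
            self.nodes.pop(n, None)
            self.nodes[n] = True
        while len(self.nodes) > self.capacity:
            self.nodes.popitem(last=False)   # discard the stalest node
        return list(self.nodes)
```

Because the frame never grows beyond its capacity, the same layout algorithm can be rerun on the returned nodes after every step, whatever the size of the full graph.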
4.4 Case studies
4.4.1 IsaViz
IsaViz⁷ is a visual environment for browsing and authoring RDF models represented
as graphs, hosted by the W3C. It uses a pan+zoom navigational user interface with an
optional overview window called Radar view. The overview window helps preserve
some of the context, so it's easier to navigate the graph.
The display layout is done manually, with all the pros and cons described in section 4.2.3.
Edges are drawn as Bézier curves, which provide nice-looking connections but also give the user
more things to worry about when laying out the graph. Some form of automatic layout
functionality is present, with all the difficulties I mentioned earlier.
IsaViz uses the Zoomable Visual Transformation Machine (ZVTM) toolkit. It's implemented in
Java and designed to ease the task of creating complex visual editors in which large amounts
of objects have to be displayed, or which contain complex geometrical shapes that need
to be animated. It is based on the metaphor of universes that can be observed through
smart movable/zoomable cameras, and offers features such as perceptual continuity in
object animations and camera movements, which should make the end-user's overall
experience more pleasing. ZVTM is nice to use and has smooth transitions, but it has the
problems of all pan+zoom navigation methods.
⁷ http://www.w3.org/2001/11/IsaViz/
All mouse interactions are achieved through single mouse clicks. Only one command
uses a left button double click: opening a resource's URI in the web browser.
The left mouse button's action depends on what tool is selected in the palette of icons.
The right mouse button (or single mouse button + command key under Mac OS X) is
used to navigate in the Graph view, no matter which tool is currently selected. This
unusual assignment of mouse buttons is based on the fact that navigation in the graph
is a very important functionality and should be quickly available at all times.
The IsaViz environment has four main windows: toolbox, graph view, attribute editor
and definitions. Arranging and moving these four windows on the desktop isn't
as easy as it would be with one window. Using a separate toolbox and attribute editor
needs a lot of mouse movement; context-sensitive options would
be better.
4.4.2 Fentwine
Fentwine [23] is a navigational RDF browser and editor. A central node is shown in the
middle of the screen and the nodes directly connected to it are arranged on an ellipse
around it. A node is made central when the user clicks on it.
Nodes more than one step away from the central node are also shown, up to a maximum
number of steps. Nodes further away from the center fade into the background. With this
navigational approach, the user can see which nodes are connected to the central node
without following long connective lines through a graph. This incremental exploration
method also means sacrificing a persistent mental map: the user can't have the big picture
in mind, as there isn't one.
The property of each connection is shown between the two connected nodes. Each
property is shown in a different color, computed deterministically from the property's
URI. This allows the user to recognize the type of a connection without reading the
property name.
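A deterministic color per property can be derived, for instance, by hashing the URI into a hue. The sketch below is an assumption about how such a mapping could look, not Fentwine's actual scheme: the hash function, the fixed saturation and value, and the function name are all chosen for illustration.

```python
import colorsys
import hashlib


def property_colour(uri, saturation=0.65, value=0.85):
    """Return an (r, g, b) tuple in 0..255 that depends only on the URI."""
    digest = hashlib.sha1(uri.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 65536.0   # hue in [0, 1)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Since the color depends only on the URI, the same property gets the same color in every graph and every session, so users can learn the colors of the properties they meet often.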
Fentwine hides node and property URIs if a natural language label is available. For
example, this can be a person's name. A predefined set of properties for finding the label
of a node is provided (containing e.g. rdfs:label and foaf:name). The user can extend this
set. Displaying familiar names and labels instead of URIs really helps users focus on the
main content without getting distracted by long text with little meaning to humans.
The users see a list of the properties in the current RDF graph, which allows them to
show or hide each property separately. Properties can be grouped in categories which can
be switched on and off as a whole. For example, when browsing a FOAF network, the user
can hide all properties except foaf:knows. This reduces clutter by hiding relationships
irrelevant to the task at hand (for example, foaf:workplaceHomepage). This feature
derives from Nelson's ZigZag [38]. Hiding uninteresting connections helps the user focus
on the main content; however, the simple listing interface is hard to use when there are
many properties in the viewed graph.
5 The proposed system
5.1 Overview
I've created a system to test and demonstrate the suggested concepts. The program is
built on the Java Platform and consists of about 3,000 lines of code in 25 classes.
The program visualizes XML-serialized RDF graphs. The XML file is read and the RDF
graph is built in memory. The visualization draws nodes as points and edges as lines on
a surface. The program is designed only for research, so the interface is really plain. To
keep things simple no navigation is implemented; the system is designed to handle small
graphs. Only if it works on small and simple graphs should we consider implementing
navigation and support for larger graphs.
The interface uses a force-based interactive layout. The graph nodes are attached to
particles in a simulation where edges are represented by springs. Force-directed
methods are described in more detail in section 4.2.4.
To make the interface more user friendly, the node and property URIs are hidden if
a natural language label is available in the graph. This great feature was taken from
Fentwine (described in 4.4.2). A predefined set of properties for finding the label of a
node is provided (containing rdfs:label and foaf:name). The natural language literals themselves are
hidden to reduce clutter.
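The label lookup can be sketched as follows. This is an illustration only: triples are assumed to be plain (subject, predicate, object) tuples, the two property URIs are the standard RDFS and FOAF ones rather than names taken from the program, and the function names are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
LABEL_PROPERTIES = [RDFS_LABEL, FOAF_NAME]   # assumed default set, in priority order


def display_label(node, triples, label_properties=LABEL_PROPERTIES):
    """Return the natural language label of a node, or its URI as a fallback."""
    for prop in label_properties:
        for s, p, o in triples:
            if s == node and p == prop:
                return o
    return node


def label_triples(triples, label_properties=LABEL_PROPERTIES):
    """Triples whose literal is already shown as a label and can be hidden."""
    props = set(label_properties)
    return [t for t in triples if t[1] in props]
```

Nodes without any label property simply keep their URI, so the lookup never hides information that has no replacement.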
The fuzzy edges are expected in the reification format described in 3.1.3. When a fuzzy
reified edge is present, the corresponding crisp edge, which stays in the graph for backward compatibility, is hidden. By default
the edges and nodes that describe reified triples are hidden as well. For demonstration purposes
there's an option to show them.
The fuzziness of a triple can be visualized in two ways. Both can be turned on and
off independently. Firstly, it's possible to tweak the color of a fuzzy edge based on its
certainty. The other option is to modify the rest length of the spring behind the edge.
This usually results in a shorter distance for certain and a longer distance for uncertain
statements.
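The two visual encodings can be sketched as simple functions of the certainty value. This is a hypothetical illustration assuming a certainty in the range 0 to 1 and linear mappings. The concrete colors, the base length and the stretch factor are assumptions and are not taken from the program.

```python
def clamp01(value):
    """Restrict a certainty value to the range [0, 1]."""
    return min(1.0, max(0.0, value))


def edge_colour(certainty, crisp=(0, 0, 0), faint=(200, 200, 200)):
    """Blend from a faint color (uncertain) to the crisp edge color (certain)."""
    c = clamp01(certainty)
    return tuple(round(f + (k - f) * c) for k, f in zip(crisp, faint))


def rest_length(certainty, base=60.0, stretch=2.0):
    """Certain edges get the base rest length, uncertain ones a longer spring."""
    c = clamp01(certainty)
    return base * (1.0 + stretch * (1.0 - c))
```

Keeping the two mappings separate mirrors the two independent switches: either function can be replaced by a constant without touching the other.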
5.2 Architecture
5.2.1 RDF graph
The RDF graph module does the XML parsing and the in-memory storage of the RDF
graph. I've used the open source JRDF (Java RDF) library to do these tasks. JRDF
is a standard set of APIs and base implementations for RDF. As its key aim was to
ensure a high degree of modularity, it will be easy to implement incremental exploration
strategies on top of the current API by implementing the standard interfaces. It is also
1. The ParticleSystem, which basically takes care of everything. The built application
has a particle system behind the UI.
2. Particles, which move around in space according to the forces acting on them.
Nodes on the UI are bound to particles in the simulation.
Traer.physics has a very stable RK4 integrator so I could use lots of complicated forces
without blowing up the simulation.
¹ Like Sesame [3] and Mulgara [2]
5.2.3 Canvas
The canvas layer of the application is built on open source software called Processing
[5]. Processing is an open source project initiated by Casey Reas and Benjamin Fry,
both formerly of the Aesthetics and Computation Group at the MIT Media Lab. It is a
programming language and integrated development environment (IDE) built for the electronic
arts and visual design communities, which aims to teach the basics of computer
programming in a visual context, and to serve as the foundation for electronic sketchbooks.
One of the stated aims of Processing is to act as a tool to get non-programmers
started with programming, through the instant gratification of visual feedback. The language
builds on the graphical capabilities of the Java programming language, simplifying
features and creating a few new ones.
Processing comes with its own IDE and a language derived from Java. However,
with some minimal hacking it can be used from any Java project. I've chosen it for the
canvas layer of my application because of its high performance and ease of use.
The events on the canvas are forwarded to the User Interface and the UI components
use the canvas to display themselves.
Figure 5.2: Class diagrams showing important parts of the user interface
The central part of the system, which I call the user interface, covers 90% of the code I've written.
It acts as glue: it synchronizes the shapes displayed on the canvas with the RDF graph
and the particle simulation.
60 times per second the Canvas initiates a frame draw: the screen is cleared and
every UI element is drawn. The UI is organized into layers, which control the order in which the
elements are drawn and selected by the cursor. The four layers of the UI in bottom-up
order are
1. Background: the gray background of the UI, which has the focus when nothing else does,
4. Control bar: the bar at the top that's used to tweak different aspects of the visualization
The layers are built both as a means to control z-order and to store UI elements for later
access. As you can see in Figure 5.2a, the UI control bar and the background layers
are implemented as Sets, while the edge and node layers are Maps, so a displayed edge can be
easily found for a triple and a displayed node for a graph node. The displayed edges
also hold references to the subject and object UI nodes, so they have quick access to the
nodes' positions.
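The layer organization can be sketched with ordinary sets and dictionaries. This is a simplified, hypothetical model of the structure described above, not the program's code: the `Box` stand-in element, the method names, and the assumption that edges sit below nodes are all made up for the example.

```python
class Box:
    """Stand-in for a UI element: an axis-aligned rectangle (hypothetical)."""

    def __init__(self, name, x, y, w, h):
        self.name, self.x, self.y, self.w, self.h = name, x, y, w, h

    def contains(self, px, py):
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


class Layers:
    """Four UI layers; sets for plain storage, maps for lookup by graph object."""

    def __init__(self):
        self.background = set()
        self.edges = {}          # RDF triple -> displayed edge
        self.nodes = {}          # graph node -> displayed node
        self.control_bar = set()

    def bottom_up(self):
        """Elements in drawing order (assumes edges are drawn below nodes)."""
        yield from self.background
        yield from self.edges.values()
        yield from self.nodes.values()
        yield from self.control_bar

    def pick(self, px, py):
        """Topmost element under the cursor, found by scanning top-down."""
        for element in reversed(list(self.bottom_up())):
            if element.contains(px, py):
                return element
        return None
```

Drawing iterates `bottom_up()` once per frame, while cursor selection walks the same sequence in reverse, so a single ordering controls both behaviors.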
5.3 Implementation
5.3.1 Simulation forces
The particle simulation uses three kinds of forces: one for the spring dynamics of the
edges, one for a general repulsion of nodes, and one to dissipate energy so that the simulation
has stable points.
The springs connect two particles and try to keep them a certain distance apart.
This distance is the rest length of the spring, and it is a major factor in how spacious
the visualization will be.
The spring strength defines how rigidly the spring reacts to user interaction. High
strength springs are like a stick; they generate large forces. Low strength springs take a
long time to return to their rest length when the user stretches them. A graph with
low spring strengths feels gummy and unpleasant to handle.
Damping defines the energy dissipation of the spring. Springs with high damping
don't overshoot and settle down quickly. With low damping, springs oscillate,
and too much of that spoils the readability of the labels.
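The roles of rest length, strength and damping can be seen in a damped spring force. The sketch below is a generic spring-damper model written for illustration under assumed names; it is not the Traer.physics implementation.

```python
import math


def spring_force(pa, pb, va, vb, rest_length, strength, damping):
    """Force on particle a; particle b receives the opposite force.

    pa, pb -- positions (x, y) of the two particles
    va, vb -- velocities (x, y) of the two particles
    """
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)                      # direction undefined
    ux, uy = dx / dist, dy / dist              # unit vector from a to b
    stretch = dist - rest_length               # > 0 when the spring is extended
    # Rate at which the spring is getting longer, along the spring axis.
    separation_speed = (vb[0] - va[0]) * ux + (vb[1] - va[1]) * uy
    magnitude = strength * stretch + damping * separation_speed
    return (magnitude * ux, magnitude * uy)
```

The strength term pulls the particles back towards the rest length, while the damping term opposes any change in length, which is what stops the oscillation.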
The general repulsion of nodes keeps the graph well distributed on the plane and prevents
nodes from floating onto each other. It's implemented as an inverse-square falloff, like
gravity and Coulomb's law. There's a minimal distance assigned to the repulsion force
that limits the maximum of the force when two particles accidentally get very close.²
² This may happen, for example, when the user is dragging a node quickly and there's a frame where it's
in sub-pixel proximity to another node.
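The clamped inverse-square repulsion can be sketched as follows. This is a generic illustration with assumed names and unit masses, not the Traer.physics code. It also makes visible that the forces are computed for every pair of particles.

```python
import math


def repulsion_force(pa, pb, strength, min_distance):
    """Repulsive force on particle a caused by particle b."""
    dx, dy = pa[0] - pb[0], pa[1] - pb[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)                      # direction undefined
    effective = max(dist, min_distance)        # clamp: caps the force magnitude
    magnitude = strength / (effective * effective)
    return (magnitude * (dx / dist), magnitude * (dy / dist))


def total_repulsion(positions, strength, min_distance):
    """Sum the pairwise forces; the double loop visits every pair once."""
    forces = [[0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            fx, fy = repulsion_force(positions[i], positions[j], strength, min_distance)
            forces[i][0] += fx
            forces[i][1] += fy
            forces[j][0] -= fx                 # equal and opposite force
            forces[j][1] -= fy
    return forces
```

Without the clamp, two particles a thousandth of a pixel apart would repel each other a million times more strongly than at one pixel, which is enough to throw nodes off the screen in a single step.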
The spaciousness of the layout affects how well the user can understand the presented
information. Two main factors affect it. In a static (equilibrium) configuration the springs are somewhat
extended and thus exert a contracting force proportional to the extension and the
strength of the spring. The general repulsion of nodes is the opposing force; when it is equal
in strength to the spring force, the system is stable. The length of the spring at this
point defines the general spaciousness of the layout.
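For two nodes joined by a single spring, the equilibrium length is where the spring force k(L - L0) equals the repulsion g / L^2. The sketch below solves this balance numerically by bisection. It is a simplified two-particle model with assumed parameter names and ignores the minimal-distance clamp and all other nodes.

```python
def equilibrium_length(rest_length, spring_strength, repulsion_strength, tolerance=1e-9):
    """Distance at which spring contraction and node repulsion cancel out."""

    def net(length):
        # Positive when the spring wins (pulls in), negative when repulsion wins.
        contraction = spring_strength * (length - rest_length)
        repulsion = repulsion_strength / (length * length)
        return contraction - repulsion

    low = rest_length                 # at rest length only repulsion acts
    high = rest_length + 1.0
    while net(high) < 0.0:            # widen the bracket until the spring wins
        high *= 2.0
    while high - low > tolerance:     # bisection on the increasing function net
        mid = (low + high) / 2.0
        if net(mid) < 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

For example, with a rest length of 1, a spring strength of 1 and a repulsion strength of 4, the two nodes settle at a distance of 2, twice the rest length, which shows how strongly the repulsion setting influences the spaciousness.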
The reification method seen in the example is recommended. By adding an rdf:ID attribute
to any relation, it's reified, and the reified node can be accessed by the given ID.
Fuzzy edges are loaded from the graph in two steps. First, the graph is searched for
nodes whose rdf:type is rdf:Statement. The nodes in the result set are the reified edges.
In the second step, the program attempts to create a fuzzy edge for every reified node. If the
reification is not complete or the fuzzy certainty is missing, this fails, so only real fuzzy
edges are created.
To be backward compatible, the crisp edges that are reified and fuzzified are hidden rather than removed.
This way the fuzzy edges remain part of the graph as crisp edges that non-fuzzy RDF
software can use.
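The two-step loading can be sketched as follows. This is an illustration, not the program's JRDF-based code: triples are assumed to be plain (subject, predicate, object) tuples, and the certainty property URI is a made-up placeholder, since the actual property is defined by the reification format of section 3.1.3.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CERTAINTY = "http://example.org/fuzzy#certainty"   # hypothetical placeholder URI


def load_fuzzy_edges(triples):
    """Return a list of ((subject, predicate, object), certainty) pairs."""
    triples = list(triples)
    # Step 1: find the reified statements.
    statements = [s for s, p, o in triples
                  if p == RDF + "type" and o == RDF + "Statement"]
    fuzzy_edges = []
    # Step 2: try to build a fuzzy edge from each reified statement.
    for node in statements:
        props = {p: o for s, p, o in triples if s == node}
        try:
            edge = (props[RDF + "subject"],
                    props[RDF + "predicate"],
                    props[RDF + "object"])
            certainty = float(props[CERTAINTY])
        except (KeyError, ValueError):
            continue                  # incomplete reification or no certainty
        fuzzy_edges.append((edge, certainty))
    return fuzzy_edges


def crisp_edges_to_hide(fuzzy_edges):
    """Crisp triples that are replaced on screen by their fuzzy counterpart."""
    return {edge for edge, _ in fuzzy_edges}
```

The crisp triples themselves are never removed from the graph; they are only left out of the drawing, which keeps the data usable for software that knows nothing about fuzziness.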
5.4 Evaluation
The implemented system is available at [12] under the GPL 2.0 or GPL 3.0 open source
license. This system was used to test and evaluate the proposed layout and visualization
methods. The system was tested with two specially crafted RDF graphs. The two graphs
can be found in the source of the application.³
5.4.1 Performance
The particle simulation layout was good on loose graphs, but it didn't handle nodes with
a lot of connections well. For example, in cases where a lot of people were described, the
layout was arranged around the Person type, and this resulted in a crowded ring in the
display. As no filtering or navigation is implemented in the system, it's only suitable for
small graphs. The particle simulation is an especially critical point, as the repulsion forces
are calculated between all node pairs, which is an O(N^2) algorithm in the node count. I've
made some measurements to determine the system performance under different graph
sizes. The system created a new node every 20-50 frames and connected the new node to
one or two existing ones. The simulation results can be seen on Figure 5.3 on page 33.
Figure 5.3: Diagram showing the performance of the system in frames per second by the
node and edge count
The results are approximately linear in the node and edge count, which indicates that
the particle simulation time is minor compared to some other factor. Profiling the program
showed that it spends 50% of its time in the canvas, drawing, while the physical
simulation runs in less than 3% of the time. The top hot spots can be seen on
Figure 5.4 on page 34.
³ http://kenai.com/projects/frdfvis/sources/source-code-repository/show/rdf?rev=38
⁴ We have all seen the visual illusions in this topic.
hubs in the graph and applying special rules for them. More research should also be done
on the parametrization of the simulation, to improve its responsiveness and stability.
The second area is the fuzzy edge drawing. The two currently implemented methods
are only the tip of the iceberg; new methods should be implemented and tested to create
a large repertoire that can later be used as a toolbox for serious software development
and to determine the best combination of these methods for typical situations.
The third area is navigation that makes the earlier concepts work on larger graphs.
From the current perspective it seems that incremental exploration and clustering are
the two most important directions that would make the system more usable.
All things considered, this work is nowhere near complete; it's only the first spadeful dug in this
new area.
Bibliography
[1] Dock (Mac OS X) - Wikipedia, the free encyclopedia.
http://en.wikipedia.org/wiki/Dock_(Mac_OS_X). 6
[13] Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Algorithms
for drawing graphs: an annotated bibliography. Comput. Geom. Theory
Appl., 4(5):235-282, 1994. 4
[14] Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Graph
Drawing: Algorithms for the Visualization of Graphs. Prentice Hall PTR, 1998. 4
[16] M. K. Bergman. The deep web: Surfacing hidden value. Journal of Electronic
Publishing, 7(1):0701, 2001. 1
[17] François Bertault. A force-directed algorithm that preserves edge crossing properties.
In Proceedings of the 7th International Symposium on Graph Drawing, pages
351-358. Springer-Verlag, 1999. 4.2.4
[18] Ulrik Brandes and Dorothea Wagner. A Bayesian paradigm for dynamic graph
layout. In Proceedings of the 5th International Symposium on Graph Drawing, pages
236-247. Springer-Verlag, 1997. 4.3.4
[20] Ron Davidson and David Harel. Drawing graphs nicely using simulated annealing.
ACM Trans. Graph., 15(4):301-331, 1996. 4.2.4
[22] Peter Eades and Kozo Sugiyama. How to draw a directed graph. J. Inf. Process.,
13(4):424-437, 1990. 4
[24] P. Farrand, F. Hussain, and E. Hennessy. The efficacy of the mind map
study technique. Medical Education, 36(5):426-431, 2002. 1.3
[25] Arne Frick, Andreas Ludwig, and Heiko Mehldau. A fast adaptive layout algorithm
for undirected graphs. In Proceedings of the DIMACS International Workshop on
Graph Drawing, pages 388-403. Springer-Verlag, 1995. 4.2.4
[26] Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed
placement. Softw. Pract. Exper., 21(11):1129-1164, 1991. 4.2.4, 4.2.4
[27] Ivan Herman, Guy Melançon, and M. Scott Marshall. Graph visualization and navigation
in information visualization: A survey. IEEE Transactions on Visualization
and Computer Graphics, 6(1):24-43, 2000. 4.1, 4.1, 4.2.4, 4.3, 4.2, 4.3.4
[28] M. L. Huang, P. Eades, and J. Wang. Online animated graph drawing using a
modified spring algorithm. Journal of Visual Languages and Computing, 9(6), 1998.
4.3.4
[29] Mao Lin Huang, Peter Eades, and Robert F. Cohen. WebOFDAV - navigating and
visualizing the web on-line with animated context swapping. Comput. Netw. ISDN
Syst., 30(1-7):638-642, 1998. 4.2, 4.3.4
[30] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Inf.
Process. Lett., 31(1):7-15, 1989. 4.2.4
[31] Doug Kimelman, Bruce Leban, Tova Roth, and Dror Zernik. Reduction of visual
complexity in dynamic graphs, pages 218-225. 1995. 4.3.3
[37] Kazuo Misue, Peter Eades, Wei Lai, and Kozo Sugiyama. Layout adjustment and
the mental map. Journal of Visual Languages & Computing, 6(2):183-210, June
1995. 4.2.3
[38] T. H. Nelson. A cosmology for a different computer universe: Data model, mechanisms,
virtual machine and visualization infrastructure. Journal of Digital Information,
5(1):200407, 2004. 4.4.2
[39] H. T. Nguyen and E. Walker. A first course in fuzzy logic. Chapman & Hall/CRC,
2006. 3
[42] N. Simou, G. Stoilos, V. Tzouvaras, G. Stamou, and S. Kollias. Storing and querying
fuzzy knowledge in the semantic web. In Proceedings of the 4th International
Workshop on Uncertainty Reasoning for the Se, volume 250. 3.1.1
[43] M. A. D. Storey, K. Wong, F. D. Fracchia, and H. A. Muller. On integrating
visualization techniques for effective software exploration. 4.3.1
[45] Alex Wright. Exploring a `Deep web' that google can't grasp. The New York Times,
February 2009. 1
[46] R. Zeiliger. Supporting constructive navigation of web space. In Proceedings of the
Workshop on Personalized and Solid Navigation in Information Space, 1998. 4.3.4