
Fuzzy RDF visualization

Adam Bankó

consultant: Annamária R. Várkonyi Kóczy

May 22, 2009


Contents
1 Introduction
1.1 Semantic web
1.2 Uncertainty
1.2.1 Uncertainty types
1.2.2 Probability Theory
1.2.3 Fuzzy Logic
1.3 Visualization

2 RDF
2.1 Serialization formats
2.2 Resource identification
2.3 Statement reification and context

3 Fuzzy RDF
3.1 Serialization
3.1.1 Blank nodes
3.1.2 Unique predicates
3.1.3 Reification

4 Graph visualization
4.1 Background of Graph Drawing
4.2 Algorithms
4.2.1 Generally about tree layouts
4.2.2 Reingold and Tilford algorithm
4.2.3 Manual layout
4.2.4 Force directed methods
4.3 Dealing with Large Graphs
4.3.1 Pan and zoom
4.3.2 Fisheye
4.3.3 Clustering
4.3.4 Incremental exploration
4.4 Case studies
4.4.1 IsaViz
4.4.2 Fentwine

5 The proposed system
5.1 Overview
5.2 Architecture
5.2.1 RDF graph
5.2.2 Physical simulation
5.2.3 Canvas
5.2.4 User Interface
5.3 Implementation
5.3.1 Simulation forces
5.3.2 Natural language triples
5.3.3 Fuzzy reified edges
5.4 Evaluation
5.4.1 Performance
5.4.2 Fuzzy visualization
5.5 Conclusion and future work

1 Introduction
The World Wide Web (WWW) is an immensely complex system, and there is an answer to
almost every question we can ask. Finding the answers in this system can sometimes be
very hard. In many cases we get lucky and find the answer on the first page of a Google
search. In other cases, the information is hard to access. The search engine knows the
answer, but we don't have the right question. Or our search engine does not know the
answer at all, and we have to look for other search systems... often by searching for them.
The problems can be categorized as follows [34]:

• general problems: the difficulties arising from the size and dynamic nature of the
web;

• deep web: the content that is not indexed by standard search engines;

• content ignored by the search engines;

• lack of semantics.

The size and information content of the Internet is many times larger than what
traditional information search systems were designed for. Information gathering is slow
due to the large number of pages. Even for the most advanced systems, re-scanning the
entire Internet can take weeks, which means we don't have an up-to-date index. On top of
that, there are news sites, blogs, forums, etc. that change rapidly. This dynamic content
should also be accessible through web search. To date, these general problems are
more or less solved by modern search engines using intelligent, distributed and focused
crawling.
The deep web (also called the invisible Web, dark Web or hidden Web) is the content
that is not part of the surface Web, i.e. the part indexed by standard search engines.
Searching on the Internet today can be compared to dragging a net across the surface of
the ocean; a great deal may be caught in the net, but there is a wealth of information
that is deep and therefore missed [45]. In 2000, the public information on the deep Web
was 400 to 550 times larger than the commonly defined World Wide Web [16]. Many
information sites can be used by filling out query forms. This seems convenient for
the user, but a web search engine cannot index and search that data.
Non-textual files account for a large chunk of the deep web content. Images, multimedia,
software and some document formats cannot be understood and indexed. In 2008 Google
added OCR-based PDF search capability [33]; slowly, we are exploring the deep web.
The main, and as yet unsolved, problem of Internet search is the lack of semantics.
Both the indexed web sites and the search query are treated only as lists of words,
without any meaning for the machine. This causes linguistic problems, as information
retrieval is based on the actual textual representation of the information. We can get
different answers if we search for "film X" and "movie X", but we can get the same
result if we search for "bow paint" (paint for the front of my ship) or "bow paint"
(paint for my arrow-shooting weapon). Here the problem arises from synonyms and
homographs, but what if the best answer to my question is in a different language?
Finding the right answer there seems hopeless with only syntactic methods.

1.1 Semantic web


Humans are capable of using the Web to carry out tasks such as finding the Finnish
word for "monkey", reserving a library book, and searching for a low price for a DVD. A
computer, however, cannot accomplish the same tasks without human direction, because
web pages are designed to be read by people, not machines. The semantic web is a vision
of information that is understandable by computers, so that they can perform more of
the tedious work involved in finding, sharing, and combining information on the web.
Tim Berners-Lee (credited with inventing the World Wide Web) originally expressed
the vision of the semantic web as follows:

I have a dream for the Web [in which computers] become capable of analyzing
all the data on the Web - the content, links, and transactions between people
and computers. A "Semantic Web", which should make this possible, has yet
to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy
and our daily lives will be handled by machines talking to machines. The
"intelligent agents" people have touted for ages will finally materialize.

(Tim Berners-Lee, 1999)

The semantic web involves publishing in languages specifically designed for data:
Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible
Markup Language (XML). HTML describes documents and the links between them.
RDF, OWL, and XML, by contrast, can describe arbitrary things such as people,
meetings, or airplane parts. Tim Berners-Lee calls the resulting network of Linked Data the
Giant Global Graph (GGG), in contrast to the HTML-based World Wide Web. This
means the deep web and the surface web get linked together: the deep web disappears as
its content gets published as semantic data.
These technologies are combined in order to provide descriptions that supplement or
replace the content of Web documents. Thus, content may manifest as descriptive data
stored in Web-accessible databases, or as markup within documents (particularly, in
Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML). The
machine-readable descriptions enable content managers to add meaning to the content,
i.e., to describe the structure of the knowledge we have about that content. In this way,
a machine can process knowledge itself, instead of text, using processes similar to
human deductive reasoning and inference, thereby obtaining more meaningful results and
helping computers to perform automated information gathering and research [9].

1.2 Uncertainty
1.2.1 Uncertainty types
To better understand uncertainty, let's look at the classifications published by the W3C
Uncertainty Reasoning for the World Wide Web Incubator Group (URW3-XG) [11].

Uncertainty Nature
This captures the information about the nature of the uncertainty, i.e., whether the
uncertainty is inherent in the phenomenon expressed by the sentence, or it is the result
of lack of knowledge of the agent.

Aleatory the uncertainty comes from the world; uncertainty is an inherent property of
the world.

Epistemic the uncertainty is due to the agent whose knowledge is limited.

Uncertainty Derivation
Objective derived in a formal way, repeatable derivation process.

Subjective subjective judgement, possibly a guess.

Uncertainty Type
Ambiguity The referents of terms in a sentence about the world are not clearly specified
and therefore it cannot be determined whether the sentence is satisfied, see also
http://en.wikipedia.org/wiki/Ambiguity.
Empirical a sentence about a world (an event) is either satisfied or not satisfied in each
world, but it is not known in which worlds it is satisfied; this can be resolved by
obtaining additional information (e.g. an experiment).

Randomness the sentence is an instance of a class for which there is a statistical law
governing whether instances are satisfied.

Vagueness there is not a precise correspondence between terms in the sentence and
referents in the world, see also http://en.wikipedia.org/wiki/Vagueness.
Inconsistency there is no world that would satisfy the statement.

Incompleteness information about the world is incomplete, some information is missing.


UncertaintyModel
This class contains information on the mathematical theories for the uncertainty types.
The specific types of theories include, but are not limited to, the following:

• Probability

• Fuzzy Sets

• Belief Functions

• Random Sets

• Rough Sets

• Combination of Several Models (Hybrid), e.g., Fuzzy Sets and Probability.

1.2.2 Probability Theory


Probability theory provides a mathematically sound representation language and formal
calculus for rational degrees of belief, which gives different agents the freedom to have
different beliefs about a given hypothesis. This provides a compelling framework for
representing uncertain, imperfect knowledge that can come from diverse agents. Not
surprisingly, there are many distinct approaches using probability for the Semantic Web.

1.2.3 Fuzzy Logic


In contrast to probabilistic formalisms, which allow for representing and processing
degrees of uncertainty about ambiguous pieces of information, fuzzy formalisms allow for
representing and processing degrees of truth about vague (or imprecise) pieces of
information. It is important to point out that vague statements are truth-functional, that is,
the degree of truth of a vague complex statement (which is constructed from elementary
vague statements via logical operators) can be calculated from the degrees of truth of
its constituents, while uncertain complex statements are generally not a function of the
degrees of uncertainty of their constituents (Dubois and Prade, 1994).
Vagueness abounds especially in multimedia information processing and retrieval.
Another typical application domain for vagueness, and thus for fuzzy formalisms, is natural
language interfaces to the Web. Furthermore, fuzzy formalisms have also been
successfully applied in ontology mapping, information retrieval, and e-commerce negotiation
tasks.

1.3 Visualization
RDF is meant to be read by computers rather than by humans. However, it is important that
humans can also read and understand it. People who publish or read semantic data need
tools that help them see and understand it. This is where data visualization helps us.
The serialized forms are mostly sequential, but the knowledge is inherently networked,
so there is a need for random data access. Computers can load millions of statements
into RAM to efficiently navigate and reason on the knowledge network. Most humans
can't do this, but we can efficiently navigate in a multidimensional space (2D, 3D), where
we can easily control what information we acquire and what we discard simply by looking
in different directions.
A great example of the power of visualization is the Mind Mapping technique. A mind
map is a diagram used to represent words, ideas, tasks, or other items linked to and
arranged radially around a central key word or idea. Mind maps are used to generate,
visualize, structure, and classify ideas, and as an aid in study, organization, problem
solving, decision making, and writing.
The elements of a given mind map are arranged intuitively according to the importance
of the concepts, and are classified into groupings, branches, or areas, with the goal of
representing semantic or other connections between portions of information. Mind maps
may also aid recall of existing memories.
Mind mapping has been shown to have a beneficial effect on learning [24]. The visualization
of semantic graphs is somewhat similar to mind maps: both are non-linear and graph-based.
If we could visualize the semantic network as efficiently as mind maps work, that
would mean a whole new paradigm in knowledge sharing, collaboration and learning.
For example, coursebooks could be replaced by interactive semantic graphs, where students
would not be forced to follow the author's train of thought, but would be empowered to
discover knowledge their own way.

2 RDF
The Resource Description Framework (RDF) is a family of World Wide Web Consortium
(W3C) specifications originally designed as a metadata data model. It has come to be
used as a general method for conceptual description or modeling of information that is
implemented in web resources, using a variety of syntax formats.
Basically speaking, the RDF data model [8] is not different from classic conceptual
modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon
the idea of making statements about resources, in particular, Web resources, in the
form of subject-predicate-object expressions. These expressions are known as triples in
RDF terminology. The subject denotes the resource, and the predicate denotes traits or
aspects of the resource and expresses a relationship between the subject and the object.
For example, one way to represent the notion "The sky has the color blue" in RDF is
as the triple: a subject denoting "the sky", a predicate denoting "has the color", and
an object denoting "blue". RDF is an abstract model with several serialization formats
(i.e., file formats), and so the particular way in which a resource or triple is encoded
varies from format to format.
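The separation between the abstract triple and its encodings can be sketched in a few lines of Python. The example.org URIs and the `hasColor` property are hypothetical, literals are left out for brevity, and the two functions only imitate the N-Triples and Turtle/N3 styles rather than implementing full serializers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the described resource
    predicate: str  # URI of the property
    obj: str        # URI of the value (literals are left out of this sketch)


# Hypothetical URIs under example.org, used only for illustration.
EX = "http://example.org/"
sky_is_blue = Triple(EX + "sky", EX + "hasColor", EX + "blue")


def to_ntriples(t: Triple) -> str:
    """N-Triples style: one statement per line, every URI written in full."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


def to_turtle(t: Triple, prefix: str, base: str) -> str:
    """Turtle/N3 style: URIs under `base` are abbreviated with a prefix."""
    def short(uri: str) -> str:
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
        return f"<{uri}>"

    header = f"@prefix {prefix}: <{base}> ."
    return f"{header}\n{short(t.subject)} {short(t.predicate)} {short(t.obj)} ."


print(to_ntriples(sky_is_blue))
print(to_turtle(sky_is_blue, "ex", EX))
```

Both outputs describe the same single triple; only the encoding differs.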
This mechanism for describing resources is a major component in what is proposed
by the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in
which automated software can store, exchange, and use machine-readable information
distributed throughout the Web, in turn enabling users to deal with the information
with greater efficiency and certainty. RDF's simple data model and its ability to model
disparate, abstract concepts have also led to its increasing use in knowledge management
applications unrelated to Semantic Web activity.
A collection of RDF statements intrinsically represents a labeled, directed multigraph.
As such, an RDF-based data model is more naturally suited to certain kinds of knowledge
representation than the relational model and other ontological models traditionally used
in computing today. However, in practice, RDF data is often persisted in relational
databases or in native representations called triple stores, or quad stores if the context
(i.e. the named graph) is also persisted for each RDF triple. As RDFS and OWL
demonstrate, additional ontology languages can be built upon RDF [7].
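A minimal Python sketch of the two views mentioned above: triples read as a labeled, directed multigraph, and the same triples extended to quads with a named graph. The resource names and the graph name `ex:graphA` are hypothetical, and this is not the API of any real triple store.

```python
from collections import defaultdict

# Statements as (subject, predicate, object); all names are hypothetical.
triples = [
    ("ex:john", "ex:authorOf", "ex:docX"),
    ("ex:john", "ex:editorOf", "ex:docX"),  # a second, differently labeled edge
    ("ex:jane", "foaf:knows", "ex:john"),
]

# Adjacency view of the labeled, directed multigraph:
# node -> list of (edge label, target). Parallel edges are kept.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

# A quad store adds a fourth element: the context (named graph) of each triple.
quads = [(s, p, o, "ex:graphA") for s, p, o in triples]


def triples_in(context):
    """Return the triples that belong to one named graph."""
    return [(s, p, o) for s, p, o, c in quads if c == context]


print(graph["ex:john"])
```

The two edges from `ex:john` to `ex:docX` are what makes the structure a multigraph rather than a simple graph.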

2.1 Serialization formats


Two common serialization formats are in use.
The first is an XML format. This format is often called simply RDF because it was
introduced among the other W3C specifications defining RDF. However, it is important
to distinguish the XML format from the abstract RDF model itself. Its MIME media
type, application/rdf+xml, was registered by RFC 3870, which recommends that RDF
documents follow the new 2004 specifications.
In addition to serializing RDF as XML, the W3C introduced Notation 3 (or N3) as
a non-XML serialization of RDF models designed to be easier to write by hand, and
in some cases easier to follow. Because it is based on a tabular notation, it makes the
underlying triples encoded in the documents more easily recognizable compared to the
XML serialization. N3 is closely related to the Turtle and N-Triples formats.

2.2 Resource identication


The subject of an RDF statement is a resource, possibly as named by a Uniform Resource
Identifier (URI). Some resources are unnamed and are called blank nodes or anonymous
resources. They are not directly identifiable. The predicate is a resource as well,
representing a relationship. The object is a resource or a Unicode string literal.
In Semantic Web applications, and in relatively popular applications of RDF like RSS
and FOAF (Friend of a Friend), resources tend to be represented by URIs that
intentionally denote actual, accessible data on the World Wide Web. But RDF, in general, is
not limited to the description of Internet-based resources. In fact, the URI that names
a resource does not have to be dereferenceable at all. For example, a URI that begins
with "http:" and is used as the subject of an RDF statement does not necessarily have
to represent a resource that is accessible via HTTP, nor does it need to represent a
tangible, network-accessible resource; such a URI could represent absolutely anything (as
a fanciful example, the URI could even represent the abstract notion of world peace).
Therefore, it is necessary for producers and consumers of RDF statements to be in
agreement on the semantics of resource identifiers. Such agreement is not inherent to
RDF itself, although there are some controlled vocabularies in common use, such as
Dublin Core Metadata, which is partially mapped to a URI space for use in RDF.

2.3 Statement reication and context


The body of knowledge modeled by a collection of statements may be subjected to
reification, in which each statement (that is, each subject-predicate-object triple as a whole)
is assigned a URI and treated as a resource about which additional statements can be
made, as in "Jane says that John is the author of document X". Reification is
sometimes important in order to deduce a level of confidence or degree of usefulness for each
statement.
In a reified RDF database, each original statement, being a resource itself, most likely
has at least three additional statements made about it: one to assert that its subject is
some resource, one to assert that its predicate is some resource, and one to assert that
its object is some resource or literal. More statements about the original statement may
also exist, depending on the application's needs.
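The reification pattern can be sketched in Python for the "Jane says that John is the author of document X" example, using the standard rdf:Statement, rdf:subject, rdf:predicate and rdf:object vocabulary. Triples are plain string tuples; the blank-node labels and the `ex:authorOf` and `ex:says` properties are hypothetical.

```python
import itertools

_statement_ids = itertools.count(1)


def reify(subject, predicate, obj):
    """Describe the statement (subject, predicate, obj) as a resource.

    Returns the new statement node and the triples about it. As in RDF,
    this does not assert the reified triple itself.
    """
    stmt = f"_:stmt{next(_statement_ids)}"  # hypothetical blank-node label
    return stmt, [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", subject),
        (stmt, "rdf:predicate", predicate),
        (stmt, "rdf:object", obj),
    ]


# "Jane says that John is the author of document X"
stmt, description = reify("ex:john", "ex:authorOf", "ex:docX")
description.append(("ex:jane", "ex:says", stmt))  # ex:says is hypothetical
for triple in description:
    print(*triple)
```

The three subject/predicate/object statements are the ones described above; the rdf:type triple and Jane's statement are the "more statements" an application may add.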
Borrowing from concepts available in logic (and as illustrated in graphical notations
such as conceptual graphs and topic maps), some RDF model implementations
acknowledge that it is sometimes useful to group statements according to different criteria, called
situations, contexts, or scopes, as discussed in articles by RDF specification co-editor
Graham Klyne [32]. For example, a statement can be associated with a context, named by a
URI, in order to assert an "is true in" relationship. As another example, it is sometimes
convenient to group statements by their source, which can be identified by a URI, such
as the URI of a particular RDF/XML document. Then, when updates are made to the
source, corresponding statements can be changed in the model, as well.
In first-order logic, as facilitated by RDF, the only metalevel relation is negation, but
the ability to generally state propositions about nested contexts allows RDF to comprise
a metalanguage that can be used to define modal and higher-order logic.

3 Fuzzy RDF
Recently, fuzzy extensions to description logics have gained considerable attention,
especially for the purpose of handling vague information in many applications.
On the semantic web, the need to represent uncertainty in fuzzy form arises at multiple
levels. First, there is data from everyday life. There are many examples of vague
classifications: old people (by age), heavy commodities (by weight), and so on. In the
web context, another example is the characterization of multimedia pieces: classification
by genre, or valuation of the similarity among them.
The second level is more connected to the very nature of the semantic web. The
practical management of semantic knowledge bases itself needs classifications that are
fuzzy by nature, such as:

• the classification of information sources as trustworthy,

• the classification of data as reliable [35].

Terms such as trustworthy are fuzzy, meaning that they cannot be sharply defined.
However, as humans, we make sense of this information and use it in decision
making [39].
Interestingly, fuzziness does not seem to me to be incompatible with the Semantic Web.
Terms in RDF can be pretty fuzzy. Just take the foaf:knows relation, defined as

A person known by this person (indicating some level of reciprocated
interaction between the parties).

...

We take a broad view of knows, but do require some form of reciprocated
interaction (i.e. stalkers need not apply). Since social attitudes and
conventions on this topic vary greatly between communities, counties and cultures,
it is not appropriate for FOAF to be overly-specific here.

If someone foaf:knows a person, it would be usual for the relation to be
reciprocated. However this doesn't mean that there is any obligation for either
party to publish FOAF describing this relationship. A foaf:knows relationship
does not imply friendship, endorsement, or that a face-to-face meeting has
taken place: phone, fax, email, and smoke signals are all perfectly acceptable
ways of communicating with people you know. [44]

That's a definition that has quite fuzzy borders. Now one could define a relation to
express this fuzziness, call it a fuzzy link, and use it like this:


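@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix fuzz: <http://example.org/fuzz#> .  # placeholder namespace
@prefix :     <http://example.org/> .       # placeholder namespace

:henry fuzz:link [ fuzz:rel foaf:knows ;
                   fuzz:level "slightly" ;
                   fuzz:to :zadeh ] .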

In graph form, this could be drawn as shown in Figure 3.1.

Figure 3.1: example for the fuzziness of foaf:knows

3.1 Serialization
In order to use existing RDF storage systems to store fuzzy knowledge without
enforcing any extensions, we have to provide a way of serializing fuzzy knowledge into
RDF triples. To date, three different approaches dominate.

3.1.1 Blank nodes


Blank nodes can be used to store fuzzy RDF relations. The authors of [42] define three
new entities, frdf:membership, frdf:degree and frdf:ineqType, each declared (via rdf:type)
to be an rdf:Property.
The syntax becomes obvious in the following example. Suppose that we want to
represent the assertion $\langle (paul : Tall) \geq n \rangle$. The RDF triples representing
this information are the following:

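@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix frdf: <http://example.org/frdf#> .  # placeholder namespace
@prefix :     <http://example.org/> .       # placeholder namespace

:paul frdf:membership _:paulmembTall .
_:paulmembTall rdf:type :Tall .
_:paulmembTall frdf:degree "n"^^xsd:float .  # n: the membership degree
_:paulmembTall frdf:ineqType ">=" .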

where _:paulmembTall is a blank node used to represent the fuzzy assertion relating paul
to the concept Tall.
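The blank-node pattern can be generated mechanically for any membership assertion. The Python sketch below does this with triples as plain string tuples; the blank-node labels (_:memb1, ...), the ">=" default for frdf:ineqType and the degree 0.8 are assumptions made for illustration.

```python
import itertools

_blank_ids = itertools.count(1)


def fuzzy_membership(individual, concept, degree, ineq=">="):
    """Encode <(individual : concept) ineq degree> with a fresh blank node,
    following the frdf:membership / frdf:degree / frdf:ineqType pattern."""
    node = f"_:memb{next(_blank_ids)}"  # hypothetical blank-node label
    return [
        (individual, "frdf:membership", node),
        (node, "rdf:type", concept),
        (node, "frdf:degree", f'"{degree}"^^xsd:float'),
        (node, "frdf:ineqType", f'"{ineq}"'),
    ]


for s, p, o in fuzzy_membership("paul", "Tall", 0.8):
    print(s, p, o, ".")
```

Note that the blank node can only stand in subject or object position, which is why the same trick does not work for role assertions.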
Mapping fuzzy role assertions can't be done with this syntax since RDF does not allow
the use of blank nodes in the predicate position. Thus, we have to use new properties
for each assertion. This is the unique predicates technique.


Figure 3.2: Storing a fuzzy membership relation using a blank node

3.1.2 Unique predicates


We can use new properties for each assertion to be able to describe the fuzziness of each
relation independently. In this technique we generate a unique predicate for each fuzzy
relation and make statements about the predicate (the assertion type) instead of the
relation instance. It's like creating a concrete instance of a relation template by binding
a fuzzy value to it. The assertion $\langle (paul, frank) : FriendOf \geq n \rangle$ is mapped to

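@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix frdf: <http://example.org/frdf#> .  # placeholder namespace
@prefix :     <http://example.org/> .       # placeholder namespace

:paul frdf:paulFriendOffrank :frank .
frdf:paulFriendOffrank rdf:type :FriendOf .
frdf:paulFriendOffrank frdf:degree "n"^^xsd:float .  # n: the membership degree
frdf:paulFriendOffrank frdf:ineqType ">=" .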

The authors use the ineqType predicate to be able to describe both inequalities
and equalities. While it is not used in all serialization formats, it could easily be
added to or left out of any of the formats presented here.
While this method has some semantic benefits, it generates many predicates that are used
in only one assertion, and it mixes the concepts of predicate and relation instance. The
distinction between a predicate as a relation type and as a concrete relation between two
nodes is blurred, which could cause difficulties in understanding and using a system built
on this format.
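The predicate explosion is easy to see in a Python sketch of the unique-predicates technique. The naming scheme (subject + role + object) is modeled on the paulFriendOffrank example; the second assertion about "anna", the degrees and the ">=" default are made up for illustration.

```python
def fuzzy_role(subject, role, obj, degree, ineq=">="):
    """Encode <(subject, obj) : role ineq degree> by minting a one-off
    predicate that is typed with the role and carries the degree."""
    pred = f"frdf:{subject}{role}{obj}"  # naming scheme modeled on paulFriendOffrank
    return [
        (subject, pred, obj),
        (pred, "rdf:type", role),
        (pred, "frdf:degree", f'"{degree}"^^xsd:float'),
        (pred, "frdf:ineqType", f'"{ineq}"'),
    ]


# Two assertions of the same role (made-up degrees).
assertions = [("paul", "FriendOf", "frank", 0.7), ("paul", "FriendOf", "anna", 0.4)]
triples = [t for a in assertions for t in fuzzy_role(*a)]

# Every assertion has minted a new predicate that is used by exactly one triple.
minted = {s for s, p, _ in triples if p == "rdf:type"}
print(sorted(minted))  # ['frdf:paulFriendOfanna', 'frdf:paulFriendOffrank']
```

With n fuzzy role assertions the vocabulary grows by n predicates, which is the drawback described above.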


Figure 3.3: Using unique predicates to store a fuzzy relation

3.1.3 Reification
In [35] the authors use RDF reification in order to store membership degrees. The
resource representing the reification of the triple is related to the fuzzy value. A blank
node of type rdf:Statement, following the standard RDF reification scheme, is connected to
the subject, the predicate and the object. This node can then be connected to a float
number using some pre-defined predicate meaning the level of fuzziness.
The introductory example (the fuzziness of foaf:knows) has a lot in common with
reification. The people at Berkeley use fuzz:link, fuzz:level, fuzz:rel and fuzz:to. fuzz:link
is similar in meaning to rdf:subject; the only difference is the direction of the relation.
Likewise, fuzz:to corresponds to rdf:object, and fuzz:rel to rdf:predicate.
Using reification we can store any fuzzy relation (as opposed to the blank node
technique), and we also don't need to generate one-time predicates that are both a relation
type and a concrete relation. Reification is defined in the RDF recommendation
and is the standard way to make statements about a relation. I'll use this as the fuzzy
RDF serialization format in my future work.
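A Python sketch of the reification-based encoding, including the reverse step of looking up the degree of a given triple. The degree predicate `ex:fuzzyDegree`, the blank-node labels and the 0.3 value are hypothetical; the source only specifies "some pre-defined predicate meaning the level of fuzziness".

```python
import itertools

_ids = itertools.count(1)
DEGREE = "ex:fuzzyDegree"  # hypothetical "level of fuzziness" predicate


def fuzzy_statement(s, p, o, degree):
    """Reify s p o as an rdf:Statement and attach a membership degree to it."""
    node = f"_:st{next(_ids)}"  # hypothetical blank-node label
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, DEGREE, degree),
    ]


def degree_of(graph, s, p, o):
    """Return the degree attached to s p o, or None if none is recorded."""
    props = {}
    for subj, pred, obj in graph:  # group by subject; one value per predicate is enough here
        props.setdefault(subj, {})[pred] = obj
    for desc in props.values():
        if (desc.get("rdf:type") == "rdf:Statement"
                and (desc.get("rdf:subject"), desc.get("rdf:predicate"),
                     desc.get("rdf:object")) == (s, p, o)):
            return desc.get(DEGREE)
    return None


graph = fuzzy_statement(":henry", "foaf:knows", ":zadeh", 0.3)
print(degree_of(graph, ":henry", "foaf:knows", ":zadeh"))  # 0.3
```

Because only standard reification vocabulary plus one degree predicate is used, any RDF store can hold this data without extensions.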


Figure 3.4: Using reification to store the fuzzy value of a relation

4 Graph visualization
In this part I'll present some graph drawing methods that are relevant on the road
to fuzzy semantic web visualization. This part is not about graph drawing in general;
for that, excellent bibliographic surveys [13, 22], books [14], and even online tutorials [19]
exist.

4.1 Background of Graph Drawing


The Graph Drawing community¹ grew around the yearly Symposia on Graph Drawing
(GD 'XX conferences), which were initiated in 1992 in Rome. The conference proceedings
contain new layout algorithms, theoretical results on their efficiency or limitations, and
system demonstrations.
The basic graph drawing problem can be put simply: given a set of nodes and a set of
edges (relations), calculate the position of each node and the curve to be drawn for each
edge. Of course, this problem has always existed, for the simple reason that a graph is
often defined by its drawing. Even Euler relied on a drawing to solve the Königsberger
Brückenproblem in his 1736 paper [27].
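To make the input and output of the problem concrete, here is about the simplest possible answer to it, sketched in Python: a circular layout that places the nodes evenly on a circle and draws every edge as a straight segment. It is only an illustration of what a layout algorithm computes, not one of the methods discussed below; the node names and edges are arbitrary.

```python
import math


def circular_layout(nodes, edges, radius=1.0):
    """Place the nodes evenly on a circle and draw every edge as a straight
    segment, so each edge 'curve' is just its two end points."""
    n = len(nodes)
    pos = {
        v: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(nodes)
    }
    curves = [(pos[a], pos[b]) for a, b in edges]
    return pos, curves


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
pos, curves = circular_layout(nodes, edges)
for v, (x, y) in pos.items():
    print(v, round(x, 2), round(y, 2))
```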

4.2 Algorithms
A layout algorithm is used to arrange the nodes and the edges of the graph. It can
produce a vector or raster image that can be displayed.
The list of algorithms presented here is not complete. I've selected some edifying and
useful ones that seem interesting from the perspective of fuzzy semantic network
visualization.
The usefulness of an algorithm is a combination of many factors. The most important
ones are scalability, aesthetics and predictability.
If the algorithm works on relatively large graphs, then it can be used on larger parts of
the semantic web, providing a broader view to the user. This broader view enables users
to better understand the large-scale arrangement of information.
The aesthetic criteria for an algorithm are vaguely defined. I'd say that in our field of
research a graph layout is aesthetic if it helps the user to learn the information in the
graph to the greatest extent.
Predictability means that two different runs of the algorithm involving the same or
similar graphs should not lead to radically different visual representations. Predictability
is an important property of layout algorithms in information visualization [40, 36], as it is
needed to preserve the user's mental model of the graph.

¹ http://www.graphdrawing.org/

4.2.1 Generally about tree layouts


A tree is a connected graph without cycles. As there is exactly one path between any two
vertices, it is much easier to store, manipulate and visualize trees than general graphs.
Tree layout algorithms can be more specific and thus simpler than general graph layouts,
since what works with general graphs will work with trees, but not the other way around.
Most trees we encounter are rooted, which means that, besides being acyclic, they have a
specific root node. It's common to refer to rooted trees simply as trees. Trees without
designated roots are called free trees.
Tree graphs come from real-world hierarchies or hierarchical thinking. We have
hierarchical

• file systems

• 3D scene graphs

• biological taxonomy

• etc.

Even in physics, the standard model of reasoning on the nature of the physical world
decomposes large bodies down to their smallest particle components.
It's important to see how tree graph drawing relates to RDF data sets. For a particular
domain it is often possible to select a meaningful spanning tree from the RDF graph.
This tree can be used for many things later, for example to lay out the nodes of the
graph in a nice hierarchical way.
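Which spanning tree is "meaningful" depends on the domain. As a neutral starting point, this Python sketch takes a breadth-first spanning tree from a chosen root, treating RDF edges as undirected; the root choice and the example resources are assumptions, and a real tool would likely prefer certain predicates over others.

```python
from collections import deque


def spanning_tree(triples, root):
    """Breadth-first spanning tree of an RDF graph, ignoring edge direction.
    Returns a child -> (parent, predicate) map; the root has no entry."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((p, s))
    parent = {}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for pred, other in neighbours.get(node, []):
            if other not in seen:
                seen.add(other)
                parent[other] = (node, pred)
                queue.append(other)
    return parent


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:knows", "ex:alice"),  # closes a cycle; dropped from the tree
    ("ex:bob", "foaf:based_near", "ex:budapest"),
]
tree = spanning_tree(triples, "ex:alice")
print(tree)
```

The resulting parent map is exactly the input a tree layout algorithm needs.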
Visualized trees are so common in the computer world that we don't even give them a
thought; we just use them. When did you last use a filesystem browser with a hierarchical
display? These hierarchical visualizations have become ubiquitous, so there's much to
learn from them when creating solutions for everyday life. If the results of all the
experiments with graph drawing and semantic web visualization remain exotic pieces,
they are mostly useless; real change will only come if the tools we make can be used
without even noticing the technology behind them.

4.2.2 Reingold and Tilford algorithm


The Reingold and Tilford algorithm (see Figure 4.1 on page 19) is one of the simplest tree drawing methods. It places the root node at the top and each level of children below it on a horizontal line. It uses bottom-up spacing, so each leaf gets about the same amount of free space, and a parent node usually receives more horizontal space if there are more descendant nodes on its branch. This helps the user instantly see the size of the parts of the tree.
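The bottom-up spacing described above can be illustrated with a scheme much simpler than the actual Reingold and Tilford algorithm: give every leaf an equal horizontal slot and center each parent over its first and last child. This sketch does not implement the contour-based subtree packing of Reingold and Tilford; it only reproduces the properties mentioned here (equal space per leaf, wider branches for larger subtrees). The class name and the unit spacings are assumptions.

```java
import java.util.*;

// Simplified leaf-slot tree layout (not the full Reingold-Tilford contour algorithm).
class TreeLayout {
    static final double LEAF_SPACING = 1.0; // horizontal slot per leaf (assumed unit)
    static final double LEVEL_HEIGHT = 1.0; // vertical distance between levels (assumed unit)

    /** Computes {x, y} for every node of a rooted tree given as parent -> children lists. */
    static Map<String, double[]> layout(Map<String, List<String>> children, String root) {
        Map<String, double[]> pos = new HashMap<>();
        double[] nextLeafX = {0.0}; // mutable cursor shared by the recursion
        place(children, root, 0, nextLeafX, pos);
        return pos;
    }

    private static double place(Map<String, List<String>> children, String node, int depth,
                                double[] nextLeafX, Map<String, double[]> pos) {
        List<String> kids = children.getOrDefault(node, Collections.emptyList());
        double x;
        if (kids.isEmpty()) {
            // Leaves are packed left to right, each receiving the same slot.
            x = nextLeafX[0];
            nextLeafX[0] += LEAF_SPACING;
        } else {
            // Bottom-up: lay out all subtrees first, then center the parent above them.
            double first = 0, last = 0;
            for (int i = 0; i < kids.size(); i++) {
                double cx = place(children, kids.get(i), depth + 1, nextLeafX, pos);
                if (i == 0) first = cx;
                last = cx;
            }
            x = (first + last) / 2.0;
        }
        pos.put(node, new double[]{x, depth * LEVEL_HEIGHT});
        return x;
    }
}
```

Because every leaf consumes one slot, a branch with many descendants automatically spreads wider than a small one, which is exactly the visual cue about subtree size mentioned above.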

4 Graph visualization

Figure 4.1: A tree layout for a moderately large graph. [27]

The large and uneven spacing between sibling nodes can be a problem on large graphs. If we use zooming and panning (more about them later) to navigate such a graph, we might need to zoom in and out at the same time: zoom in to see individual nodes in crowded areas, and zoom out to see more sibling nodes where there is a lot of space between them.
This method is good for a quick overview of a (possibly huge) tree, but it is problematic when we need to see and interact with individual nodes of the graph.

4.2.3 Manual layout


Many existing graph editors and graphical RDF editors use manual layout. The layout of the graph is determined by the user as it is created, and the positions of the nodes are saved along with the graph itself. Here the user authors the drawn graph, not the abstract graph. Empowering the user to manually position every node means both freedom and burden. The user is able to finely craft the layout of the graph down to the pixel, but is also forced to take responsibility for the layout, which diverts focus from the primary goal: the semantics of the graph.
Some applications offer methods to auto-layout the drawn graph. In every application I have seen, this functionality has major flaws. When I first sketch the most important parts of the graph I want to create, the canvas is usually a mess. When the main components are in place I hit the auto-layout button (there are usually two to four auto-layout methods; what follows applies to all of them) and the graph gets a more-or-less pretty layout. So far, so good. Even when I rearrange the result a bit to better fit my needs, it works perfectly well. However, it gets really messy when I start to add nodes to a part of my graph (adding the details) and try to arrange them. I'd like to see applications that allow me to auto-arrange only a part of my graph, but so far I haven't encountered a single one. The problem is that after an auto-layout I always tweak the result, and this tweaking is undone by the next auto-layout, which I need because I don't want to lay out the 20 new nodes entirely by hand. So I need to tweak the layout again to be able to use it, for example by moving some related nodes close to each other so I can find them faster. This auto-layout-and-tweak cycle is not only a waste of time but also annoying. Sometimes it is slightly better than an all-manual layout, but it is still a great annoyance.
Most automatic layout facilities take a purely combinatorial description of a graph and
produce a layout of the graph; these methods are called 'layout creation' methods. For
interactive systems, another kind of layout is needed: a facility which can adjust a layout
after a change is made by the user or by the application. Although layout adjustment is
essential in interactive systems, most existing layout algorithms are designed for layout
creation. The use of a layout creation method for layout adjustment may totally rearrange
the layout and thus destroy the user's 'mental map' of the diagram; thus a set of layout
adjustment methods, separate from layout creation methods, is needed. [37]
The manual layout approach worked well for charts of 5-15 elements used in presentations, where the layout of the graph was as important as the meaning of the edges and nodes. This is not true for RDF and the semantic web: here all the information is encoded in the nodes and edges, and the layout's purpose is no more than to help the user learn or author this information. Using manual layout for semantic web graphs faces two major obstacles. Firstly, RDF graphs are often auto-generated from large data sets, where the size of the data and its rapidly changing nature make manual layout by humans extremely expensive. Secondly, the semantic web is all about connecting knowledge. RDF graphs can be merged automatically without any problem, but the aesthetics or important aspects of manually laid-out graphs can't be kept by an automated merge; only a human can properly merge two manually laid-out graphs, and again this is very expensive and slow.
Manual layout is inherently interactive: the user has full control over element positions, and it is really easy to rearrange the elements. This creates a feeling of control, which is an important factor of a good user experience. In a good RDF editor the user should feel in control, and manual positioning is a great example of such control. The lack of this feeling of control leads to frustration, stress and poor performance.

4.2.4 Force directed methods


I'll begin by quoting the explanation of the force metaphor from Eades.

The basic idea is as follows. To embed [lay out] a graph we replace the vertices
by steel rings and replace each edge with a spring to form a mechanical system
. . . The vertices are placed in some initial layout and let go so that the
spring forces on the rings move the system to a minimal energy state. [21]

Here the graph is modelled by a physical system of rings and springs. Firstly, this real-world metaphor helps the user understand and interact with the interface. Springs and masses are common, so everyone has a good mental model of them. The metaphor connects the user interface to this pre-existing mental model, so the user can interact immediately. First-time users don't need to develop a new mental model of the interface: the learning curve doesn't start from zero, every user has a head start, and there is no need for a user manual or help system to understand the basics. Even my young sister understood how springs work, what inertia feels like and how friction affects the movement of objects. If I showed her an interface with nodes and edges and told her those are things and springs, she would be ready to use it right away. She would figure out in seconds that things can be grabbed and moved around with the mouse and, bam, she is using the interface! Good metaphors help users apply their existing experience to a new system. Using the things-and-springs metaphor for graph user interfaces greatly reduces learning time.
Secondly, this metaphor also helps us design, understand and talk about the system. It is easier to talk about this system than about others that lack a good metaphor. Take a simple example: gravity. You know how it integrates with the system and what it looks like when it is turned on. Hyperbolic graph layout is a good counterexample: it is surrounded by a sort of mystery, as few people really understand it. [27] It is preferable to select a graph layout technique that has a strong metaphor, as that helps the developer community improve it and communicate about it.
Force-based layouts share the same metaphor, and they form a broad class. We have the freedom to choose the forces and the model elements. We are not bound by the laws of the real world: we choose which laws we implement, which we don't, and which we implement differently than nature did. For example, we don't need to stick to Hooke's law (itself only a macroscopic approximation of the behaviour of springs); we can choose our own formula for the forces exerted by the springs, as Fruchterman and Reingold did in [26].
The base algorithm is quite simple. It takes the current positions of the nodes and calculates the new ones. In the basic form there are two forces:

• a spring between every pair of connected nodes (every edge acts as a spring)

• a light repulsive force between every pair of nodes

The springs keep connected nodes relatively close to each other, and the repulsive forces keep unrelated nodes apart. To keep things simple, the springs follow Hooke's law and the repulsive forces have an inverse-square falloff, like gravity and Coulomb's law. The algorithm implementing these two forces is:

1. For every edge, calculate the spring force from Hooke's law for each of its two nodes and add the impulse (the force multiplied by the time step) to the current momentum of the node. (Usually the node's velocity is stored instead, so the impulse is first converted to a change of velocity.)

2. For every pair of nodes, calculate the inverse-square repulsive force and add its effect to the velocities of both nodes.

3. Move the nodes by their velocity multiplied by the time step.
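The three steps above can be sketched in Java as one explicit time step. This is a minimal illustration, not the implementation used later in this thesis: the class name `SpringLayout`, the constants and the unit node mass are assumptions, and velocities are not damped here, so in practice a drag term such as the one described in section 5.3.1 is needed for the system to settle.

```java
// Minimal sketch of the basic spring-and-repulsion step (hypothetical constants).
class SpringLayout {
    static final double SPRING_K = 0.05;    // Hooke's law stiffness
    static final double REST_LENGTH = 50.0; // spring rest length
    static final double REPULSION = 5000.0; // inverse-square repulsion strength
    static final double MASS = 1.0;         // every node has unit mass

    /** One simulation step. pos and vel are n x 2 arrays; edges are node index pairs. */
    static void step(double[][] pos, double[][] vel, int[][] edges, double dt) {
        int n = pos.length;
        // Step 1: Hooke spring force along every edge, applied as a velocity change F/m * dt.
        for (int[] e : edges) {
            int a = e[0], b = e[1];
            double dx = pos[b][0] - pos[a][0], dy = pos[b][1] - pos[a][1];
            double dist = Math.max(Math.hypot(dx, dy), 1e-9);
            double f = SPRING_K * (dist - REST_LENGTH); // positive when stretched: pull together
            double fx = f * dx / dist, fy = f * dy / dist;
            vel[a][0] += fx / MASS * dt; vel[a][1] += fy / MASS * dt;
            vel[b][0] -= fx / MASS * dt; vel[b][1] -= fy / MASS * dt;
        }
        // Step 2: inverse-square repulsion between every pair of nodes (the Theta(V^2) part).
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pos[j][0] - pos[i][0], dy = pos[j][1] - pos[i][1];
                double dist = Math.max(Math.hypot(dx, dy), 1e-9);
                double f = REPULSION / (dist * dist);
                double fx = f * dx / dist, fy = f * dy / dist;
                vel[i][0] -= fx / MASS * dt; vel[i][1] -= fy / MASS * dt;
                vel[j][0] += fx / MASS * dt; vel[j][1] += fy / MASS * dt;
            }
        }
        // Step 3: move every node by its velocity multiplied by the time step.
        for (int i = 0; i < n; i++) {
            pos[i][0] += vel[i][0] * dt;
            pos[i][1] += vel[i][1] * dt;
        }
    }
}
```

Calling `step` once per animation frame gives the interactive behaviour discussed below: if the user drags a node, the next steps simply continue from the modified positions.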

Calculating the attractive forces between neighbours takes Θ(E) time, while the repulsive force calculation takes Θ(V²), a great weakness of this implementation. The repulsive force calculation can be reduced to Θ(V) with space partitioning (neglecting the repulsion between distant nodes) when the distribution of vertices is approximately uniform.
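One way to get this linear behaviour is a uniform grid. The sketch below assumes that repulsion is simply ignored beyond a cutoff distance equal to the cell size, which is an approximation; the names `GridRepulsion` and `repulse` are hypothetical. Each node is compared only with the nodes in its own cell and the eight surrounding cells, so the work stays linear as long as the number of nodes per cell is bounded, which is what the uniformity assumption provides.

```java
import java.util.*;

// Sketch: approximate inverse-square repulsion using a uniform grid with a cutoff.
class GridRepulsion {
    /** Adds the repulsive force on each node to force[i]; pairs at least 'cutoff' apart are ignored. */
    static void repulse(double[][] pos, double[][] force, double strength, double cutoff) {
        // Bucket node indices into square cells of side 'cutoff'.
        Map<Long, List<Integer>> grid = new HashMap<>();
        for (int i = 0; i < pos.length; i++) {
            grid.computeIfAbsent(cellKey(pos[i], cutoff), k -> new ArrayList<>()).add(i);
        }
        for (int i = 0; i < pos.length; i++) {
            long cx = (long) Math.floor(pos[i][0] / cutoff);
            long cy = (long) Math.floor(pos[i][1] / cutoff);
            // Only the 3x3 block of neighbouring cells can hold nodes closer than 'cutoff'.
            for (long gx = cx - 1; gx <= cx + 1; gx++) {
                for (long gy = cy - 1; gy <= cy + 1; gy++) {
                    for (int j : grid.getOrDefault(key(gx, gy), Collections.emptyList())) {
                        if (j == i) continue;
                        double dx = pos[i][0] - pos[j][0], dy = pos[i][1] - pos[j][1];
                        double dist = Math.hypot(dx, dy);
                        if (dist >= cutoff || dist == 0) continue;
                        double f = strength / (dist * dist);
                        force[i][0] += f * dx / dist; // pushes node i away from node j
                        force[i][1] += f * dy / dist;
                    }
                }
            }
        }
    }

    private static long cellKey(double[] p, double size) {
        return key((long) Math.floor(p[0] / size), (long) Math.floor(p[1] / size));
    }

    private static long key(long gx, long gy) {
        // Packs two cell coordinates into one long (assumes |gx|, |gy| < 2^31).
        return (gx << 32) ^ (gy & 0xffffffffL);
    }
}
```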
Interactivity is a big strength of force-based layout techniques. While they perform their iterative calculations, the user can move nodes around and see immediate feedback as the graph reorganizes itself to satisfy the goals mentioned above. This method can even survive the addition and removal of nodes without a complete rearrangement of the graph. The user can watch how the new nodes or edges change the layout, without any sudden or unexpected change. This makes it a good candidate for the incremental exploration navigation method I'll present later.
In some cases, their output can even behave well with respect to edge-crossing minimization without any supplementary effort [25]. Bertault has developed a force-directed model that preserves edge crossings, making it a more predictable approach [17]. For more information on force-directed methods and recent improvements, see [20, 25, 30, 26].

4.3 Dealing with Large Graphs


The size of a graph is a big issue in semantic network visualization. A large graph can make a normally good layout algorithm completely unusable. The networks we work with can have thousands of nodes in a single RDF file, and the graph is nearly infinite on the web. Figure 4.1 on page 19 shows a tree with a few hundred nodes laid out using the classical Reingold and Tilford algorithm. The high density of the layout comes as no surprise, and changing particular parameters of the algorithm will not improve the picture. Other 2D layout techniques could be used, but most layout algorithms suffer from the same problem. Because the layout is so dense, interaction with the graph becomes difficult. Occlusions in the picture make it impossible to navigate and query particular nodes. The use of 3D or of non-Euclidean geometry has also been proposed to alleviate these problems. However, beyond a certain limit, no algorithm will guarantee a proper layout of large graphs: there is simply not enough space on the screen. In fact, from a cognitive perspective, it does not even make sense to display a very large amount of data. Consequently, a first step in the visualization process is often to reduce the size of the graph to display. Classical layout algorithms remain usable tools for visualization, but only when combined with these techniques. [27]
Zoom and pan are traditional tools in visualization. They are practically indispensable when large graph structures are explored. Zoom is particularly well suited to graphs because the graphics used to display them are usually fairly simple (lines and simple geometric forms). Zooming can take two forms. Geometric zooming simply provides a blow-up of the graph content. Semantic zooming means that the information content changes and more details are shown when approaching a particular area of the graph. The technical difficulty in this case lies not in the zooming operation itself, but in assigning an appropriate level of detail, i.e., a sort of clustering, to subgraphs.
A well-known problem with zooming is that if one zooms in on a focus, all contextual information is lost.⁵ Such a loss of context can become a considerable usability obstacle.

A set of techniques that allow the user to focus on some detail without losing the context can alleviate this problem. The term focus+context has been used to describe these techniques. They do not replace zoom and pan, but rather complement them. The complexity of the underlying data might make zoom an absolute necessity. However, focus+context techniques are a good alternative, and full-blown application systems often implement both. Graphical fisheye views are popular techniques for geometric focus+context. Fisheye views imitate the well-known fisheye lens effect by enlarging an area of interest and showing other portions of the image with successively less detail. Just as we speak about semantic zoom, one could also refer to semantic focus+context, meaning that when the distortion becomes too extreme, nodes might, in some sense, disappear after all. Sarkar and Brown describe this technique in their paper [41], but finer control over this facility might lead to new insights as well.

4.3.1 Pan and zoom


The most basic approach is to display the graph using a single, unified view. In a pan+zoom view the user may pan the view using scroll bars or by dragging the mouse over the view. [43] The user may also explore the graph in varying detail by zooming in or out. Pad++ [4] uses highly optimized graphics to achieve smooth panning and zooming, making out-of-view parts of the graph quickly accessible. A pan+zoom web browser was recently developed within Pad++ [15].
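At its core, a pan+zoom view is a world-to-screen transform made of a scale and an offset. The sketch below is illustrative only (the class `PanZoomView` and its fields are hypothetical and not taken from Pad++ or any other toolkit); it shows the usual technique of zooming around the cursor, so the point under the mouse stays in place while the rest of the graph scales around it.

```java
// Sketch of a pan+zoom view transform: screen = world * scale + offset.
class PanZoomView {
    double offsetX = 0, offsetY = 0; // screen position of the world origin, in pixels
    double scale = 1.0;              // screen pixels per world unit

    double toScreenX(double wx) { return wx * scale + offsetX; }
    double toScreenY(double wy) { return wy * scale + offsetY; }
    double toWorldX(double sx) { return (sx - offsetX) / scale; }
    double toWorldY(double sy) { return (sy - offsetY) / scale; }

    /** Panning, for example while the user drags the background with the mouse. */
    void pan(double dxPixels, double dyPixels) {
        offsetX += dxPixels;
        offsetY += dyPixels;
    }

    /** Zooms by 'factor' while keeping the world point under screen position (sx, sy) fixed. */
    void zoomAt(double sx, double sy, double factor) {
        double wx = toWorldX(sx), wy = toWorldY(sy); // world point under the cursor
        scale *= factor;
        offsetX = sx - wx * scale; // re-solve the offset so that point maps back to (sx, sy)
        offsetY = sy - wy * scale;
    }
}
```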
One problem with the pan+zoom technique is that a user can only enlarge one area of interest at a time in a given view. To work with semantic networks we often need to look at several distant parts of the graph at the same time: graph layouts can't place all connected nodes near each other, and we often need to see them at once. A slightly different problem arises when we are editing a semantic network and connecting distant, previously unconnected parts of the graph. Neither task can be done efficiently without somehow bringing distant parts of the graph close together on the screen.

4.3.2 Fisheye
Graphical fisheye views are a popular technique for focus+context navigation. A fisheye camera lens is a very wide-angle lens that magnifies nearby objects while shrinking distant ones. This section describes a method for viewing and browsing graphs using a mapping analogous to a fisheye lens. [41]
This view requires a full layout and displays a distortion of it according to a focal point defined by the user. The initial layout of the graph is called the normal view of the graph, and its coordinates are called normal coordinates. The coordinates of the graph in the fisheye view are called the fisheye coordinates. The viewer's point of interest is called the focus; it is a point in the normal coordinates.

⁵ Unless a separate window, for example, keeps the context visible, which is done by several systems. But this solution is not fully satisfactory either.
The fisheye view is a valuable tool for seeing both the local details of a graph and its global context simultaneously. It has been used successfully in many cases, both in graph drawing and elsewhere.⁶ Being able to see both details and global context at the same time greatly helps the user develop and maintain a mental map of the graph. The user continually sees the global map of the graph and the current position in it, so with all this help it is really hard to get lost.
The major limitation of the fisheye view is that it has only one focal point: the user can't view two distant parts of the graph with the same level of detail. For example, to compare two distant parts, the user has to navigate back and forth and keep parts of the graph in memory, as the details of both parts can't be seen at the same time.
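The mapping from normal to fisheye coordinates can be sketched one axis at a time. The sketch assumes the Cartesian variant of the distortion function usually attributed to Sarkar and Brown [41], G(x) = (d+1)x / (dx+1), where x is the normalized distance from the focus towards the view boundary and d ≥ 0 is the distortion factor (d = 0 means no distortion). Node sizes and the polar variant are left out, and the class name is hypothetical.

```java
// Sketch of a Cartesian fisheye transformation applied independently to each axis.
class Fisheye {
    /** Distortion function G(x) = (d+1)x / (dx+1), mapping [0,1] onto [0,1]. */
    static double g(double x, double d) {
        return (d + 1) * x / (d * x + 1);
    }

    /** Maps one normal coordinate to its fisheye coordinate on the axis range [min, max]. */
    static double transform(double normal, double focus, double min, double max, double d) {
        double dNorm = normal - focus;
        // Distance from the focus to the boundary on the same side as the point.
        double dMax = dNorm >= 0 ? max - focus : focus - min;
        if (dMax == 0) return normal;
        double x = Math.abs(dNorm) / dMax; // normalized distance in [0, 1]
        return focus + Math.signum(dNorm) * g(x, d) * dMax;
    }
}
```

Points near the focus are pushed apart (G rises steeply near 0) while points near the boundary are compressed, and the boundary itself stays fixed because G(1) = 1, so the whole graph remains on screen.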

4.3.3 Clustering
It is often advantageous to reduce the number of visible elements. Limiting the number of visual elements to display can improve clarity, and it also increases the performance of layout and rendering by cutting down the node count [31]. Various abstraction and reduction techniques have been applied to reduce the complexity of the displayed graph. One approach is to perform clustering.
Clustering is the process of discovering groupings in the data. Two main kinds of clustering exist. Structure-based clustering uses purely structural information, looking only at the nodes and edges and not at labels. The other kind is called content-based clustering; it incorporates graph labels into the reduction algorithm. The usefulness of a given content-based clustering method can greatly depend on the domain: a method working well in one knowledge domain may not work at all in a different one.
Clustering can be used for navigation by letting the user interactively open and close clusters. This can be especially useful with hierarchical clusters, where the user can have a compact view of most of the graph and a detailed view of some chosen parts. Note the difference from fisheye navigation, where you can only have one focal point.
A simple RDF clustering method can be based on the class hierarchy. The rdfs:subClassOf property is used to state that all the instances of one class are instances of another. Used together with rdf:type, it allows a hierarchical clustering with open and closed classes. The subclasses and instances of open classes are shown normally, but subclasses of closed classes and their instances are hidden. We need to decide what to do when an object is an instance of both hidden and open classes, but this can be done. Notice that we have built a more or less tree-like structure from the RDF graph and used it as the basis for hierarchical clustering. This simple example can be used for RDF graphs where nodes are distributed among classes; however, it is completely useless in scenarios without object classification.
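The open/closed class rule can be sketched as a visibility computation. This is a minimal sketch under assumptions of my own: the hierarchy is given as plain maps built from rdfs:subClassOf and rdf:type triples, the root classes are passed in explicitly, and for the undecided case of a node reachable through both open and closed classes, the node is shown if at least one open path reaches it. The class and method names are hypothetical.

```java
import java.util.*;

// Sketch: which nodes are visible when classes can be opened and closed.
class ClassClustering {
    /**
     * subClassOf: class -> its superclasses (rdfs:subClassOf); types: instance -> its classes
     * (rdf:type); open: classes the user has expanded; rootClasses: always visible.
     */
    static Set<String> visible(Map<String, Set<String>> subClassOf, Map<String, Set<String>> types,
                               Set<String> open, Set<String> rootClasses) {
        Set<String> visible = new HashSet<>(rootClasses);
        // Propagate visibility down the hierarchy until a fixed point is reached.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : subClassOf.entrySet()) {
                if (visible.contains(e.getKey())) continue;
                for (String sup : e.getValue()) {
                    // A subclass is shown if one of its superclasses is visible and open.
                    if (visible.contains(sup) && open.contains(sup)) {
                        visible.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        // Instances are shown only if one of their classes is both visible and open.
        for (Map.Entry<String, Set<String>> e : types.entrySet()) {
            for (String cls : e.getValue()) {
                if (visible.contains(cls) && open.contains(cls)) {
                    visible.add(e.getKey());
                    break;
                }
            }
        }
        return visible;
    }
}
```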

⁶ The animated, zooming Dock in Mac OS X is a fisheye list, and the concept has been adopted in a wide variety of systems. [1]

4.3.4 Incremental exploration

Figure 4.2: Exploration of a huge graph (Adapted from [29, 27])

[27] There are cases when the graph is so huge that it becomes impossible to handle the full graph at any time; the World Wide Web is an obvious example. Incremental exploration techniques are good candidates for such situations. The system displays only a small portion of the full graph, and other parts of the graph are displayed as needed. The advantage of such an incremental approach is that, at any given time, the subgraph shown on the screen may be limited in size; hence, the layout and interaction times may no longer be critical. This approach to graph exploration is still relatively new, but interesting results in the area are already available, see, for example, [18, 28, 29, 46, 40]. Incremental exploration means that the system places a visible window on the graph, somewhat similar to what pan does. Exploration means moving this window (also referred to as a logical frame by [28]) along some trajectory (see Figure 4.2 on page 25). Implementing such incremental exploration has essentially two aspects, namely:

• decide on a strategy to generate new logical frames

• reposition the content of the logical frame after each change.

Generating new logical frames is always under the control of the user. In some cases, the logical frame simply contains the nodes visited so far. [28] and North [40] included a control for throwing away some part of the logical frame to avoid saturating the screen. As far as repositioning is concerned, the simplest solution is to use the same layout algorithm for each logical frame. This is done, for example, by [28]. (Note that [28] uses a modified spring algorithm. This is one case where the relatively small graph on the screen makes the use of a force-directed method perfectly feasible in graph visualization.) North [40] and Brandes and Wagner [18] go further by providing dynamic control over the parameters that direct the layout algorithms. This line of visual graph management is still quite new; according to [27], it will gain importance in the years to come and will complement the classical navigation and exploration methods.
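A logical frame with such a discard control can be kept in a bounded, access-ordered map. The eviction policy here (drop the least recently visited nodes once a capacity is exceeded) is only one possible choice and an assumption of this sketch, not the policy of [28] or North [40]; the class `LogicalFrame` is hypothetical, and repositioning the frame's content is left to the layout algorithm.

```java
import java.util.*;

// Sketch: a logical frame that keeps at most 'capacity' nodes, dropping the stalest ones.
class LogicalFrame {
    private final int capacity;
    // Access-ordered LinkedHashMap: iteration starts with the least recently visited node.
    private final LinkedHashMap<String, Boolean> nodes = new LinkedHashMap<>(16, 0.75f, true);

    LogicalFrame(int capacity) { this.capacity = capacity; }

    /** Visits a node: adds it and its neighbours to the frame, then evicts the oldest nodes. */
    void visit(String node, Collection<String> neighbours) {
        for (String n : neighbours) nodes.put(n, Boolean.TRUE);
        nodes.put(node, Boolean.TRUE); // the visited node becomes the most recent one
        Iterator<String> it = nodes.keySet().iterator();
        while (nodes.size() > capacity && it.hasNext()) {
            it.next();
            it.remove(); // throw away the least recently visited node
        }
    }

    /** The nodes currently in the frame, least recently visited first. */
    Set<String> contents() { return new LinkedHashSet<>(nodes.keySet()); }
}
```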

4.4 Case studies


4.4.1 IsaViz

Figure 4.3: The IsaViz graph and radar view

IsaViz⁷ is a visual environment, hosted by the W3C, for browsing and authoring RDF models represented as graphs. It uses a pan+zoom navigational user interface with an optional overview window called the Radar view. The overview window helps preserve some of the context, so it is easier to navigate the graph.
The display layout is done manually, with all the pros and cons described in section 4.2.3. Edges are made of Bézier curves, which provide nice-looking connections but also add more things to worry about when laying out the graph. Some automatic layout functionality is present, with all the difficulties I mentioned earlier.
IsaViz uses the Zoomable Visual Transformation Machine (ZVTM) toolkit. ZVTM is implemented in Java and designed to ease the task of creating complex visual editors in which large numbers of objects have to be displayed, or which contain complex geometrical shapes that need to be animated. It is based on the metaphor of universes that can be observed through smart movable/zoomable cameras, and offers features such as perceptual continuity in object animations and camera movements, which should make the end user's overall experience more pleasing. ZVTM is nice to use and has smooth transitions, but it has the problems of all pan+zoom navigation methods.

⁷ http://www.w3.org/2001/11/IsaViz/
All mouse interactions are achieved through single mouse clicks. Only one command uses a left-button double click: opening a resource's URI in the web browser.
The left mouse button's action depends on what tool is selected in the palette of icons.
The right mouse button (or single mouse button + command key under Mac OS X) is
used to navigate in the Graph view, no matter which tool is currently selected. This
unusual assignment of mouse buttons is based on the fact that navigation in the graph
is a very important functionality and should be quickly available at all times.
The IsaViz environment has four main windows: toolbox, graph view, attribute editor and definitions. Arranging and moving these four windows on the desktop isn't as easy as handling a single window would be. Using a separate toolbox and attribute editor requires a lot of mouse movement; context-sensitive options would be better.

4.4.2 Fentwine
Fentwine [23] is a navigational RDF browser and editor. A central node is shown in the
middle of the screen and the nodes directly connected to it are arranged on an ellipse
around it. A node is made central when the user clicks on it.
Nodes more than one step away from the central node are also shown, up to a maximum
number of steps. Nodes further away from the center fade into the background. With this
navigational approach, the user can see which nodes are connected to the central node
without following long connective lines through a graph. This incremental exploration method also means sacrificing a persistent mental map: the user can't have the big picture in mind, as there isn't one.
The property of each connection is shown between the two connected nodes. Each property is shown in a different color, computed deterministically from the property's URI. This allows the user to recognize the type of a connection without reading the property name.
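Fentwine's exact color function is not described here, so the following is only an illustration of the idea: hash the property URI and use the hash as a hue. Because `String.hashCode` is fully specified in Java, the same URI always yields the same color across runs and machines. The fixed saturation and brightness values and the class name are assumptions.

```java
import java.awt.Color;

// Sketch: a deterministic edge color derived only from the property URI.
class PropertyColors {
    /** Returns an opaque color that depends only on the property URI. */
    static Color colorFor(String propertyUri) {
        int h = propertyUri.hashCode();               // specified algorithm, so stable across runs
        float hue = (h & 0x7fffffff) % 360 / 360f;    // spread URIs around the color wheel
        // Fixed saturation and brightness keep all edge colors similarly readable (assumption).
        return new Color(Color.HSBtoRGB(hue, 0.65f, 0.85f));
    }
}
```

Different properties can occasionally collide on the same or a very similar hue; with only a handful of properties on screen this is usually acceptable, and a legend can resolve any doubt.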
Fentwine hides the URIs of nodes and properties if a natural language label is available, for example a person's name. A predefined set of properties for finding the label of a node is provided (containing e.g. rdfs:label and foaf:name), and the user can extend this set. Displaying familiar names and labels instead of URIs really helps users focus on the main content without getting distracted by long strings with little meaning to humans.
The user sees a list of the properties in the current RDF graph, which allows showing or hiding each property separately. Properties can be grouped into categories, which can be switched on and off as a whole. For example, when browsing a FOAF network, the user can hide all properties except foaf:knows. This reduces clutter by hiding relationships irrelevant to the task at hand (for example, foaf:workplaceHomepage). This feature derives from Nelson's ZigZag [38]. Hiding uninteresting connections helps the user focus on the main content; however, the simple list interface is hard to use when there are many properties in the viewed graph.

5 The proposed system
5.1 Overview
I've created a system to test and demonstrate the suggested concepts. The program is built on the Java Platform and consists of about 3,000 lines of code in 25 classes.
The program visualizes XML-serialized RDF graphs. The XML file is read and the RDF graph is built in memory. The visualization draws nodes as points and edges as lines on a surface. The program is designed only for research, so the interface is very plain. To keep things simple no navigation is implemented; the system is designed to handle small graphs. Only if it works on small and simple graphs should we consider implementing navigation and support for larger graphs.
The interface uses a force-based interactive layout. The graph nodes are attached to particles in a simulation where edges are represented by springs. More on force-directed methods can be found in section 4.2.4.
To make the interface more user-friendly, node and property URIs are hidden if a natural language label is available in the graph. This great feature was taken from Fentwine (described in 4.4.2). A predefined set of properties for finding the label of a node is provided (containing rdfs:label and foaf:name). The natural language literals themselves are hidden to reduce clutter.
The fuzzy edges are expected in the reification format described in 3.1.3. When a fuzzy reified edge is present, the corresponding crisp edge, which is kept in the data for backward compatibility, is hidden. By default the edges and nodes that describe reified triples are hidden; for demonstration purposes there is an option to show them.
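Detecting the fuzzy edges amounts to grouping the reification triples by their statement node. In the sketch below, triples are plain String arrays with prefixed names for brevity, and `ex:certainty` is a hypothetical placeholder for the fuzzy predicate (its definition is discussed in section 5.3.3); rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard RDF reification vocabulary. Both the crisp triple matching a fuzzy statement and the triples describing the statement node are kept out of the drawable set.

```java
import java.util.*;

// Sketch: split triples into fuzzy edges (from reification) and crisp edges to draw.
class FuzzyReification {
    static final String TYPE = "rdf:type", STATEMENT = "rdf:Statement";
    static final String SUBJ = "rdf:subject", PRED = "rdf:predicate", OBJ = "rdf:object";
    static final String CERTAINTY = "ex:certainty"; // hypothetical fuzzy predicate

    /** Fills fuzzyOut with {subject, predicate, object, certainty} and crispOut with drawable triples. */
    static void split(List<String[]> triples, List<String[]> fuzzyOut, List<String[]> crispOut) {
        // Collect the properties of every subject and remember which nodes are statements.
        Map<String, Map<String, String>> props = new HashMap<>();
        Set<String> statements = new HashSet<>();
        for (String[] t : triples) {
            props.computeIfAbsent(t[0], k -> new HashMap<>()).put(t[1], t[2]);
            if (t[1].equals(TYPE) && t[2].equals(STATEMENT)) statements.add(t[0]);
        }
        // A complete reified statement with a certainty value becomes a fuzzy edge.
        Set<String> fuzzyKeys = new HashSet<>();
        for (String st : statements) {
            Map<String, String> p = props.get(st);
            if (p.containsKey(SUBJ) && p.containsKey(PRED) && p.containsKey(OBJ) && p.containsKey(CERTAINTY)) {
                fuzzyOut.add(new String[]{p.get(SUBJ), p.get(PRED), p.get(OBJ), p.get(CERTAINTY)});
                fuzzyKeys.add(p.get(SUBJ) + " " + p.get(PRED) + " " + p.get(OBJ));
            }
        }
        // Hide triples about statement nodes and crisp duplicates of fuzzy edges.
        for (String[] t : triples) {
            boolean describesStatement = statements.contains(t[0]);
            boolean replacedByFuzzy = fuzzyKeys.contains(t[0] + " " + t[1] + " " + t[2]);
            if (!describesStatement && !replacedByFuzzy) crispOut.add(t);
        }
    }
}
```

Showing the reification nodes for demonstration purposes would simply mean skipping the `describesStatement` filter.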
The fuzziness of a triple can be visualized in two ways, which can be turned on and off independently. Firstly, the color of a fuzzy edge can be tweaked based on its certainty. Secondly, the rest length of the spring behind the edge can be modified. This usually results in a shorter distance for certain edges and a longer distance for uncertain ones.
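Both visual encodings can be expressed as simple functions of the certainty value μ ∈ [0, 1]. The linear mappings and constants below are illustrative assumptions rather than the values used in the program, and opacity is used here as one possible way of tweaking the edge color.

```java
// Sketch: mapping an edge's certainty to its spring rest length and its opacity.
class FuzzyEdgeStyle {
    // Illustrative constants, not taken from the program.
    static final double MIN_REST = 60.0, MAX_REST = 180.0;

    static double clamp01(double v) { return Math.max(0.0, Math.min(1.0, v)); }

    /** Certain edges (mu = 1) get the shortest spring, uncertain ones (mu = 0) the longest. */
    static double restLength(double certainty) {
        double mu = clamp01(certainty);
        return MAX_REST - mu * (MAX_REST - MIN_REST);
    }

    /** Opacity (0-255): fully opaque when certain, fading towards 55 when uncertain. */
    static int alpha(double certainty) {
        return (int) Math.round(55 + 200 * clamp01(certainty));
    }
}
```

Keeping a minimum opacity above zero ensures that even very uncertain edges remain visible and clickable.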

5.2 Architecture
5.2.1 RDF graph
The RDF graph module does the XML parsing and the in-memory storage of the RDF graph. I've used the open source JRDF (Java RDF) library for these tasks. JRDF is a standard set of APIs and base implementations for RDF. As its key design goal was a high degree of modularity, it will be easy to implement incremental exploration strategies on top of the current API by implementing the standard interfaces. It is also used as an API by a number of triple stores¹, so it is quite universal and will help if the system gets scaled up and simple in-memory RDF graphs are no longer enough.

Figure 5.1: Overview of the system architecture


The library was easy to learn, as it follows standard Java conventions and is similar to other standard Java APIs such as JDBC, XML and Collections. It provides a basic function set for graph queries, like finding the objects for a given subject and predicate. These basic queries are executed very quickly on the in-memory database. In an unoptimized version of the application I could search for the natural language labels of about 30 on-screen objects in every frame of the 50 fps simulation.

5.2.2 Physical simulation


The particle simulation is done by an engine called traer.physics [10]. It is a simple physics simulation library built mainly for Processing [5], but usable from any Java application. Traer.physics is really simple: it has only the basics I need and no extra features like collision handling. It has four parts:

1. The ParticleSystem, which basically takes care of everything. The application has a particle system behind the UI.

2. Particles, which move around in space according to the forces acting on them. Nodes on the UI are bound to particles in the simulation.

3. Springs, each of which acts on two particles. Edges are modelled as springs.

4. Attractions/repulsions, which act on two particles. The general node repulsion is implemented with this force.

Traer.physics has a very stable fourth-order Runge-Kutta (RK4) integrator, so I could use many complicated forces without blowing up the simulation.
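The traer.physics source is not discussed here, so the following does not reproduce its integrator. It only illustrates what a classic RK4 step looks like, applied to a single damped spring with unit mass, stiffness k and damping c: four slope estimates per step are combined with weights 1, 2, 2, 1. The class name is hypothetical.

```java
// Sketch: classic fourth-order Runge-Kutta step for one damped spring (unit mass).
class Rk4Spring {
    /** Acceleration a = -k x - c v of a unit mass on a damped spring at rest length 0. */
    static double accel(double x, double v, double k, double c) {
        return -k * x - c * v;
    }

    /** Advances the state {x, v} by one RK4 step of size dt. */
    static double[] step(double x, double v, double k, double c, double dt) {
        // Slopes of position (k?x) and velocity (k?v) at four sample points.
        double k1x = v, k1v = accel(x, v, k, c);
        double k2x = v + 0.5 * dt * k1v, k2v = accel(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, k, c);
        double k3x = v + 0.5 * dt * k2v, k3v = accel(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, k, c);
        double k4x = v + dt * k3v, k4v = accel(x + dt * k3x, v + dt * k3v, k, c);
        double nx = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x);
        double nv = v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v);
        return new double[]{nx, nv};
    }
}
```

Running it on an undamped spring for one period with dt = 0.01 brings x back to its starting value to within about 1e-6, whereas a plain explicit Euler step would slowly pump energy into the same system.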

¹ Like Sesame [3] and Mulgara [2].

5.2.3 Canvas
The canvas layer of the application is built on open source software called Processing [5]. Processing is an open source project initiated by Casey Reas and Benjamin Fry, both formerly of the Aesthetics and Computation Group at the MIT Media Lab. It is a programming language and integrated development environment (IDE) built for the electronic arts and visual design communities, which aims to teach the basics of computer programming in a visual context and to serve as the foundation for electronic sketchbooks. One of the stated aims of Processing is to act as a tool to get non-programmers started with programming, through the instant gratification of visual feedback. The language builds on the graphical capabilities of the Java programming language, simplifying features and creating a few new ones.
Processing originally has its own IDE and a language derived from Java. However, with some minimal hacking it can be used from any Java project. I chose it for the canvas layer of my application because of its high performance and ease of use.
The events on the canvas are forwarded to the User Interface and the UI components
use the canvas to display themselves.

5.2.4 User Interface

(a) layers on the user interface

(b) user interface elements

Figure 5.2: Class diagrams showing important parts of the user interface

The central part of the system, which I call the user interface, accounts for 90% of the code I've written. It acts as glue, synchronizing the shapes displayed on the canvas with the RDF graph and the particle simulation.

60 times per second the Canvas initiates a frame draw: the screen is cleared and every UI element is drawn. The UI is organized into layers, which control the order in which elements are drawn and selected by the cursor. The four layers of the UI, in bottom-up order, are:

1. Background: the gray background of the UI, which has the focus if nothing else does,

2. Edges, representing RDF triples,

3. Nodes, representing RDF nodes,

4. Control bar on the top, used to tweak different aspects of the visualization.

The layers serve both as a means to control z-order and as storage for UI elements for later access. As Figure 5.2a shows, the control bar and background layers are implemented as Sets, while the edge and node layers are Maps, so a displayed edge can easily be found for a triple and a displayed node for a graph node. The displayed edges also hold references to the subject and object UI nodes, so they have quick access to the nodes' positions.
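The layer structure can be sketched as an ordered list of element collections: drawing walks the layers bottom-up, and picking walks them top-down, so the element drawn last (on top) wins the cursor. Map-backed layers can be registered through their live `values()` view, mirroring how the edge and node layers double as lookup tables. All names are hypothetical, and `draw` just records names in place of real canvas calls.

```java
import java.util.*;

// Sketch: z-ordered UI layers with bottom-up drawing and top-down picking.
class LayeredUi {
    interface Element {
        boolean contains(double x, double y); // hit test in screen coordinates
        void draw(List<String> log);          // stand-in for drawing on the canvas
    }

    /** Axis-aligned box element used for illustration. */
    static class Box implements Element {
        final String name;
        final double x0, y0, x1, y1;
        Box(String name, double x0, double y0, double x1, double y1) {
            this.name = name; this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }
        public boolean contains(double x, double y) { return x >= x0 && x <= x1 && y >= y0 && y <= y1; }
        public void draw(List<String> log) { log.add(name); }
    }

    // Layers in bottom-up order, e.g. background, edges, nodes, control bar.
    private final List<Collection<? extends Element>> layers = new ArrayList<>();

    void addLayer(Collection<? extends Element> layer) { layers.add(layer); }

    /** Draws layers bottom-up, so later layers paint over earlier ones. */
    void drawFrame(List<String> log) {
        for (Collection<? extends Element> layer : layers)
            for (Element e : layer) e.draw(log);
    }

    /** Returns the topmost element under the cursor, scanning layers top-down. */
    Element pick(double x, double y) {
        for (int i = layers.size() - 1; i >= 0; i--)
            for (Element e : layers.get(i))
                if (e.contains(x, y)) return e;
        return null;
    }
}
```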

5.3 Implementation
5.3.1 Simulation forces
The particle simulation uses three kinds of forces: one for the spring dynamics of the edges, one for a general repulsion between nodes, and one that dissipates energy so that the simulation has stable points.
The springs connect two particles and try to keep them a certain distance apart. This distance is the rest length of the spring, and it is a major factor in how spacious the visualization will be.
The spring strength defines how rigidly the spring reacts to user interaction. High-strength springs are like sticks: they generate large forces. Low-strength springs take a long time to return to their rest length when the user stretches them. A graph with low spring strengths feels gummy and somewhat unpleasant.
Damping defines the energy dissipation of the spring. Springs with high damping do not overshoot and settle down quickly; with low damping, springs oscillate, and too much oscillation spoils the readability of the labels.
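A damped spring of this kind can be sketched as Hooke's law plus a damping term along the spring axis. This is a minimal, independent formulation assumed for illustration, not code taken from the system or from any physics library; the names Vec, Spring and forceOnA are hypothetical. The strength multiplies the extension (the distance minus the rest length), and the damping multiplies the rate at which the spring's length is changing.

```java
// Minimal 2D vector for the sketch.
record Vec(double x, double y) {
    Vec minus(Vec o) { return new Vec(x - o.x, y - o.y); }
    Vec times(double s) { return new Vec(x * s, y * s); }
    double dot(Vec o) { return x * o.x + y * o.y; }
    double length() { return Math.hypot(x, y); }
}

class Spring {
    final double restLength, strength, damping;

    Spring(double restLength, double strength, double damping) {
        this.restLength = restLength;
        this.strength = strength;
        this.damping = damping;
    }

    /** Force on particle a; particle b receives the opposite force. */
    Vec forceOnA(Vec posA, Vec velA, Vec posB, Vec velB) {
        Vec delta = posB.minus(posA);
        double dist = delta.length();
        if (dist == 0) return new Vec(0, 0);               // no defined direction
        Vec dir = delta.times(1 / dist);                   // unit vector from a to b
        double extension = dist - restLength;              // > 0 stretched, < 0 compressed
        double lengthRate = velB.minus(velA).dot(dir);     // how fast the spring lengthens
        // Stretched springs pull a toward b; damping resists any change in length.
        return dir.times(strength * extension + damping * lengthRate);
    }
}
```

With a large strength the first term dominates and the spring acts like a stick; with a large damping the second term quickly removes the oscillation that would otherwise blur the labels.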
The general repulsion of nodes keeps the graph well distributed on the plane and prevents nodes from floating on top of each other. It is implemented as an inverse-square falloff, like gravity and Coulomb's law. A minimal distance is assigned to the repulsion force, which limits the maximum of the force, so that when two particles accidentally get very close², the resulting force will not blow the nodes far off the screen.
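The clamped inverse-square repulsion can be sketched as follows, assuming a simple all-pairs loop; the names Point2 and Repulsion and the choice of returning a force array are hypothetical. Replacing the distance with max(d, minDistance) in the denominator caps the magnitude at strength / minDistance², which is exactly what keeps nearly coincident nodes from being thrown off the screen.

```java
import java.util.List;

record Point2(double x, double y) {}

class Repulsion {
    final double strength, minDistance;

    Repulsion(double strength, double minDistance) {
        this.strength = strength;
        this.minDistance = minDistance;
    }

    /** Pairwise repulsion on every particle, as {fx, fy} per particle. O(N^2) in the particle count. */
    double[][] forces(List<Point2> particles) {
        int n = particles.size();
        double[][] f = new double[n][2];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = particles.get(j).x() - particles.get(i).x();
                double dy = particles.get(j).y() - particles.get(i).y();
                double d = Math.hypot(dx, dy);
                if (d == 0) continue;                       // coincident: no defined direction
                double clamped = Math.max(d, minDistance);  // caps the force
                double magnitude = strength / (clamped * clamped);
                double fx = magnitude * dx / d, fy = magnitude * dy / d;
                f[i][0] -= fx; f[i][1] -= fy;               // push i away from j
                f[j][0] += fx; f[j][1] += fy;               // and j away from i
            }
        }
        return f;
    }
}
```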


The general energy dissipation in the system is called drag. It acts on all objects equally, and its force is proportional to the velocity and opposes it.

² This may happen, for example, when the user is dragging a node quickly and there is a frame in which it is within sub-pixel proximity of another node.
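Drag and the per-frame update can be sketched with a semi-implicit Euler step; the integrator is an assumption made for the sketch, and the names Particle and DragStep are hypothetical. Drag adds F = -drag * v to every particle, so with no other forces acting, each step multiplies the velocity by (1 - drag * dt / mass) and a moving node glides to a halt instead of moving forever.

```java
import java.util.List;

class Particle {
    final double mass;
    double x, y, vx, vy, fx, fy; // position, velocity and accumulated force

    Particle(double mass) { this.mass = mass; }
}

class DragStep {
    /** Adds drag to every particle's force, then advances one semi-implicit Euler step. */
    static void step(List<Particle> particles, double drag, double dt) {
        for (Particle p : particles) {
            p.fx -= drag * p.vx; // drag opposes the velocity
            p.fy -= drag * p.vy;
        }
        for (Particle p : particles) {
            p.vx += p.fx / p.mass * dt; // update velocity first...
            p.vy += p.fy / p.mass * dt;
            p.x += p.vx * dt;           // ...then move with the new velocity
            p.y += p.vy * dt;
            p.fx = 0;                   // forces are re-accumulated every frame
            p.fy = 0;
        }
    }
}
```

At 60 frames per second (dt = 1/60) a node pushed at 10 units per second with drag 0.5 slows to under 0.1 units per second within ten seconds, travelling just under 20 units.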


The spaciousness of the layout affects how well the user can understand the presented information. Two main factors affect it. In the static (equilibrium) states, the springs are somewhat extended and thus exert a contracting force proportional to their extension and strength. The general repulsion of nodes is the opposing force; when it equals the spring force in strength, the system is stable. The length of the spring at that point defines the general spaciousness of the layout.
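For the simplest case of two nodes joined by one spring, this balance can be solved numerically. The sketch below is a hypothetical illustration that reuses the clamped inverse-square repulsion described above (strength c, minimal distance minDist) and a spring of strength k and rest length L. It finds the separation d where c / max(d, minDist)² = k (d - L) by bisection. At d = L only the repulsion acts, so the equilibrium always lies beyond the rest length, which is why the springs end up somewhat extended. In a real graph each node feels many springs and repulsions at once, so this is only an idealized case.

```java
class Equilibrium {
    /** Net outward force on one end of a two-node system: repulsion minus spring pull. */
    static double netForce(double d, double restLength, double k, double c, double minDist) {
        double r = Math.max(d, minDist);
        return c / (r * r) - k * (d - restLength);
    }

    /** Separation where the spring pull balances the repulsion (assumes k > 0, c > 0, restLength >= 0). */
    static double separation(double restLength, double k, double c, double minDist) {
        double lo = restLength;     // at the rest length the repulsion wins (net force > 0)
        double hi = restLength + 1;
        while (netForce(hi, restLength, k, c, minDist) > 0) hi *= 2; // grow until the spring wins
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (netForce(mid, restLength, k, c, minDist) > 0) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }
}
```

For example, with rest length 50, spring strength 0.1 and repulsion strength 1000, the two nodes settle between 53 and 54 units apart: raising the repulsion or lowering the spring strength spreads the layout out.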

5.3.2 Natural language triples


I used two hard-coded natural language predicates. One is foaf:name, which gives the name of a person. The other is rdfs:label, which the RDF Schema defines to provide a human-readable version of a resource's name.
For clarity, the naming triples are not displayed as edges, since their data is displayed on the nodes.
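A minimal sketch of this treatment, assuming a plain list of triples with full predicate URIs: the two naming predicates supply node labels and are filtered out of the edge list. The class name NamingTriples, and the rule that the first naming triple found for a node wins, are assumptions made for illustration.

```java
import java.util.*;

record Triple(String s, String p, String o) {}

class NamingTriples {
    // The two hard-coded natural language predicates, as full URIs.
    static final Set<String> NAMING = Set.of(
            "http://xmlns.com/foaf/0.1/name",
            "http://www.w3.org/2000/01/rdf-schema#label");

    /** Display label per node, taken from its naming triples (first one found wins). */
    static Map<String, String> labels(List<Triple> graph) {
        Map<String, String> labels = new HashMap<>();
        for (Triple t : graph)
            if (NAMING.contains(t.p())) labels.putIfAbsent(t.s(), t.o());
        return labels;
    }

    /** Triples drawn as edges: everything except the naming triples. */
    static List<Triple> displayedEdges(List<Triple> graph) {
        List<Triple> edges = new ArrayList<>();
        for (Triple t : graph)
            if (!NAMING.contains(t.p())) edges.add(t);
        return edges;
    }
}
```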

5.3.3 Fuzzy reified edges


The format of the reification is not completely defined by [35], as described in section 3.1.3. The reification part itself is exactly defined by [6]. What is missing is the predicate that expresses the fuzzy uncertainty and the data it accepts. I have defined the new frdf (fuzzy RDF) namespace as http://incontext.sch.bme.hu/frdf/, using the website of a sister project, and frdf:certainty as the predicate used to define the fuzzy certainty. For the sake of simplicity, the predicate may currently only be used with float literals of data type http://www.w3.org/2001/XMLSchema#float.
The following example should make the usage of this predicate clear. The RDF file this example is taken from can be found at http://kenai.com/projects/frdfvis/pages/DoctorWhoExampleRdf.
<foaf:Person rdf:ID="rose">
  <foaf:name>Rose Tyler</foaf:name>
  <rel:friendOf rdf:resource="#doctor" rdf:ID="roseFriendOfDoctor"/>
</foaf:Person>
<rdf:Description rdf:about="#roseFriendOfDoctor">
  <frdf:certainty rdf:datatype="&xmls;float">0.9</frdf:certainty>
  <rdfs:label>Rose is a friend of the Doctor</rdfs:label>
</rdf:Description>

The reification method seen in the example is the recommended one. Adding an rdf:ID attribute to any property element reifies that statement, and the reified node can then be accessed by the given ID.
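Under the RDF/XML syntax rules, an rdf:ID on a property element yields the original triple plus four reification triples about the given ID. The sketch below builds them; the names Triple and Reify are hypothetical, and resolving "#roseFriendOfDoctor" against the document's base URI is left out for brevity. The frdf:certainty and rdfs:label properties in the example are then simply further triples with that ID as subject.

```java
import java.util.List;

record Triple(String s, String p, String o) {}

class Reify {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** The statement itself plus the reification quad produced by rdf:ID on a property element. */
    static List<Triple> withReification(Triple t, String statementId) {
        return List.of(
                t,
                new Triple(statementId, RDF + "type", RDF + "Statement"),
                new Triple(statementId, RDF + "subject", t.s()),
                new Triple(statementId, RDF + "predicate", t.p()),
                new Triple(statementId, RDF + "object", t.o()));
    }
}
```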
Fuzzy edges are loaded from the graph in two steps. First, the graph is searched for nodes whose rdf:type is rdf:Statement; the nodes in the result set are the reified edges. In the second step, the system attempts to create a fuzzy edge for every reified node. If the reification is incomplete or the fuzzy certainty is missing, this fails, so only genuine fuzzy edges are created.
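The two-step loading can be sketched over an in-memory list of triples. The names Triple, FuzzyEdge, FuzzyLoader and tryCreate are hypothetical. The sketch accepts frdf:certainty only on xsd:float literals, following the restriction stated earlier, and does not check that the value lies in [0, 1], which is an open choice.

```java
import java.util.*;

// A literal carries its datatype URI; resources use null.
record Triple(String s, String p, String o, String datatype) {
    Triple(String s, String p, String o) { this(s, p, o, null); }
}

record FuzzyEdge(String subject, String predicate, String object, float certainty) {}

class FuzzyLoader {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float";
    static final String CERTAINTY = "http://incontext.sch.bme.hu/frdf/certainty";

    static List<FuzzyEdge> load(List<Triple> graph) {
        // Step 1: the reified edges are the nodes whose rdf:type is rdf:Statement.
        Set<String> statements = new LinkedHashSet<>();
        for (Triple t : graph)
            if (t.p().equals(RDF + "type") && t.o().equals(RDF + "Statement")) statements.add(t.s());
        // Step 2: try to build a fuzzy edge for each; incomplete ones are skipped.
        List<FuzzyEdge> edges = new ArrayList<>();
        for (String st : statements) tryCreate(graph, st).ifPresent(edges::add);
        return edges;
    }

    static Optional<FuzzyEdge> tryCreate(List<Triple> graph, String statement) {
        String s = null, p = null, o = null;
        Float certainty = null;
        for (Triple t : graph) {
            if (!t.s().equals(statement)) continue;
            switch (t.p()) {
                case RDF + "subject" -> s = t.o();
                case RDF + "predicate" -> p = t.o();
                case RDF + "object" -> o = t.o();
                case CERTAINTY -> {
                    if (XSD_FLOAT.equals(t.datatype())) { // only xsd:float literals are accepted
                        try {
                            certainty = Float.parseFloat(t.o());
                        } catch (NumberFormatException e) {
                            // malformed literal: treat the certainty as missing
                        }
                    }
                }
                default -> { } // other properties (rdf:type, rdfs:label, ...) are ignored here
            }
        }
        if (s == null || p == null || o == null || certainty == null) return Optional.empty();
        return Optional.of(new FuzzyEdge(s, p, o, certainty));
    }
}
```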
For backward compatibility, crisp edges that are reified and fuzzified are hidden. This way the fuzzy edges can also remain in the graph as crisp edges that non-fuzzy RDF software can use.
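Hiding can be done purely in the view, as in the following minimal sketch. The names Triple, FuzzyEdge and CrispFilter are hypothetical. The RDF data itself stays untouched, so non-fuzzy software still sees the crisp triple, and only the list of edges to draw changes.

```java
import java.util.*;

record Triple(String s, String p, String o) {}

record FuzzyEdge(String subject, String predicate, String object, float certainty) {
    Triple asTriple() { return new Triple(subject, predicate, object); }
}

class CrispFilter {
    /** Crisp triples to draw: every triple that is not already shown as a fuzzy edge. */
    static List<Triple> visibleCrisp(List<Triple> graph, List<FuzzyEdge> fuzzy) {
        Set<Triple> hidden = new HashSet<>();
        for (FuzzyEdge f : fuzzy) hidden.add(f.asTriple()); // records compare by value
        List<Triple> visible = new ArrayList<>();
        for (Triple t : graph)
            if (!hidden.contains(t)) visible.add(t);
        return visible;
    }
}
```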


5.4 Evaluation
The implemented system is available at [12] under the GPL 2.0 or GPL 3.0 open source licenses. This system was used to test and evaluate the proposed layout and visualization methods. It was tested with two specially crafted RDF graphs, both of which can be found in the source of the application³.

5.4.1 Performance
The particle simulation layout worked well on loose graphs, but it did not handle nodes with many connections well. For example, in cases where many people were described, the layout was arranged around the Person type, which resulted in a crowded ring on the display. As no filtering or navigation is implemented in the system, it is only suitable for small graphs. The particle simulation is an especially critical point, as the repulsion forces are calculated between all node pairs, which makes it an O(N²) algorithm in the node count. I made some measurements to determine the system's performance at different graph sizes. The system created a new node every 20-50 frames and connected it to one or two existing nodes. The results can be seen in Figure 5.3 on page 33.
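The growth procedure used for the measurement can be sketched as follows. The class name GrowthBenchmark, the fixed random seed, starting from a single node, and counting frames instead of rendering them are all assumptions made for illustration. The sketch also reports how many node pairs the repulsion has to visit, N(N-1)/2, which is the quadratic term mentioned above.

```java
import java.util.*;

class GrowthBenchmark {
    /** Grows a graph from one node: every 20-50 frames a node joins, linked to one or two
        distinct existing nodes. Returns {frames, nodes, edges}. */
    static int[] grow(int targetNodes, long seed) {
        Random rnd = new Random(seed);
        int nodes = 1, frames = 0, edges = 0;
        while (nodes < targetNodes) {
            frames += 20 + rnd.nextInt(31);                      // wait 20..50 frames
            int newNode = nodes++;
            int links = Math.min(1 + rnd.nextInt(2), newNode);   // one or two links
            Set<Integer> targets = new HashSet<>();
            while (targets.size() < links) targets.add(rnd.nextInt(newNode));
            edges += targets.size();
        }
        return new int[] {frames, nodes, edges};
    }

    /** Node pairs visited by the all-pairs repulsion in one frame: N(N-1)/2. */
    static long repulsionPairs(int n) {
        return (long) n * (n - 1) / 2;
    }
}
```

Growing to 100 nodes this way takes between 1980 and 4950 frames and yields 99 to 197 edges, while each frame visits 4950 node pairs in the repulsion step.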

Figure 5.3: Diagram showing the performance of the system in frames per second by node and edge count

The results are approximately linear in the node and edge count, which indicates that the particle simulation is not the dominant cost; some other factor is. Profiling showed that the program spends 50% of its time in the canvas, drawing, while the physics simulation runs in less than 3% of the time. The top hot spots can be seen in Figure 5.4 on page 34.

³ http://kenai.com/projects/frdfvis/sources/source-code-repository/show/rdf?rev=38

Figure 5.4: Hot spots of the program under normal usage

5.4.2 Fuzzy visualization


The fuzzy edges were visualized in two separate ways in the proposed system. One was to color the edges to show their certainty; the other was to modify the rest length of the spring. The coloring used a linear gradient from red, meaning low certainty, to light blue, representing high certainty.
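The color gradient amounts to a linear interpolation per RGB channel, similar to what Processing's lerpColor does in RGB mode. In the sketch below, the exact endpoint colors (pure red, and the CSS "lightblue" 173, 216, 230) and the clamping of certainty to [0, 1] are assumptions, since the text names the colors only loosely. The class name CertaintyColor is hypothetical.

```java
class CertaintyColor {
    // Assumed endpoints: pure red for certainty 0, CSS "lightblue" for certainty 1.
    static final int[] LOW = {255, 0, 0};
    static final int[] HIGH = {173, 216, 230};

    /** Linear RGB interpolation, returned as 0xRRGGBB; certainty is clamped to [0, 1]. */
    static int color(double certainty) {
        double t = Math.max(0, Math.min(1, certainty));
        int rgb = 0;
        for (int i = 0; i < 3; i++) {
            int channel = (int) Math.round(LOW[i] + t * (HIGH[i] - LOW[i]));
            rgb = (rgb << 8) | channel;
        }
        return rgb;
    }
}
```

An edge with certainty 0.5, for instance, gets the muted pink 0xD66C73, halfway between the two endpoints in every channel.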
Spring rest length modulation alone cannot be used for accurate certainty visualization, for two reasons. The more obvious one is that a spring may stabilize in a position far from its rest length, as it can be strongly affected by its surroundings. The other reason is that human length perception is not accurate in all circumstances⁴, and in a crowded graph it can be slow.


The fuzzy edge coloring, on the other hand, is not as self-explanatory as the edge length modulation. If it is used alone, a legend has to be provided so that the user can associate colors with certainty values.
The two methods used in combination, however, make the visualization easily understandable without legends and other clutter. The length modulation alters the graph layout so that it represents both the connections in the graph and the values of the edges. This general, global improvement of perception, combined with the local color coding of the edge certainty values, provides a good way for users to understand the information in the graph.

5.5 Conclusion and future work


The visualization is much better than it would be if the reified edges were shown in an ordinary RDF editor, but it still needs a lot of improvement. There are three major areas in the field that need future attention. The first is the layout based on particle simulation. The current implementation generates crowded layouts for some strongly interconnected graphs. More research is needed to improve the layout in these situations, for example by identifying the hubs in the graph and applying special rules to them. More research should also be done on the parametrization of the simulation, to improve its responsiveness and stability.

⁴ We have all seen visual illusions on this topic.
The second area is fuzzy edge drawing. The two currently implemented methods are only the tip of the iceberg: new methods should be implemented and tested to build a large repertoire that can later serve as a toolbox for serious software development, and to determine the best combination of these methods for typical situations.
The third area is navigation, which would make the earlier concepts work on larger graphs. From the current perspective, incremental exploration and clustering seem to be the two most important directions for making the system more usable.
All things considered, this work is nowhere near complete; it only breaks ground in this new area.

Bibliography
[1] Dock (Mac OS X) - wikipedia, the free encyclopedia.
http://en.wikipedia.org/wiki/Dock_(Mac_OS_X). 6

[2] Mulgara project. http://www.mulgara.org/. 1

[3] openRDF.org: home. http://www.openrdf.org/. 1

[4] Pad++: Zooming user interfaces (ZUIs). http://www.cs.umd.edu/hcil/pad++/. 4.3.1

[5] Processing. http://www.processing.org/. 5.2.2, 5.2.3

[6] RDF semantics. http://www.w3.org/TR/rdf-mt/. 5.3.3

[7] Resource description framework - wikipedia, the free encyclopedia.
http://en.wikipedia.org/wiki/Resource_Description_Framework. 2

[8] Resource description framework (RDF): concepts and abstract syntax.
http://www.w3.org/TR/rdf-concepts/. 2

[9] Semantic web. http://en.wikipedia.org/wiki/Semantic_Web. 1.1

[10] traer.physics. http://www.cs.princeton.edu/~traer/physics/. 5.2.2

[11] Uncertainty reasoning for the world wide web.
http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/. 1.2.1

[12] Adam Banko. Fuzzy RDF visualization - project kenai.
http://kenai.com/projects/frdfvis. 5.4

[13] Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Algorithms for drawing graphs: an annotated bibliography. Comput. Geom. Theory
Appl., 4(5):235-282, 1994. 4

[14] Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Graph
Drawing: Algorithms for the Visualization of Graphs. Prentice Hall PTR, 1998. 4

[15] B. B. Bederson, J. D. Hollan, J. Stewart, D. Rogers, A. Druin, D. Vick, L. Ring,
E. Grose, and C. Forsythe. A zooming web browser. Human factors in web development, 1997. 4.3.1

[16] M. K. Bergman. The deep web: Surfacing hidden value. Journal of Electronic
Publishing, 7(1):0701, 2001. 1


[17] François Bertault. A Force-Directed algorithm that preserves edge crossing properties. In Proceedings of the 7th International Symposium on Graph Drawing, pages
351-358. Springer-Verlag, 1999. 4.2.4

[18] Ulrik Brandes and Dorothea Wagner. A bayesian paradigm for dynamic graph
layout. In Proceedings of the 5th International Symposium on Graph Drawing, pages
236-247. Springer-Verlag, 1997. 4.3.4

[19] Isabel F. Cruz and Roberto Tamassia. Graph drawing tutorial.
http://www.cs.brown.edu/people/rt/papers/gd-tutorial/gd-constraints.pdf. 4

[20] Ron Davidson and David Harel. Drawing graphs nicely using simulated annealing.
ACM Trans. Graph., 15(4):301-331, 1996. 4.2.4

[21] P. Eades. A heuristic for graph drawing. Congressus Numerantium, 42(149160):194-202, 1984. 4.2.4

[22] Peter Eades and Kozo Sugiyama. How to draw a directed graph. J. Inf. Process.,
13(4):424-437, 1990. 4

[23] Benja Fallenstein. Fentwine: A navigational RDF browser and editor.
http://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/pp/fentwine/. 4.4.2

[24] P. Farrand, F. Hussain, and E. Hennessy. The efficacy of the mind map
study technique. Medical Education, 36(5):426-431, 2002. 1.3

[25] Arne Frick, Andreas Ludwig, and Heiko Mehldau. A fast adaptive layout algorithm
for undirected graphs. In Proceedings of the DIMACS International Workshop on
Graph Drawing, pages 388-403. Springer-Verlag, 1995. 4.2.4

[26] Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed placement. Softw. Pract. Exper., 21(11):1129-1164, 1991. 4.2.4, 4.2.4

[27] Ivan Herman, Guy Melançon, and M. Scott Marshall. Graph visualization and navigation in information visualization: A survey. IEEE Transactions on Visualization
and Computer Graphics, 6(1):24-43, 2000. 4.1, 4.1, 4.2.4, 4.3, 4.2, 4.3.4

[28] M. L. Huang, P. Eades, and J. Wang. Online animated graph drawing using a
modified spring algorithm. Journal of Visual languages and Computing, 9(6), 1998.
4.3.4

[29] Mao Lin Huang, Peter Eades, and Robert F. Cohen. WebOFDAV - navigating and
visualizing the web on-line with animated context swapping. Comput. Netw. ISDN
Syst., 30(1-7):638-642, 1998. 4.2, 4.3.4

[30] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Inf.
Process. Lett., 31(1):7-15, 1989. 4.2.4


[31] Doug Kimelman, Bruce Leban, Tova Roth, and Dror Zernik. Reduction of visual
complexity in dynamic graphs, pages 218-225. 1995. 4.3.3

[32] Graham Klyne. Contexts for information modelling in RDF.
http://www.ninebynine.org/RDFNotes/RDFContexts.html. 2.3

[33] Evin Levey. A picture of a thousand words?
http://googleblog.blogspot.com/2008/10/picture-of-thousand-words.html. 1

[34] Gergely Lukácsy and Péter Szeredi. Problémák a világhálóval. 1

[35] M. Mazzieri. A fuzzy RDF semantics to represent trust metadata. In Proceedings
of Semantic Web Applications and Perspectives (SWAP), 1st Italian Semantic Web
Workshop, 2004. 3, 3.1.3, 5.3.3

[36] Kazuo Misue, Peter Eades, Wei Lai, and Kozo Sugiyama. Layout adjustment and
the mental map. Journal of Visual Languages & Computing, 6(2):183-210, June
1995. 4.2

[37] Kazuo Misue, Peter Eades, Wei Lai, and Kozo Sugiyama. Layout adjustment and
the mental map. Journal of Visual Languages & Computing, 6(2):183-210, June
1995. 4.2.3

[38] T. H. Nelson. A cosmology for a different computer universe: Data model, mechanisms, virtual machine and visualization infrastructure. Journal of Digital Information, 5(1):200407, 2004. 4.4.2

[39] H. T. Nguyen and E. Walker. A first course in fuzzy logic. Chapman & Hall/CRC,
2006. 3

[40] Stephen C. North. Incremental layout in DynaDAG. In Proceedings of the Symposium on Graph Drawing, pages 409-418. Springer-Verlag, 1996. 4.2, 4.3.4

[41] Manojit Sarkar and Marc H. Brown. Graphical fisheye views of graphs. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages
83-91, Monterey, California, United States, 1992. ACM. 4.3, 4.3.2

[42] N. Simou, G. Stoilos, V. Tzouvaras, G. Stamou, and S. Kollias. Storing and querying fuzzy knowledge in the semantic web. In Proceedings of the 4th International
Workshop on Uncertainty Reasoning for the Se, volume 250. 3.1.1

[43] M. A. D. Storey, K. Wong, F. D. Fracchia, and H. A. Muller. On integrating
visualization techniques for effective software exploration. 4.3.1

[44] Henry Story. Fuzzy thinking in berkeley.
http://blogs.sun.com/bblsh/entry/fuzzy_thinking_in_berkeley. 3

[45] Alex Wright. Exploring a 'Deep web' that google can't grasp. The New York Times,
February 2009. 1


[46] R. Zeiliger. Supporting constructive navigation of web space. In Proceedings of the
Workshop on Personalized and Solid Navigation in Information Space, 1998. 4.3.4
