ID number: 93635/ETI (Nr albumu)
Master's Thesis (Praca magisterska)
Title: Social Semantic Information Sources for eLearning (Tytuł pracy)
(Supervisor / Kierujący pracą)
(Consultant / Konsultant)
Thesis domain (Zakres pracy): Make a thorough analysis of Social Semantic Information
Sources in the context of using them in eLearning. Identify the best fitting ontologies
used for their description. Define a common object model for them. Develop the framework
Gdańsk, 2007
Contents

1 Introduction
1.1 Problem description
1.3 Outline
2 Related work
2.1 eLearning
2.3.1 AJAX
2.3.2 Democracy
2.3.4 Tagging
2.3.5 Mashups
3.1.1 Semantic Blogs
3.2.1 SIOC
3.3.2 Didaskon
4.1.2 Limitations
4.3.3 Classes
5 System implementation
5.1 Implementation methodology
5.4.2 Documentation
5.5.1 Implementation of REST
6 Conclusions
6.1 Achievements
7.6.1 Publikacje (Publications)
Bibliography
Chapter 1
Introduction
This first chapter is the introduction to this Master's Thesis. Here, I formulate and
describe the main problems to be faced while developing the final system and form
the goals to achieve. I also describe the methodology of the developed system.
People have been learning for ages; to eat one had to hunt, to feel relaxed one had
to sleep, to stay alive one had to avoid danger. One was obliged to learn to make
his/her life easier. It seems nothing has changed. However, the learning process is
more organized now than it used to be. In general, learning can be divided into formal
and informal. Formal learning follows the old traditional approach. Courses are rigid,
made once and for all. Students are pushed to go through a course from beginning to
end without the possibility of changing it. Informal learning, in turn, is more
spontaneous and gives a user more flexibility in deciding when, where and what to learn.
We often learn that way unconsciously by chatting, video conferencing, observing others,
or reading blogs and wikis. It is definitely cheaper and, perhaps surprisingly, more
effective. In fact, most learning occurs as such unstructured processes and is not
formally organized; 75% of organizational learning is informal [36]. Informal learning
relates with collaboration.
eLearning is naturally suited for distance, flexible learning, but it can also be used
along with the traditional approach. Early courses were delivered on CD-ROMs and sent
across the country [31]. Nowadays, tools like web-based teaching materials and
hypermedia in general (web pages, discussion boards, simulations and games) are
commonly used. eLearning is so popular due to the expansion of the Internet. There
are a lot of online services that provide courses for
free or for a fee, e.g. Nuvvo, Berlin High School eLearning Online, or eTech Ohio.
In addition, there are a considerable number of Web 2.0 (see Sec. 2.3) services
like blogs, wikis, fora, digital libraries, online chats or video conferences. They allow
users to collaborate with peers and share their opinions. They are sources of informal
knowledge.
By applying the Semantic Web (see Sec. 7.2.2) to Web 2.0 services, we make their
content also machine readable. Thus, computers can assist and guide users so that
they are not lost in the sea of information [61]. The assumptions of the Semantic
Web are fulfilled by introducing semantic annotations of online resources. This way,
blogs, wikis, and digital libraries become Social Semantic Information Sources (see
Chapter 7.3.1).
1 http://www.nuvvo.com/
2 http://moodle.berlinwall.org/
3 http://www.etech.ohio.gov
In this thesis, I focus on informal learning based on Social Semantic Information
Sources (SSIS). I consider semantic blogs, semantic wikis and social semantic digital
libraries as an unfailing source of informal knowledge. Both Web 2.0 and the Semantic
Web, combined together, allow us to create new, better solutions that go beyond
current eLearning assumptions. They form a new learning standard, eLearning 2.0,
which aims at giving the ability to leverage the community as a part of a larger
learning environment.
Didaskon is a system designed according to eLearning 2.0 assumptions. This
is a project developed in the Digital Enterprise Research Institute at the National
University of Ireland, Galway; it was initiated as a working group project in
cooperation with Gdańsk University of Technology, Faculty of Electronics,
Telecommunications and Informatics. Didaskon delivers a framework for composing
on-demand curricula: a learning path which best fits a specific learner. To achieve
that, the system uses initial information (preconditions) like a student's needs,
skills, learning history etc., anticipated resulting skills and knowledge (goals),
and technical details of the client's platform.
The goal of this thesis is to deliver IKHarvester, an extension to the Didaskon
system. IKHarvester will be a Service Oriented Architecture (SOA) layer. Its goal
is to capture informal learning from Social Semantic Information Sources and store
metadata for this information. Stored data should be delivered to Didaskon in the
form of informal learning objects that support formal ones during learning path
composition.
4 http://didaskon.corrib.org/
5 http://deri.org
6 http://www.nuigalway.ie/
7 http://www.pg.gda.pl/
8 http://www.eti.pg.gda.pl/
Social Semantic Information Sources provide heterogeneous data. Hence, a dedicated
design is required to integrate them. The extension should provide data in a
consistent, common object model. Thus, Didaskon is able to perform more effective
reasoning during course composition.
1.3 Outline
This thesis is divided into six chapters. In Chapter 2, I present related work.
Chapter 3 explains Social Semantic Information Sources and eLearning 2.0. Chapter 4
is a summary of the designing stage; here I define the system requirements and use
cases. Chapter 5 describes the system implementation, including the tools I have used
during the system development and software helpful during writing the work.
Chapter 6 concludes the thesis.
Chapter 2
Related work
In this Chapter, I describe the state of the art in eLearning, Web 2.0, and the
Semantic Web. At first, I give a definition of eLearning and present how it has
changed over the years. I define a Learning Object, which is anything one can acquire,
manage and use. Then, I characterize the Semantic Web, a newer, better Web, where the
content of resources is used by both people and software agents. Afterwards, I
introduce Web 2.0, a Web dedicated to communities whose members share information
and collaborate.
2.1 eLearning
eLearning (Electronic Learning) [40] is the delivery of educational content through
any electronic media, including the Internet, intranets, extranets, satellite broadcast,
audio and video tapes, interactive TV, CD-ROMs, interactive CDs and computer-
based training. It is expected to squeeze out old-fashioned learning. In the old
approach, a student is passive, pushed to learn. He/she is obliged to obey rules
defining when and where the classes take place and what their actual content is [33].
In contrast, a learner should be given (to some extent) a free hand with regard
to selecting the course schedule. One should be allowed to learn just-in-time,
on-demand. Moreover, he/she should have influence on the contents of the classes.
Learning should be customized, initiated by user profiles and business demands.
There are two communication technologies used for eLearning: synchronous and
asynchronous. The first expects students to gather face-to-face or use chats,
videoconferences etc. The latter approach is characterized by using blogs, wikis or
discussion boards as tools for sharing opinions or gained experience. Long before the
Web, tools such as calculators, VCRs, radio and bulletin board systems were used for
distance education.
In the mid 1980s eLearning started to develop rapidly. It was the time when
the Multimedia Era began; Windows 3.1, PowerPoint, Macintosh and CD-ROMs
became popular and common all over the world [25]. Computers were supposed
to make training more transportable and visually engaging.
In the mid 1990s, the Internet was popular enough; hence, training providers
tried to incorporate it into the tuition process. They found emails, web browsers, web
pages, media players, streamed audio and video very helpful. A great number of
companies enriched courses with graphics and web-based training and made them
widely available.
In general, a Learning Object (LO) is something you can acquire, manage and use.
LOs [19] are reusable, modular, flexible, portable and compatible. Efficient management
of LOs requires metadata in order to work properly. The problem is how to organize
metadata so that it can be exchanged among different systems.
SCORM
SCORM (Shareable Content Object Reference Model) is a Web-oriented data model
for content aggregation. This is an XML-based framework used to define and access
information about LOs so they can be easily shared among different LMSs. SCORM
focuses on the structure, the runtime environment for LOs, and the description of
the learning process [53].
The part of SCORM related to this thesis is SCORM CAM (Content Aggregation
Model), which defines how to create and manage LOs. According to SCORM CAM,
the content of a learning object can be diverse: plain text, HTML code, short movies
or even a more complicated interactive course. Also, SCORM CAM includes the
metadata describing LOs.
LOM
LOM (Learning Object Metadata) defines the way to build metadata for LOs. There
are nine categories of this information [18], each of which focuses on different
aspects (see also Fig. 2.1):
• Lifecycle: features related to the history and current state of the LO
1 http://www.adlnet.gov/scorm/
Figure 2.1: LOM structure (from [60])
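As a sketch of how such a record looks in practice, the snippet below models a LOM-like description as a nested mapping over the nine categories; the field names and values are illustrative simplifications, not the full normative LOM element set:

```python
# A minimal LOM-like metadata record: nine top-level categories,
# with a few illustrative fields (values are invented).
lom = {
    "general": {"title": "Intro to RDF", "language": "en"},
    "lifecycle": {"version": "1.0", "status": "final"},
    "metaMetadata": {},
    "technical": {"format": "text/html"},
    "educational": {"typicalAgeRange": "18-"},
    "rights": {"cost": "no"},
    "relation": {"isBasedOn": "course-42"},
    "annotation": {},
    "classification": {},
}

assert len(lom) == 9   # LOM defines exactly nine categories
print(sorted(lom))
```

A real SCORM package would serialize such a record as XML inside the content manifest rather than keep it as an in-memory structure.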
• Relation: a group of features defining the relationship between the LO and
other LOs
The main advantage of eLearning is the convenience and flexibility a learner is given;
one learns when and where he/she wants. Learners can also communicate with peers,
which allows them to share opinions on learning material. However, eLearning also
has disadvantages.
Firstly, one course is prepared for all. Usually, courses are not personalized [27];
they are tailored for a generic student at one of the generic levels of skills or
knowledge. The main assumption is a desire to pass the course. Learning services
usually do not take into account a specific user's conditions, like wishing to broaden
knowledge in a wide range of domains at the same time [52]. Thus, a student is obliged
to attend different courses separately.
Secondly, current learning services treat students as single entities; they do not
support collaboration. A student goes through a course alone, without response from
other students who also attend it. Students are also limited to the provider's
methods; they are served only what is supplied by the provider. Currently, on the
Internet, there is a lot of relevant information which could support the learning
process, but current LMSs are not capable of understanding the content of web pages.
eLearning needs management support in order to define a vision and plan for
learning and to integrate learning into daily work. However, current Web-based
solutions do not meet the requirements mentioned above; they bring the problem of
content that is not machine-understandable. Only the course creators and students
can understand the content of the course.
In the following sections, I describe the Semantic Web and focus on its core
assumptions and solutions. Finally, I present the Semantic Web 2.0, which links the
Semantic Web platform with existing Web 2.0 features.
The World Wide Web is a system of interlinked hypertext documents (called web pages
or websites) that runs over the Internet. Web pages can contain text and multimedia
such as images, movies, music, etc. They are navigated by using hyperlinks and viewed
with a Web browser [63]. Web pages are written in the HTML language. Each page has
its own URL (Uniform Resource Locator). On the other hand, the information is not
machine readable. Take a sentence about Bob Marley, a famous reggae musician, as an
example. It is an established fact, understood by a human. However, it does not bring
any particulars for a machine. One can ask: why should machines understand web page
contents, when it is people who look for them? Nothing could be more misleading.
Simple scenario
Imagine Adam, a young man with broad music taste. He had listened to rock and
ska music for ages. Once, he heard a reggae song by Bob Marley on his favorite
Internet radio. He really liked the song and wanted to learn something about
the genre.
The perfect situation assumes he finds a reggae music fans community where
a great deal of information and useful links could be found. Even the previously
mentioned sentence about Bob Marley could, for a start, satisfy Adam as it brings
some knowledge. But this sentence can be hidden in the midst of an accumulation
of irrelevant content.
But, again, how can the computer find anything when it does not comprehend
it? Information must be somehow described, so that a machine can distinguish one
piece of data from another. This is the purpose of appropriate resource descriptions.
Without them, computers are not able to help users effectively.
"The Semantic Web will bring structure to the meaningful content of Web pages."
The word semantic stands for "the meaning of". The Semantic Web encompasses
efforts to build a new World Wide Web architecture that enhances content with
formal semantics.
One of the most important advantages of the Semantic Web is flexibility. Different
kinds of data can be used together and diverse types of analysis can be applied over
them [64]. For instance, a book can be described with Dublin Core [9] annotations,
whereas information about the author can be expressed by using the
FOAF (Friend-of-a-Friend) vocabulary [7]. Moreover, vocabularies can be easily
mixed and extended.
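As a sketch of that flexibility, the Python snippet below mixes Dublin Core and FOAF properties in one set of triples; the namespace URIs are the real ones, but the book and person identifiers are invented for illustration:

```python
# Describing one resource with two vocabularies (Dublin Core + FOAF).
# The book and person URIs under example.com are made up for illustration.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.com/book/1", DC + "title", "The Secret Life of Bees"),
    ("http://example.com/book/1", DC + "creator", "http://example.com/person/1"),
    ("http://example.com/person/1", FOAF + "name", "Sue Monk Kidd"),
]

# Collect every property used to describe the book resource.
book_props = {p for s, p, o in triples if s == "http://example.com/book/1"}
print(sorted(book_props))
```

Nothing stops a consumer that only understands Dublin Core from using the first two statements while ignoring the FOAF one; that is precisely the point of mixing vocabularies.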
Semantics entails description issues, so that artifacts are understood and efficiently
processed. The Resource Description Framework (RDF) is the foundation of
metadata description. It is a W3C standard for describing web resources which
have been assigned a URI by which they can be identified. It was designed to be
read by computers.
2 http://dublincore.org/
3 http://www.foaf-project.org/
4 http://www.w3.org/
Thanks to such descriptions, programs or automated scripts (crawlers) can efficiently
search, discover, collect and process web resources. An RDF description consists of a
subject, a predicate and an object; altogether they are called a triple (a
statement) [58]. A collection of RDF statements produces a directed graph in which
arrows point from subjects to objects.
Each statement can be split into three parts; for example, for the sentence
"Adam is a football player":
• a subject (resource): Adam
• a predicate (property): is a
• an object (value): football player
Supposing all three parts are attributed with URIs in the http://example.com
namespace, the above statement can be illustrated by the graph shown in Fig. 2.3:
Besides the graph, the RDF N3 representation can be used to show triples and the
relationships between them. See List. 2.1 to learn the structure of the N3
representation:
<http://example.com/profession#footbal_player> .
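A triple store over statements like the one in List. 2.1 can be sketched in a few lines of Python; the `people#adam` and `property#is_a` URIs are assumed for illustration, and `None` serves as a wildcard much like a variable in a graph query:

```python
# A minimal in-memory triple store with wildcard pattern matching.
# The URIs reuse the example.com namespace from the listing above.
EX = "http://example.com/"

store = [
    (EX + "people#adam", EX + "property#is_a", EX + "profession#footbal_player"),
    (EX + "people#adam", EX + "property#likes", EX + "genre#reggae"),
]

def match(pattern, triples):
    """Return the triples matching an (s, p, o) pattern; None is a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# What do we know about Adam?
for s, p, o in match((EX + "people#adam", None, None), store):
    print(p, "->", o)
```

Real RDF stores work on the same principle, only with indexes over the three positions so that pattern matching does not scan every statement.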
Using triples is effective and very popular. However, for representation reasons, an
XML serialization of RDF (RDF/XML) is often used:

<rdf:RDF ...>
  <rdf:Description ...>
    <property:is rdf:resource="..." />
  </rdf:Description>
</rdf:RDF>
According to the W3C [59], RDF aims at representing information on the Web so that
it is machine processable. RDF Schema additionally makes it possible to describe
groups of related resources (the domains and ranges of their properties) and the
relationships between them.
Ontologies
Ontology is a word with quite a handful of meanings. The term is borrowed from
philosophy, where it refers to the science of describing entities in the world and
the relationships between them.
Although RDF and RDF Schema are helpful in expressing simple statements,
they fall short when used in more complex cases. That is why the Web Ontology
Language (OWL) was developed. OWL is a markup language for publishing and sharing
data using ontologies on the Web. It has three increasingly expressive sub-languages:
OWL Lite, OWL DL and OWL Full. Each sub-language encapsulates the former ones.
An ontology describing a specific domain is a form of knowledge representation of
that domain. Below, I present a list of the most popular RDF Schema metadata
definitions for specific domains of interest:
• people and social networks: Friend Of A Friend (FOAF)
• online discussions: Semantically-Interlinked Online Communities (SIOC)
• career: Description Of A Career (DOAC)
• project: Description Of A Project (DOAP)
• taxonomies: Simple Knowledge Organization System (SKOS)
The most popular way to search the web is text searching. It is supported by
Google, Yahoo and other search engines. One just enters a query string and then
is given a set of possible answers. The list is huge and often consists of garbage
information, though.
The Semantic Web can improve searching through techniques such as latent semantic
indexing and query refinement. The former makes it possible to measure the distance
between terms; the latter improves an imprecise query string so that more adequate
results are found [29]. Using dictionaries, like WordNet, can boost searching.
5 http://www.foaf-project.org/
6 http://sioc-project.org/
7 http://ramonantonio.net/doac/
8 http://usefulinc.com/doap/
9 http://www.w3.org/2004/02/skos/
10 http://wordnet.princeton.edu/
Searching also benefits from recognizing words used in non-basic forms. As described
earlier, RDF provides a graph structure and literals. Thus, a search can be performed
by using both keywords and structured queries.
Traditional eLearning assumes a centralized authority (a teacher) who foists an
already defined course schedule on students. It is impossible to satisfy all students'
needs because they differ from one another.
The Semantic Web can be successfully employed for describing LOs which represent
learning material. Software agents can perform continuous scanning of semantically
annotated resources. Additionally, agents may use a commonly agreed service language,
which boosts interoperability; course composition becomes significantly simpler and
faster. Then, it is possible to use diverse types of learning objects.
So far, I have pointed out a great many virtues of the Semantic Web, especially its
assumptions and solutions. Nevertheless, practical experience has proved that the
Semantic Web is far from changing the vision of the Internet; it needs some help to
succeed. Some society-scale applications are required. The above mentioned agents
are not enough. To make shared data real, some more advanced collaborative
applications are required.
2.3 Web 2.0
Although Web 2.0 is currently a very popular term, it is difficult to give its precise
definition. Even Tim Berners-Lee, the inventor of the World Wide Web, has difficulty
in doing that:

Web 1.0 was all about connecting people. It was an interactive
space, and I think Web 2.0 is of course a piece of jargon, nobody even
knows what it means. If Web 2.0 for you is blogs and wikis, then that
is people to people. But that was what the Web was supposed to be all
along.
In short, Web 2.0 is the Web where people meet, collaborate and share anything
that is popular by using social software applications. The term refers to second
generation web services like del.icio.us, Flickr, Skype, Wikipedia, last.fm, and
Technorati.
Web 2.0 applications derive from new techniques such as rich internet applications
(RIA), Asynchronous JavaScript and XML (AJAX), semantically valid XHTML,
syndication and aggregation of data in RSS or Atom, and clean and meaningful URLs.
A user of Web 2.0 must feel as if he/she used a traditional desktop application.
In accordance with Tim O'Reilly [43], the meaning of Web 2.0 can be presented
by contrasting the traditional Web with the new Web 2.0, as in Table 2.1.
11 http://del.icio.us/
12 http://www.ickr.com/
13 http://www.skype.com/
14 http://en.wikipedia.org/
15 http://www.last.fm/
16 http://www.technorati.com/
Table 2.1: New trends in the Web (concept: [43]). The table contrasts traditional
Web examples (e.g. Internet Explorer, personal sites, Content Management Systems,
taxonomies) with their Web 2.0 counterparts.
2.3.1 AJAX
AJAX is a web development technique to create web applications that behave as if
they were desktop ones. The aim is to exchange only small amounts of data with a
server; this should be performed behind the scenes. No longer should the entire page
be (re)loaded.
One of the first Web 2.0 applications was Google Maps, a set of interactive
maps of the world. One can watch diverse views of the world, change the way the
views are displayed and personalize them. There is a constant dialog between the
browser and the server.
2.3.2 Democracy
Democracy in Web 2.0 is very important [12]. Users, often amateurs, collaborate
and share anything that is popular. Without users, many Web 2.0 applications would
not exist. Users develop the content themselves, adding resources
and sharing them with other users; users collaborate and share information. It is
similar with Wikipedia, a free encyclopedia. Wikipedians can write new articles and
edit existing ones. Yet, all Wikipedia users are anxious about the quality of their
encyclopedia. There are even Web 2.0 news services, like Reddit. It is a set of news
items and articles which were found interesting by other people, and consequently
added there.
17 http://maps.google.com/
The aforementioned examples expose the importance of Internet users. Web 2.0
exists and is becoming more and more popular since users try to evolve, expand and
improve it. One can share anything and in return is allowed to use others' products.
Users gather around common interests, which brings about online communities: social
networks (see Fig. 2.4). The main reason why a user belongs to social networks is
the desire to share and meet others.
Networks have diverse sizes. In a small, tight one, there are a few people who form
a kind of private area. However, there can also be a lot of participants with loose
connections (weak ties). From the collaboration point of view, the latter mode is
more valuable; it is better to have connections with other networks than with only
one. However, unlimited access to information exchange can involve some risk; there
is a possibility that a user reaches poor quality data. To limit the possibility of
reaching poor data, rating and annotating shared resources were introduced.
18 http://reddit.com/
Figure 2.4: An example of a social network
Scale-free network
In a scale-free network, there are some very connected nodes (hubs) which have a
great number of links to other nodes, such that the ratio of those well connected
hubs to the number of nodes in the rest of the network remains constant as the
network changes in size.
2.3.4 Tagging
A tag is a label associated with or assigned to a piece of information such as a
web page, a photo or a movie. It is a keyword which files and classifies resources.
Popular services that use tags are del.icio.us and Flickr. The former uses tags to
label favorite web pages, while the latter employs them to mark photos.
A tag cloud is a visual depiction of tags which distinguishes more popular tags from
less popular ones; the former are written in a bigger font than the latter. Popularity
is seen either by the number of items that have been given a tag (like at Flickr)
or the number of times the tag has been applied to a single item (like at last.fm).
Clicking on a tag from the cloud shows the list of resources which were labeled
with it.
The TagCommons project is aimed at creating ways to share and interoperate
over tagging data. The idea of the project is to benefit from rich social tagging
descriptions.
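The font scaling behind a tag cloud can be sketched as a linear interpolation between a minimum and a maximum size; the Python below uses invented counts, and real services often prefer logarithmic scaling to keep mid-range tags visible:

```python
# Map tag popularity counts to font sizes for a simple tag cloud.
counts = {"reggae": 42, "ska": 17, "rock": 80, "mydog": 1}

MIN_PT, MAX_PT = 10, 32  # smallest and biggest font size in points

def font_size(count, lo, hi):
    """Linearly interpolate a font size from a tag's usage count."""
    if hi == lo:
        return MAX_PT
    return round(MIN_PT + (MAX_PT - MIN_PT) * (count - lo) / (hi - lo))

lo, hi = min(counts.values()), max(counts.values())
for tag in sorted(counts, key=counts.get, reverse=True):
    print(f"{tag}: {font_size(counts[tag], lo, hi)}pt")
```

The most popular tag always gets the maximum size and the least popular one the minimum, which is exactly the visual cue a cloud is meant to provide.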
2.3.5 Mashups
A mashup is a web page which offers a number of online services from various
sources. It allows using existing applications like Google Maps, Google Calendar
or the Yahoo! UI Library (YUI). It is possible due to access to their public APIs
and Web services.
Both the Semantic Web and Web 2.0 have impacted the development of the Internet.
The former is a low-level solution whose power lies in machine-readable descriptions;
the latter focuses on people. Both these standards can be overlapped to make even
greater benefits. By involving Web 2.0 techniques in Semantic Web solutions, we get
Semantic Web 2.0 applications which not only act as desktop ones, with a fine looking
user interface, but also understand the meaning of their content.
19 http://tagcommons.org/
20 http://maps.googe.com/
21 http://www.google.com/calendar
22 http://developer.yahoo.com/yui/
Table 2.2: Metamorphosis of the Web (concept: [21]). The table traces how traditional
Web solutions (websites, Content Management Systems, search engines, portals,
networks) evolve through Web 2.0 and the Semantic Web into Semantic Web 2.0
counterparts such as Social Semantic Information Spaces.
There are three approaches to metadata creation. The most traditional states that
the metadata is created by dedicated professionals; it has the form of catalog records
created by complying with complex rules which are not understood by laymen. Moreover,
organizing and developing the catalogs is expensive and does not scale.
The author-created metadata approach assumes that authors are responsible for
supplying their work with metadata since they know it best. It helps with the
scalability problem, but still users are only the recipients and do not have
influence on the data.
User-defined metadata solves the scalability problem and involves users in the
process of metadata creation.
Folksonomies
Tags (see Sec. 2.3.4) arose along with Web 2.0; they played the role of taxonomies
created by the users themselves. The Semantic Web 2.0 has set users free from using
predefined vocabularies. It gave one more freedom in that field by introducing
folksonomies. Among the most popular services that involve folksonomies are
del.icio.us and Flickr. As the name (a blend of folk and taxonomy) suggests, a
folksonomy is an open-ended labeling system with low entry costs that enables
Internet users to categorize content using tags. Tags in a folksonomy are metadata
about the categorized resources. Folksonomies fit in the Semantic Web 2.0 depiction.
At the same time, they bring into the Semantic Web 2.0 the whole potential of
Web 2.0. That is why they also appear in Table 2.2.
As I stated earlier in this paper (see Sec. 7.2.2), the Semantic Web is about
describing resources and interlinking them. In a folksonomy, a tag can be interlinked
with other related tags. The relationship is established by analyzing the URLs.
Related tags can be used to broaden or narrow the range of found information and to
find information somehow associated with the current tag [35].
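The URL-based relationship between tags can be sketched as co-occurrence counting; the bookmark data in the Python below is invented:

```python
# Find tags related to a given tag by counting shared bookmarked URLs.
from collections import Counter

# tag -> set of URLs labeled with it (invented sample data)
tagged = {
    "reggae": {"u1", "u2", "u3"},
    "bob_marley": {"u2", "u3"},
    "ska": {"u3", "u4"},
    "cooking": {"u5"},
}

def related(tag):
    """Rank other tags by how many URLs they share with `tag`."""
    shared = Counter()
    for other, urls in tagged.items():
        if other != tag:
            overlap = len(tagged[tag] & urls)
            if overlap:
                shared[other] = overlap
    return shared.most_common()

print(related("reggae"))   # bob_marley shares 2 URLs, ska shares 1
```

Tags with no shared URLs (here, cooking) never appear in the result, which is the sense in which co-occurrence separates related from unrelated tags.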
The most important limitation of folksonomies is the fact that there is no scope
information and no systematic guidelines, which results in ambiguity. A tag can have
multiple meanings; for example, apple may denote the fruit or the computer company.
Then how to reach the most appropriate information about a Macintosh? Another problem
is handling capitalization and spaces, as usually multiple words are not allowed.
Finally, the problem of plural and singular forms and conjugated words appears. There
are no strict rules about which form should be chosen. As I said, a user is given a
free hand in the selection of a tag's name, so there is a risk of tags which are
senseless, or of a few versions of a tag that describe the same concept.
However, these problems can be solved in simple ways. One way is to educate
users to add better tags. They should be advised to use plurals in basic forms.
They should also be taught not to make spelling errors and to avoid personal tags
(e.g. mydog) that are meaningless to the community. Then, tagging systems should
catch misspelled and not recommended words and give users advice at run-time [34].
There are some initiatives that try to learn how to order tags in folksonomies,
such as taga.licio.us.
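Run-time advice on misspelled tags can be approximated with fuzzy string matching from the Python standard library; the list of known tags below is invented:

```python
# Suggest corrections for misspelled tags at submission time.
import difflib

known_tags = ["reggae", "semantic_web", "elearning", "wiki", "folksonomy"]

def advise(tag):
    """Return the tag itself if known, else the closest known spellings."""
    if tag in known_tags:
        return [tag]
    return difflib.get_close_matches(tag, known_tags, n=3, cutoff=0.6)

print(advise("regae"))        # close to "reggae"
print(advise("semanticweb"))  # close to "semantic_web"
```

A production tagging system would combine this with a stop list of discouraged personal tags and normalization of case and spacing before matching.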
Chapter 3
The goal of this Master's Thesis is to employ Social Semantic Information Sources
for eLearning; that is why it is necessary to understand what the Semantic Web 2.0
is and how it can be used for eLearning. So far, I have introduced those technologies
(see Chapter 2). In this Chapter, I explain the idea of Social Semantic Information
Sources (see Fig. 3.1) and make a review of their most popular examples (semantic
blogs, semantic wikis, and Social Semantic Digital Libraries). Using that information,
I define a common model of SSIS and propose a consistent way of its description.
Then, I present eLearning 2.0, a new approach which tracks informal learning sources.
A blog is a website usually led by one person to publish their opinions, thoughts
and web links [42]. Although most blogs are textual, some focus on photographs
(photoblog), videos (vlog) or audio (podcasting). In general, blogs are part of the
wide network of social media.
Figure 3.1: Location of SSIS in the Web (figure concept: [21])
A blogger can be an expert in some domain who shares his or her knowledge. He/she
can also be a beginning writer who looks for an audience. Finally, a blogger can be
anyone, from a student to a master of science. Anyway, whoever the blogger is, his
or her main reason to create a blog is to share thoughts with others.
Blogs are updated by habitually writing new entries (posts). They are usually
syndicated with headlines, hyperlinks and summaries using RSS or Atom formats. This
allows readers to easily track updates.
Visitors can read posts and annotate them. According to Technorati, blogs are
powerful since they allow millions of people to easily publish and share their ideas,
and millions more to read and respond. They engage the writer and readers in an open
conversation.
1 http://www.technorati.com/
Blog as a tool in eLearning
According to Technorati statistics from August 2006, there were fifty million blogs
on the Internet, and their number had been doubling every six months or so since
November 2002. At that stage, the number was one hundred times bigger than it
had been three years earlier. About 175,000 new blogs and about 1.6 million posts
were created each day. These numbers demonstrate the potential of
blogs.
Being so popular, blogs can support the learning process. Not only do they
remove the technical barriers to writing and publishing online, but, thanks to their
social character, they also engage readers in conversation.
According to O'Hear [41], Will Richardson was one of the pioneering educational
bloggers. By using Manila, a blogging software, he encouraged his English literature
students to publish a reader's guide to the book The Secret Life of Bees. The author
of the book responded to what the students had written. This way, a small community
of people interested in the book was formed.
Will Richardson succeeded since he relied on the main concept of weblogs, the
power of collaboration, which can be used in eLearning. Students can use weblogs
for exchanging their experience and publishing their notes or gained knowledge.
Other students or even teachers can write annotations to express their opinions
about the published content.
Blogs would not be so helpful in studying if it were not for exposing machine-readable
data about each blog. Syndication services generate feeds, which are portions of
information about changes to a blog. The most popular standards [22] are Really
Simple Syndication (RSS) 0.92, Atom, and RDF Site Summary (RSS) 1.0, which fully
supports RDF.
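Reading such a feed takes only a standard XML parser; the Python sketch below pulls headlines and links from a minimal invented RSS document (real feeds add further required channel fields and, for RSS 1.0, RDF namespaces):

```python
# Parse a minimal RSS feed: pull each item's headline and link.
import xml.etree.ElementTree as ET

rss = """<rss version="0.92"><channel>
  <title>Adam's music blog</title>
  <item>
    <title>Discovering reggae</title>
    <link>http://example.com/blog/1</link>
    <description>Notes after hearing Bob Marley.</description>
  </item>
</channel></rss>"""

root = ET.fromstring(rss)
for item in root.iter("item"):
    print(item.findtext("title"), "|", item.findtext("link"))
```

An aggregator repeats exactly this step for every subscribed blog and compares the items against those it has already seen, which is how changes to a blog are tracked.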
Semantics for blogs
There is a large number of blog publishing services available, such as Blogger or
WordPress. These services provide a wide range of tools for creating and managing
blogs. However, they lack a semantic description of the content: the topic of the
posts, their content, or connections with other posts, perhaps from other blogs.
To make a blog also machine readable, rich metadata for its content must be
provided. Metadata can describe the blog itself or its parts (posts, comments,
hyperlinks) and the relations between them. The data can be either mixed in a post
(seen by a reader) or added in a hidden, machine-readable form. Software agents can
then process the metadata; machines can find connections between one's blog posts
and other blogs, and quickly obtain information about a post's author or a
described event.
The Semantically-Interlinked Online Communities project (SIOC, see Sec. 3.2.1)
delivers a plug-in (SIOC Exporter) for a few of the most popular blogging platforms:

• WordPress: one of the most popular blogging tools
• DotClear: a blogging platform used mostly in France
• Drupal: a content management platform for blogs and fora
• b2evolution
4 http://www.blogger.com/
5 http://wordpress.org/
6 http://sioc-project.org/
7 http://wordpress.org
8 http://www.dotclear.net/
9 http://drupal.org/
10 http://b2evolution.net/
The SIOC plug-in adds additional information about the site: a hyperlink to an
extracted RDF document for the whole blog or its posts. These metadata describe the
blog and the site which hosts a post, and give some information specific to a blog
post: the author, the topic, external links, the date of creation, the content of
the post, etc. To learn more about SIOC, see Sec. 3.2.1.
A wiki is a website that allows visitors to easily add, remove and edit content.
The most popular wiki engines are MediaWiki (Wikipedia is based on it) and
MoinMoin Wiki.
The first wiki was created for a programming language pattern group. A wiki has a
simple text syntax for creating new pages. Users can easily create content (ad hoc)
and edit existing information using a web browser. They do not even have to be logged
in to do that [16]. A wiki provides easy and deep linking by using names. In other
words, if a wiki page contains a word or phrase which is the topic of another page
in that domain, it is automatically turned into a hyperlink. This provides easy
navigation; moreover, this works for pages which do not exist yet.
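This name-based linking can be sketched with a regular expression over CamelCase words, the convention used by early wiki engines; the set of existing page names below is invented:

```python
# Turn CamelCase words into wiki hyperlinks, a classic wiki convention.
import re

existing_pages = {"SemanticWeb", "WebOntologyLanguage"}
CAMEL = re.compile(r"\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b")

def linkify(text):
    """Replace CamelCase words with links; mark missing pages with '?'."""
    def repl(m):
        name = m.group(1)
        if name in existing_pages:
            return f'<a href="/wiki/{name}">{name}</a>'
        return name + "?"          # the page does not exist yet
    return CAMEL.sub(repl, text)

print(linkify("The SemanticWeb builds on RDF; see OntologyBasics."))
```

The trailing question mark on a missing page name is exactly how the first wikis invited readers to create the page, which is what makes deep linking work for pages that do not exist yet.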
As everyone is allowed to interfere with what others see, the contents must be
checked and corrected; only then is the information a wiki provides reliable. Each
community member can be a moderator. Reliability is achieved with versioning and diff
features. Each wiki page has a history of changes which can be easily tracked by
comparing differences between versions. Thus, in case of errors, changes can be
easily reverted. All the aforementioned features make wikis a powerful tool for
collaborative work.
There can be many reasons for creating a wiki. Wikipedia is the most popular
encyclopedia based on a wiki engine on the Internet. Wikis can also be used to
manage open source software documentation, as Jakarta does. It is convenient
11 http://www.mediawiki.org
12 http://wikipedia.org/
13 http://moinmoin.wikiwikiweb.de/
14 http://jakarta.apache.org/
33
to use a wiki as a personal information management system. Finally, wikis are commonly used as a discussion platform in companies' intranets (see TWiki).
Wikis seem to be a good way of making people cooperate and a powerful informal source of knowledge. To better use their potential, the structure and the content of wiki pages should be modeled using semantic descriptions. Semantic wikis allow users to add additional metadata (semantic descriptions) for described concepts. These data mark the place of their occurrence so that the system is capable of extracting relevant data without understanding the rest of the text. As a result, it helps to organize, search, browse, share, and annotate the wiki's content. Semantics enhance the searching process; it is not limited to keyword-based searching only. It introduces
For instance, a wiki with articles about rock songs could annotate these pages with little pieces of additional data (written in RDF), such as "this song was made by Red Hot Chili Peppers" or "this song was published in 2000". A user does not have to know RDF syntax to annotate. Thus, the wiki can reason on the annotations. Examples of semantic wiki engines include:
• Semantic MediaWiki, an extension of MediaWiki (see Sec. 3.1.2)
• IkeWiki, a web-based wiki (prototype)
• Makna, whose engine implementation is based on Janne Jalkanen's JSPWiki; it uses Jena, the Semantic Web framework developed by HP
• SemperWiki, a semantic personal wiki developed for the Gnome desktop [44]
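The idea that such annotations let a wiki reason can be illustrated with a toy triple store; the predicate names and example song titles below are made up for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy RDF-like triple store: each annotation is a
 *  (subject, predicate, object) statement that can later be queried. */
class TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new String[] { subject, predicate, object });
    }

    /** Returns all subjects with the given predicate/object pair,
     *  e.g. all songs made by a given band. */
    List<String> subjectsOf(String predicate, String object) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(predicate) && t[2].equals(object)) {
                result.add(t[0]);
            }
        }
        return result;
    }
}
```

Even this minimal model supports queries that go beyond keyword search, such as "list every song made by a given band".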
15 http://twiki.org/
16 http://meta.wikimedia.org/wiki/Semantic_MediaWiki
17 http://ikewiki.salzburgresearch.at/
18 http://www.apps.ag-nbi.de/makna/
19 http://www.jspwiki.org/
20 http://jena.sourceforge.net/
21 http://www.semperwiki.org/
34
There are three ontologies designed to deal with wikis:
• WikiOnt, which aims at integrating Wikipedia (and by extension other
• SWIFT
Semantic MediaWiki
MediaWiki is one of the most popular wiki engines. The best-known wiki, Wikipedia, is based on it. However, MediaWiki does not meet the Semantic Web demands. Although the HTML code is to some extent semantic, there is no
To make a MediaWiki-like wiki a semantic one, one can install the Semantic MediaWiki extension [16]. Its goal is to make important parts of MediaWiki's knowledge machine-processable with as little effort as possible. For that reason, it introduces typed links, attributes and types, and semantic templates.
Typed links are treated as semantic relations between the two concepts described by a link. Let us take the main page of Corrib Clan Wiki as an example. On this site, there is information about Corrib Clan, like projects developed by its members and the supervisors. There are a number of typed links on that page. The hyperlink to the article about Didaskon not only gives the page location but also indicates the relation between the two pages.
This template is built from two main parts. The first part (the expression before ::) describes the relation; the second part (after ::) is a hyperlink to the article within the wiki. So, this example says that Didaskon is a subproject of Corrib.
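The two-part structure of a typed link can be shown with a short parsing sketch. The [[relation::Target]] syntax is Semantic MediaWiki's; the parser class itself is only an illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for Semantic MediaWiki typed links of the form
 *  [[relation::Target]]; splits them into the relation and the linked page. */
class TypedLink {
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^:\\]]+)::([^\\]]+)\\]\\]");

    final String relation; // the expression before ::
    final String target;   // the wiki article after ::

    private TypedLink(String relation, String target) {
        this.relation = relation;
        this.target = target;
    }

    /** Returns the first typed link found in the text, or null if none. */
    static TypedLink parse(String wikiText) {
        Matcher m = LINK.matcher(wikiText);
        if (!m.find()) {
            return null;
        }
        return new TypedLink(m.group(1), m.group(2));
    }
}
```

Applied to the Corrib example, the text "[[subproject of::Corrib]]" yields the relation "subproject of" and the target page "Corrib".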
22 http://sw.deri.org/2005/04/wikipedia/wikiont.html
23 http://ontoware.org/projects/swift/
24 http://wiki.corrib.org/
35
Besides typed links, Semantic MediaWiki introduces a better way to manage attributes of concepts. Since each typed link connects two wiki pages, not all information can be stored as a relation. For that reason, one uses attributes. On the above-mentioned Corrib Clan Wiki main page there are a few attributes as well. For example, an attribute states that Corrib Clan is supervised by Sebastian Kruk. The difference between a typed link and an
cessable way. A set of relations and attributes is situated at the bottom of the article page. But machines are not obliged to scrape the content of the page.
Semantic MediaWiki allows extracting these annotations with an RDF feed. For
DBpedia.org
DBpedia.org is a project that aims at extracting structured information from Wikipedia and making this information available on the Web. The information
Also, DBpedia allows us to ask queries against Wikipedia and to link other datasets
them. Finally, I describe JeromeDL, the first Social Semantic Digital Library. All in all, I point out the importance of Social Semantic Digital Libraries to the learning process.
25 http://dbpedia.org/
36
Digital Libraries
ers and the Internet expansion, brought in digital libraries [6]. In a digital library,
resources are machine readable and a full-text index improves searching. Resources
There were some quite innovative methods adapted to digital library commu-
library would be impossible if they were not sufficiently described; electronic annotations play an important role since they bring more information about books. The most popular description formats are MARC21, BibTeX and Dublin Core.
Besides searching and reading, users are allowed to download resources for further
Digital libraries also handle access rights. Some resources can be hidden from users
Digital libraries already have controlled vocabularies and taxonomies. All of them even have metadata in place. In semantic digital libraries, rich and extensive semantic annotations (metadata) make resources accessible not only to humans but also to machines.
The metadata is modeled with RDF (see Sec. 2.2.2). Searching is more efficient
Artifacts from the cultural heritage domain are arranged in a hierarchical structure and can be stored internally or in any other place by keeping their references. It also supports various metadata schemas defined in OWL-DL.
Bibliographic resources are described with RDF. Again, records can be queried with SPARQL.
metadata, system metadata and behaviors, that are code objects providing
vironments) is developed by W3C, HP, MIT Libraries, and MIT's Lab for
Computer Science.
The SIMILE project provides tools for metadata managers and common end-users. They all deal with RDF: they allow extracting data from XML and HTML files, and inspecting and editing RDF files. SIMILE extends and leverages DSpace and makes library
So far, I have described innovative semantic digital libraries. I have presented how the Semantic Web improves their features. The potential of semantic digital libraries can be improved even further by applying Web 2.0 capabilities. A semantic digital library can give some space for collaboration. Users can leave a trace by making annotations and evaluations of the resources. By supporting Web 2.0 collaboration aspects (comments, blogs, shared bookmarks, tagging, etc.), a semantic digital library becomes
38
JeromeDL [49] is developed at the Digital Enterprise Research Institute, Galway (DERI) in collaboration with Gdańsk University of Technology by a group of MSc and PhD students, including myself. It has a two-layer metadata enrichment. The lower level, MarcOnt Mediation Service, supports legacy metadata (DublinCore [9], BibTeX and MARC21 [1]), which allows interoperability with al-
The upper level is community oriented [31]; a community of users can interact
Filtering (SSCF) [50]. Users can evaluate and annotate resources. Users' data
share this information with other users, based on their profile, which is managed by
search engine. Users can form queries even in natural language (NL) by using query
templates.
There are seven ontologies supported by JeromeDL and they can be grouped as
follows:
• MarcOnt Mediation Service [48, 47, 26]: metadata about bibliographic resources
DublinCore
BibTeX
MARC21
29 http://jeromedl.org/
30 http://www.deri.ie/
31 http://www.pg.gda.pl/
39
MarcOnt
Among other features, JeromeDL also allows exporting the description of its
analysis of SSIS convinced me that there are a few main concepts regarding SSIS:
The last point suggests the potential of SSIS. Being collaboration-minded, online community sites, like blogs, wikis, and bookmark-sharing systems, allow users to create a network where they can feel free to band together: share ideas and opinions, publish links and works and comment on them; any resource can be annotated. Consequently, plenty of relevant information can be extracted; these data can support the learning process. For instance, they can serve as additional material to read. All in all,
The main problem of online communities is that they are dispersed over the
solutions allow mainly text-based searching, so a user must browse many web pages
The Semantic Web assumes rich descriptions of resources; its main postulate says that semantic annotations make the content readable by machines, which allows
40
Figure 3.2: Online communities overview (from [4]).
3.2.1 SIOC
SIOC (Semantically-Interlinked Online Communities) is an initiative that is supposed to overcome the above-mentioned problem [20]; its goal is to interconnect
enclosed links, the creation time, connection with other web pages.
The core of the SIOC framework is the SIOC ontology which is based on RDF
32 http://sioc-project.org/
41
written by an author, has a topic, a content, external links, etc.
importing and exporting SIOC data in different vocabularies. In this manner, the amount of existing available data can be controlled. Also, SIOC makes cross-site queries and topic-related search on sites with SIOC metadata more efficient [4]. I have already written about the SIOC plug-in for a few blogging platforms (see Sec. 3.1.1). The SIOC ontology is still being developed; recently, its authors have been trying to apply it to modelling wikis, image galleries, event calendars, address books, audio and video
Learning Management Systems (LMSs) that provide online courses (see Sec. 2.1).
42
Blackboard, and Desire2Learn [8]. To recap, LMS organizes the learning content in
This approach suffers from many limitations, though. The main problem of current LMSs is that they deliver courses prepared for a generic student. They are
all. However, the learning path should be adaptable and created dynamically. Also, LMSs focus on a small group of students, for instance a group of students in a class; they do not allow a broader community. Moreover, students should benefit not only from their repository (formal learning), but also use collected learning material widely
eLearning 2.0 has emerged from Web 2.0 developments. According to DTI Global
• diversity of content and media: Web 2.0 services (blogs, wikis, multimedia
• informal learning
Blogs were one of the first Web 2.0 services used in the newer eLearning approach. Students' blog posts are often about something from their own range of interests, rather than on a course topic or assigned project. Students run blogs and read others' blogs; consequently, they create a social network with loads of useful data [36]. Then, wikis, RSS, podcasting services, and other Web 2.0 platforms have emerged. All in all, the number of available resources has increased, which becomes a problem for
is possible due to ontologies (see Sec. 7.2.2). Thus, machines can produce intelligent responses for unforeseen situations. But the real power of the Semantic Web can be realized when heterogeneous data from diverse environments are collected, processed and sent for further use [33]. Ontologies organize learning material around good semantic annotations of learning objects. Also, they can be used to describe user profiles in order to compose the best course for a student based on semantic queries.
of LOs. We have many LMSs and most of them describe LOs in their own specific way.
ing content used by other LMSs and create common searchable content and content
(see Sec. 2.1.2) which is a collection of standards and specifications adapted from
However, SCORM has introduced its own XML formats and methodologies [3]. One of the standards that underlie SCORM is LOM; its goal is to provide a rich description of learning material (see Sec. 2.1.2). Since LOM is very accurate, many LMSs support it. This way, exchanging LOs between them is, to some extent, facilitated.
learning content is still not very machine-readable. By bringing the Semantic Web to eLearning, it is easier to integrate learning material with other material and define
At the moment, considerable eort is put into research in the Semantic Web
and eLearning. There is a number of the Semantic Web educational services and
projects:
web pages that contain semantic content. AQUA uses ontologies for refining initial queries, a similarity algorithm, and a reasoning process [55].
essays, which facilitates writing essays that really answer the essay question.
tologies [38].
Elena defines a smart space for eLearning on top of Edutella [39] peer-to-peer learning resource repositories. It uses SOAP-based Web Services which are
work and all its resources are described in RDF. This allows running efficient mapping, mediation and clustering of resources and their metadata [39].
3.3.2 Didaskon
Didaskon is a project developed in the Digital Enterprise Research Institute
the eLearning field. Its main goal is to deliver a framework for assembling an on-
services [52].
LOM ontology. LOs are composed into a learning path for a specific student. Along
with formal Learning Objects, Didaskon also uses the potential of Social Semantic
34 http://didaskon.corrib.org/
35 http://deri.ie/
45
creates LOs from data harvested from SSIS. Consequently, a user gets a course path
conditions regarding a user. Each user is described with the FOAF ontology [7]. Basing
for his/her needs. Moreover, the system allows more scalable helper features for students' supervision.
Again, the ontologies used link user needs and the characteristics of the learning material. The produced curriculum not only reflects user requirements, but also introduces
46
Chapter 4
Each project is burdened with some risk; when things go wrong, it can end up failing to reach the initial assumptions. Therefore, the design process is crucial. Besides defining business goals, I must also identify possible problems and risks.
In this chapter, I introduce existing tools for capturing informal learning and describe the scope of my project. I define the functional and non-functional requirements for the system and the use cases. Then, I introduce its architecture: the main components, classes, and the Web Services specification. All information gathered
line resources or metadata for them. I describe their features and point out their limitations.
for RDF data either in the content of the resource with the specified URL or in documents this resource links to. If such data is found, it is saved to the shared repository.
1 http://pingthesemanticweb.com/
47
PingtheSemanticWeb.com supports FOAF, SIOC, and DOAP ontologies, and other
RDF documents.
The pinging feature is invoked either by typing a URL on the service's home page or automatically; the automatic way benefits from Semantic Radar, an add-on for the Firefox web browser. Whenever Semantic Radar discovers semantic metadata on a page, it informs the service about that fact so it can be added to the repository. Software agents can request the service for a list of stored RDF documents and use that information for crawling
SIMILE Project
SIMILE Project (Semantic Interoperability of Metadata and Information in unLike
Piggy Bank, an add-on for Firefox, changes the browser into a mashup platform by allowing users to capture metadata for online resources and mix them together. Collected data can be stored locally, tagged, searched, and browsed. Piggy Bank can capture RDF documents to which a web page links and from any web pages that are
metadata for non-semantic web pages as well. It is written in another SIMILE tool, Solvent.
Zotero
Zotero is an add-on for the Firefox web browser. It helps with collecting, managing, and citing research material, mainly bibliographic resources. Zotero extracts RDF injected into XHTML documents; it works with a few standards and microformats [24]: embedded RDF, COinS, Dublin Core [9], and MARC [1]. Zotero informs
2 http://sioc-project.org/firefox/
3 http://simile.mit.edu/
4 http://www.zotero.org/
48
a user that it has discovered some markup by showing a special button in the browser.
A user can easily edit the data saved by Zotero and append additional information, such as notes, tags, and related files. Moreover, Zotero can be integrated with Microsoft Word and WordPress. Captured data can be searched and browsed both
4.1.2 Limitations
All the above-mentioned tools are good metadata harvesters. However, they work
annotations for online resources in a shared space. This information can be used, for instance, by crawlers while searching for a specific piece of data. But PingtheSemanticWeb.com does not offer the possibility to browse stored data beyond viewing raw RDF documents, which is unacceptable for a common user. Also, it
Zotero is a powerful tool for researchers and students because it facilitates biblio-
about books and articles, search and cite them. However, it only reads embedded RDF; there is no support for pure RDF data, which could pass more knowledge.
Piggy Bank is capable of reading whole RDF documents that a web page links
to. Although it does not support non-semantic web pages itself, it is possible to
write screen scrapers that can do that. In spite of that, it has little support for
icant characteristics that such a tool must be distinguished by. Not only should it work with semantic sources of information, but it must also operate on non-semantic web pages, like Wikipedia. It must be easy to extend so that it supports more types of websites. Then, a user should be supplied with supportive tools for data
49
Also, I have discovered that captured data can considerably boost informal learning; it can be used in new eLearning frameworks that use both learning material
layer for the Didaskon system (see Sec. 3.3.2) which works as its extension. The system provides Web Services for harvesting data from SSIS and providing them in the form of informal Learning Objects (see Fig. 4.1). Data delivered by the system must be described with a common object model so that Didaskon can easily reason on it. Because the system is supposed to collect data, I have named it IKHarvester, from Informal Knowledge Harvester.
In the picture of the system scope (see Fig. 4.1), you can see SSIS that pro-
IKHarvester collects the metadata and stores it in the repository of informal knowledge (not shown in the figure). The collection of these metadata is well described
learning material, based on what it needs during the composition. The description
They gave me a view of what the developed system should do and how it should act.
readable and correct way. Table 4.1 is a template for describing the requirements.
50
Figure 4.1: System scope
51
Table 4.1: Requirement description template
Id XYY Priority
Title
Description
Source
Related req.
requirement
Functional requirements
system should do. Precisely described stakeholders' needs are the first step to finishing a
Description IKHarvester should be able to provide a list of all informal LOs stored in
Source Didaskon
52
Id F02 Priority Optional
Title Deliver a list of informal LOs that have changed since a given date
changed since a given date. This aims at avoiding the situation where
Source Didaskon
Description It is one of the basic features of IKHarvester. Metadata for SSIS resources
Source Didaskon
The content of informal LOs must not be stored in the repository; it must
Source Didaskon
53
Id F05 Priority Crucial
users) or by the administrator. Regarding SSIS, one must know the URL
• JeromeDL
Source Didaskon
Description If a SSIS resource from which the LO was created no longer exists, it shall
be removed from the informal knowledge repository. The data should not
Source Didaskon
54
Id F07 Priority Crucial
Using SOA assures efficiency and easy access to the system features. All
Source Didaskon
ject model suitable for eLearning. The structure of the model must be
Source Didaskon
Related req.
55
Id F10 Priority Crucial
Description IKHarvester must be able to collect data from semantic blogs which are
supported with SIOC plug-in. The data should be obtained by using the
SIOC exporter.
Source Didaskon
Description IKHarvester must be able to collect data from both semantic and non-
Source Didaskon
Description IKHarvester must be able to collect data from JeromeDL, the Social Se-
mantic Digital Library. The data shall be produced by the RDF exporter.
Source Didaskon
Description In case RDF extractors supply IKHarvester with irrelevant data, it must be filtered.
Source Didaskon
56
Non-functional requirements
Title Reliability
reasoning on them.
Title Interoperability
neous networks. Yet, it is supposed to collect data from SSIS which carry
diverse information.
57
Id N03 Priority Required
Title Extensibility
Description The system should be developed in a way that allows making im-
Also, it must be easily extended with plug-ins (see Fig. 4.6) that deal with other types of SSIS (wikis based on different engines, other digital
Source Didaskon
Title Efficiency
Services.
However, this is not crucial; the communication takes place over the Internet, so there might be periods of time when the services are
Related req.
Title Portability
Source Didaskon
58
Id N06 Priority Required
Title Stability
Source Didaskon
Title Safety
cidentally.
Title Security
tivities and efforts that aim at lowering its efficiency and the quality of work.
Description During the development, only open source software and tools should be
Source Didaskon
Related req.
59
Id N10 Priority Crucial
Description All the documents and other products (like software) created during the project will have a version number which will allow tracking changes in an easy way. There is a need for a tool like SVN for version control.
Source Didaskon
Source Didaskon
• collecting data from SSIS and storing it in the informal knowledge repository
More detailed use cases are depicted in Fig. 4.2; there are basic functionalities
Actors
Below, there is a description of the actor that uses the functionalities provided by IKHarvester.
60
Figure 4.2: Use Case diagram
61
Id A01
Title Client
Web Services, we expect more than one actor that can use it.
Related actors
Use cases
A use case is an occurrence that takes place while the system works. Each use case is initiated either by the actor's activity or by another use case. It is very important to provide use case scenarios created after the system requirements analysis. Use cases tell more precisely what can happen in the system while it works.
Id UC01
Actors A01
62
Id UC02
Actors A01
Id UC03
Description Providing the content of a LO. The content can be txt, HTML
Actors A01
Exceptional occurrence Unsuccessful connection to the resource with given URL; ei-
no longer online.
63
Id UC04
new LOs. The actor must hold the URL of the resource which
should be added.
Actors A01
Id UC05
Actors A01
Initial occurrence Claim for updating metadata of resource that has changed.
64
Id UC06
Actors A01
Id UC07
Actors A01
Exceptional occurrence
Related use cases UC01, UC02, UC03, UC04, UC05, UC06, UC08, UC09
Id UC08
Actors A01
Initial occurrence Claim for a list of all LOs. If an adding date is specified, the list will contain only those LOs which were added since then.
Exceptional occurrence
65
Id UC09
Actors A01
Exceptional occurrence
Id UC10
performed.
Actors A01
cic LO.
Id UC11
Description Semantic web pages allow extracting metadata for their re-
Actors A01
66
Id UC12
Actors A01
Id UC13
Actors A01
Exceptional occurrence
Id UC14
Actors A01
Exceptional occurrence
67
Id UC15
knowledge repository.
Actors A01
Id UC16
Actors A01
Exceptional occurrence
Id UC17
Actors A01
68
4.3 System design
By now, I have pointed out and described the system requirements. Also, I defined
report more precisely how it works, give some details on what is going on inside the
system.
work to fulfill the service consumer's needs. The services are independent; they do not rely on the context and state of other services. The architecture demands using interfaces based on Internet protocols like HTTP, FTP, SMTP; all messages, except for binary data attachments, must be described in XML. There are two
SOAP
SOAP (Simple Object Access Protocol) Web Services are very popular nowadays.
SOAP is a protocol for transferring data between the source and the destination
the core (stubs) of the client code which can call SOAP Web Services. Messages
sent with SOAP are wrapped by an envelope; within it, there is the content (body)
REST
REST (REpresentational State Transfer) Web Services are based on the concept of a resource.
In fact, it is used commonly nowadays, in the World Wide Web and Web 2.0 [10, 45].
69
• GET for obtaining a stateless representation of a resource
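The resource-oriented style can be sketched as a small dispatcher mapping HTTP methods onto operations of an informal-LO repository; the method and status strings here are illustrative, not part of IKHarvester's actual interface:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of REST semantics: each HTTP method maps to one
 *  operation on the resource identified by the URL. */
class RestDispatch {
    private final Map<String, String> repository = new HashMap<>();

    /** Dispatches an HTTP method to the corresponding repository action. */
    String handle(String method, String url, String body) {
        switch (method) {
            case "GET":    // stateless representation of a resource
                return repository.getOrDefault(url, "404 Not Found");
            case "PUT":    // create or replace the resource at the URL
                repository.put(url, body);
                return "201 Created";
            case "DELETE": // remove the resource
                repository.remove(url);
                return "204 No Content";
            default:
                return "405 Method Not Allowed";
        }
    }
}
```

The key design property is that all the state a request needs travels in the request itself (method, URL, body), so the server keeps no conversational state between calls.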
Employing SOA
developed a group of Web Services. Thus, IKHarvester is independent of the LMS,
Although both SOAP and REST have pros and cons, I have used REST since
diagram (see Fig. 4.3) depicts a high-level architecture of the system.
IKHarvester The core of the system. It is responsible for integrating its two
subcomponents:
Jericho HTML Parser is a Java library for web page scraping. It allows anal-
(LGPL).
70
Figure 4.3: Component diagram
manage data compatible with LOM standard; it allows creating and exporting
material.
71
interface for connection with RDF storages. Consequently, informal knowledge
4.3.3 Classes
Fig. 4.4 and Fig. 4.5 present the simplified class diagram for the IKHarvester system.
It covers a number of classes with their most important attributes and methods. The
those that harvest data from resources of different types. In the current ver-
posts that use the WordPress engine. The current version of IKHarvester tracks only
hosted on Blogger.
JeromeDL resources.
HarvestingResults, an enum class that defines how harvesting ends (for instance,
72
MediaWikiScraper, used for scraping web pages with wiki articles in order to find crucial metadata. Its methods employ the Jericho HTML Parser for that purpose.
BloggerScraper, used for scraping blog posts hosted on Blogger in order to find crucial metadata. Its methods employ the Jericho HTML Parser for that purpose.
DataProvider, an interface that defines two methods for providing data stored
DataProviderImpl implements the above-mentioned interface and calls its subclasses responsible for providing data that has been collected from different types of resources. Also, it delivers methods for obtaining the list of learning
frameworks.
knowledge repository metadata for wiki articles and providing them to eLearning frameworks.
NS is a set of a few classes that define namespaces for the ontologies used for describing blog posts, wiki articles and JeromeDL resources: NOTITIOUS, FOAF, XFOAF, MarcOnt, XMarcOnt, JeromeDL, and SIOC.
73
Util contains a set of helper methods
eters
query
new blades, i.e. modules for managing other types of resources (see Fig. 4.6). To do so, a programmer must study the class diagram (see Fig. 4.4 and Fig. 4.5).
The current version of IKHarvester captures metadata from SSIS with the following
classes:
• WordPressDataHarvester
74
Figure 4.4: Class diagram (part #1)
75
Figure 4.5: Class diagram (part #2)
76
• BlogerDataHarvester
• MediaWikiDataHarvester
• JeromeDLDataHarvester
Each of the above-mentioned classes works with a specific type of SSIS. It is important whether it captures information from, for example, a post hosted on Blogger or one that runs on the WordPress engine, because data is exposed differently. Consequently, to provide support, for example, for a new type of blog posts or wiki articles,
Currently, there are three classes for retrieving metadata for captured resources from
• BlogPostDataProvider
• WikiArticleDataProvider
• DLResourceDataProvider
All the above-mentioned classes extend DataProviderImpl, which implements the DataProvider interface. Those three classes support three types of SSIS: blogs, wikis
common object model, regardless of whether it is, for instance, a post from
ing IKHarvester with a module that captures data from another type of blogs, wikis and digital libraries does not require the implementation of a new providing module. However, if such a new module is required, a new class that extends DataProviderImpl must be added.
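The extension mechanism can be sketched in code; this is a simplified illustration with one method and invented return values, not the interface from the class diagrams:

```java
import java.util.List;

/** Simplified sketch of the provider hierarchy: adding support for a new
 *  SSIS type only requires a new subclass of the base implementation. */
interface DataProvider {
    List<String> listLearningObjects();
}

class DataProviderImpl implements DataProvider {
    @Override
    public List<String> listLearningObjects() {
        return List.of(); // base implementation: nothing collected yet
    }
}

/** Blade for blog posts; analogous subclasses exist for wiki articles
 *  and digital-library resources. */
class BlogPostDataProvider extends DataProviderImpl {
    @Override
    public List<String> listLearningObjects() {
        return List.of("BlogPost LO");
    }
}
```

Callers always work against the DataProvider interface, so a new blade is visible to the rest of the system without any further changes.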
data from SSIS (in general, online communities) and saving them to the informal knowledge repository.
Figure 4.6: Blades for different SSIS types
The repository stores these metadata as RDF triples from which Learning Objects described according to the LOM standard are created and delivered to Didaskon.
(predicates), and LOM attributes was crucial for further development. There are
plenty of properties that describe a resource. Semantic RDF feeds are very helpful
since they provide mapping from attributes to predicates. For they give a lot of
In this Section, I describe the attribute mappings for each resource type IKHarvester supports at the moment (blog posts, wiki articles, and JeromeDL resources).
Blog Posts
Metadata for blog posts is delivered by SIOC data exporters. A blog that supports SIOC contains some additional information in a meta tag (inside the head tag) in the HTML code. For my blog, which is available at http://dobrzanski.net, it looks
78
as follows:
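The original listing is not reproduced here; an illustrative head fragment and a sketch of extracting the exporter URL could look like this. The link layout and the example URL are assumptions based on common SIOC exporter output, not copied from my blog:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: finds the SIOC RDF autodiscovery URL in a blog page's
 *  HTML head. The rdf+xml link pattern is an assumption based on
 *  typical SIOC exporter output. */
class SiocDiscovery {
    private static final Pattern RDF_LINK = Pattern.compile(
            "<link[^>]*type=\"application/rdf\\+xml\"[^>]*href=\"([^\"]+)\"");

    /** Returns the href of the RDF autodiscovery link, or null if absent. */
    static String rdfUrl(String html) {
        Matcher m = RDF_LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Given a head fragment such as `<link rel="meta" type="application/rdf+xml" title="SIOC" href="..."/>`, the extracted href points at the exporter, which is then asked for the RDF document.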
The href attribute value is the URL of the RDF representation of the data on the current page. Its value changes while browsing the blog; it is always up to date, ready to produce RDF output. In general, the output consists of some information
Having the URL of SIOC data for a post, IKHarvester uses the exporter to obtain
When it is asked to deliver data, it collects the RDF statements from the repository and transforms them so that they describe the post in a way compatible with the LOM standard. Since some of the metadata is not crucial for eLearning purposes, it is
In the following Table, I present how post attributes (first column) are mapped to SIOC ontology predicates (second column) and then to LOM attributes (third column). Some of the LOM attributes are set to default values, as they cannot be collected from the SIOC exporter output. Attributes labeled with an asterisk (*) can
- sioc:Post Educational.LearningResourceType=BlogPost
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
79
creator sioc:has_creator Lifecycle.Contribute.Role=Author &
Meta-Metadata.Contribute.Role=Author &
Meta-Metadata.Contribute.Date=Date
Educational.Description &
Classification.Description
Classification.Keyword
Annotation.Date=Date &
Annotation.Description=Content
Relation.Resource.Identifier.Catalog=URI &
Relation.Resource.Identifier.Entry &
Relation.Resource.Description=references
Educational.Language &
Meta-Metadata.Language
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
- - Educational.Difficulty=easy
- - Rights.Cost=no
- - Rights.CopyrightAndOtherRestrictions=
no
- - General.Structure=atomic
80
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite. . .
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status=revised
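The constant part of this mapping — the LOM attributes set to default values for every blog post — can be expressed as a simple map. The keys and values below follow the table above (with LOM's standard element spellings); the multi-valued Educational.Context entries and the Technical.Requirement composite are omitted from this sketch, since a flat map cannot hold them:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Default LOM attribute values applied to every blog-post LO,
 *  taken from the mapping table (single-valued entries only). */
class BlogPostLomDefaults {
    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("Educational.InteractivityType", "expositive");
        d.put("Educational.InteractivityLevel", "medium");
        d.put("Educational.SemanticDensity", "medium");
        d.put("Educational.IntendedEndUserRole", "learner");
        d.put("Educational.Difficulty", "easy");
        d.put("Rights.Cost", "no");
        d.put("Rights.CopyrightAndOtherRestrictions", "no");
        d.put("General.Structure", "atomic");
        d.put("General.AggregationLevel", "1");
        d.put("MetaMetadata.MetadataSchema", "LOMv1.0");
        d.put("LifeCycle.Status", "revised");
        return d;
    }
}
```

Keeping the defaults in one place means a harvester only has to fill in the attributes that actually come from the SIOC exporter.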
Wiki Articles
IKHarvester must collect data from semantic and non-semantic wikis which are
the concept described in an article from a semantic wiki can be obtained by using an RDF feed. However, harvesting should also be performed for non-semantic wikis, like Wikipedia. It turns out there is quite a lot of semantics in the HTML code; different sections like titles, content and categories are put inside sections with formalized identifiers. Thus, scraping the page yields a lot of crucial information. In fact, I
In the following Table, I present the way of mapping the attributes of wiki articles (first column) to SIOC ontology predicates (second column) and then to LOM attributes (third column). Some of the LOM attributes are set to default
(*) can occur more than once; those with two asterisks (**) are served by the RDF
- sioc:WikiArticle Educational.LearningResourceType=
WikiArticle
81
URI - Technical.Location &
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
Educational.Description &
Classication.Description
Classication.Keyword
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=references
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=xxx
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=has attribute
Educational.Language &
Meta-Metadata.Language
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
82
- - Educational.Diculty=medium
- - Rights.Cost=no
- - Rights.CopyrightAndOtherRestrictions=
no
- - General.Structure=atomic
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite. . .
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status=revised
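The HTML scraping used for non-semantic wikis, mentioned above, can be sketched in a few lines. This is an illustration only, not the actual IKHarvester code; the class name is hypothetical and real MediaWiki markup varies more than this single pattern assumes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch of scraping a MediaWiki page; not the actual IKHarvester code. */
public class WikiScraperSketch {

    // MediaWiki renders the article title inside <h1 class="firstHeading">.
    private static final Pattern TITLE =
            Pattern.compile("<h1 class=\"firstHeading\">([^<]+)</h1>");

    /** Returns the article title found in the HTML, or null if none is present. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><h1 class=\"firstHeading\">Semantic Web</h1></html>";
        System.out.println(extractTitle(page)); // prints: Semantic Web
    }
}
```

The same idea, with one pattern per formalized identifier (content block, category links), recovers the attributes listed in the table above.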
JeromeDL resources

JeromeDL provides extracted information about its resources in a few forms (see Sec. 3.1.3). The mapping rules for JeromeDL resources' attributes are presented in the following table, where they are translated to LOM.

- jeromedl:Book Educational.LearningResourceType=JeromeDLResource
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
creator marcont:hasCreator Lifecycle.Contribute.Role=Author &
Meta-Metadata.Contribute.Role=Author &
Meta-Metadata.Contribute.Date=Date
Educational.Description &
Classification.Description
Classification.Keyword
Rights.Cost
Educational.Language &
Meta-Metadata.Language
Meta-Metadata.Contribute.Role=Supervisor &
Meta-Metadata.Contribute.Entity=Personal info.
Meta-Metadata.Contribute.Role=Consultant &
Meta-Metadata.Contribute.Entity=Personal info.
Meta-Metadata.Contribute.Role=Uploader &
Meta-Metadata.Contribute.Entity=Personal info.
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
- - Educational.Difficulty=medium
- - General.Structure=atomic
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite ...
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status
Chapter 5
System implementation
In this chapter, I describe the development of the IKHarvester system. Then, I present the tools I used while writing this Thesis, giving a brief description of the software that helped me in both writing and development. The development process consisted of the following steps:

Requirements This is the initial stage of the system development; it is the time of gathering what the system is supposed to do.

Design Having the requirements, the designers create the architecture of the system.

Testing (validation) When the system works, it is time to validate it and check whether it meets the requirements.

Integration After fixing the bugs and deficiencies discovered during the testing stage, the system is integrated with its target environment.

Maintenance This is the stage when the system is deployed, works in the production environment, and is put into maintenance. Often, this is the time when some previously uncovered errors occur. Also, the application can still be improved; new features can be added as well.
Logic Tier Contains the logic of the application. Based on the input arguments, it processes the data and communicates with the database.

Data Tier Handles the connection and queries to the database in order to get and store data.

Each tier is related to a different aspect of the application (presentation, logic and data). The general idea is that the Logic Tier is the middleware between the Data Tier and the clients: either external applications or a user with a web browser. The second approach introduces the usage of web pages. In the picture below (see Fig. 5.2) you can see the main web page with the menu and a form for adding metadata for online resources to the informal knowledge repository.
A user can get metadata for informal Learning Objects (see Sec. B.1), get the list of informal Learning Objects (see Sec. B.3), get information and support facilitating the usage of IKHarvester with web browsers, and learn about the system.
5.4 Environment and necessary tools
5.4.1 Implementation environment
Java Platform

The Java Platform enables building applications in the Java programming language, which is supposed to be "write once, run anywhere". It was created and is managed by Sun Microsystems. The Java Platform consists of a great many technologies. It has an execution machine, called the Java Virtual Machine, and several editions it is composed of. I will shortly describe two of them, which I decided to use in the system I created.

Java Standard Edition

Java SE is the edition used by typical Java platform programs. According to Sun, Java SE allows one to develop and deploy applications on desktops and servers. Java SE also includes classes that support the development of Java Web Services, and provides the foundation for Java EE. It is distributed in two packagings:

• Java Runtime Environment (JRE) provides the Java APIs, the Java Virtual Machine, and some more components required to run applications and applets.

• Java Development Kit (JDK) encapsulates the JRE and useful tools for developers.
1 http://java.sun.com/j2se/
2 http://www.sun.com/
Java Enterprise Edition

Java EE provides more classes than Java SE; they are dedicated to server-side programs. For running the system, I use Apache Tomcat, an independent servlet container used in the official Reference Implementation for the Java Servlet and JavaServer Pages technologies.

RDF storage Sesame

Sesame is an open-source framework for storing and querying RDF data. Sesame's benefits are: good scalability, high query performance, and support for RDF Schema inferencing.
IDE Eclipse

Eclipse is one of the most popular Integrated Development Environments. It is an open-source platform. The platform has been designed to be plug-in-able; its power and abilities can be extended by installing plug-ins.
3 http://java.sun.com/j2ee/
4 http://jakarta.apache.org/tomcat/
5 http://java.sun.com/products/servlets
6 http://java.sun.com/products/jsp
7 http://openrdf.org/
8 http://eclipse.org
Building the project Apache Ant

Apache Ant is an open-source build tool implemented in the Java programming language. It describes project build processes in XML-based files.
Testing/logging log4j

I have put a lot of effort into testing during the development. I tried to create new components and test them at once, so that the risk of bugs was limited. All in all, I have used log4j, a logging mechanism, with two levels of logs: errors and information. Logs are saved to a special file. Each occurrence of an error is richly described: there is some information about the error itself, the reason why it occurred and the place where it happened.
Version control Subversion

Subversion (SVN) is a version control system designed to be a replacement for the Concurrent Versions System (CVS). It has a number of features: atomic commits, versioning of symbolic links, native support for binary files, and full versioning of directories.
5.4.2 Documentation

Thesis environment LaTeX

LaTeX is a document markup language and document preparation system for the TeX typesetting program.
9 http://ant.apache.org
10 http://logging.apache.org/log4j/
11 http://subversion.tigris.org/
12 http://www.nongnu.org/cvs/
13 http://www.latex-project.org/
It allows an author to focus on the content and meaning of the document he/she writes instead of on how it looks; the visual presentation is defined by using styles. Since one specifies the logical structure of the document (chapters, sections, paragraphs, etc.), he/she can easily change the way it looks by applying another style.

The editor I used facilitates writing TeX documents by highlighting the syntax and outlining the document structure.
UML diagrams JUDE

All the UML diagrams used in this document have been created in a free (Community) version of JUDE (Java and UML Developers' Environment). Among other features, it can generate code templates from diagrams, include Java source files, and automatically generate diagrams from code.
Figures Inkscape

Inkscape is an open-source editor for creating vector graphics using the W3C-standard Scalable Vector Graphics (SVG) format. It is designed to fully support the XML, SVG, and CSS standards.
5.5 Main problems and solution details
5.5.1 Implementation of REST
As stated before (see Sec. 4.3.1), IKHarvester is a SOA layer for Didaskon. Thus, Didaskon is a client that uses the Web Services provided by my system. All available Web Services must be specified, so that the relevant client code can be implemented.

All requests handled by IKHarvester have URIs conforming to the following template:

http://notitio.us/ikh/

Below, there are definitions of all the requests that can be sent to IKHarvester. The following tables define the usage from the client's point of view. When some features of IKHarvester are invoked from a web page (provided along with the system), a query is built and sent automatically.

This HTTP request is used for obtaining all the LOs that can be created from the data stored in the informal knowledge repository.

Definitions:

type if not set, the Web Service delivers a list of LOs of all types; if set (BlogPost, MediaWiki, JeromeDL), only resources of that type are returned
Table 5.1: REST get LO list
URL http://[server]/ikh/soa/[type]
Method GET
Examples:
http://notitio.us/ikh/soa
http://notitio.us/ikh/soa/BlogPost
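For illustration, the list request from Table 5.1 can be assembled on the client side as follows. This is only a sketch; the helper class is hypothetical and not part of IKHarvester.

```java
/** Hypothetical client-side helper building the "get LO list" URL from Table 5.1. */
public class LoListUrl {

    /** Builds http://[server]/ikh/soa/[type]; a null type requests LOs of all types. */
    public static String build(String server, String type) {
        String base = "http://" + server + "/ikh/soa";
        return (type == null) ? base : base + "/" + type;
    }

    public static void main(String[] args) {
        System.out.println(build("notitio.us", null));       // http://notitio.us/ikh/soa
        System.out.println(build("notitio.us", "BlogPost")); // http://notitio.us/ikh/soa/BlogPost
    }
}
```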
This HTTP request is used for retrieving from the informal knowledge repository the manifest of the specific LO in an XML form, compatible with the LOM standard.

URL http://[server]/ikh/soa/$URI$manifest
Method GET
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$manifest

This HTTP request is used for obtaining the content of the specific LO in an XML form.

Definitions:
Table 5.3: REST get LO content
URL http://[server]/ikh/soa/$URI$content
Method GET
Returns LO content
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$content
This HTTP request is used for adding an informal LO to the repository or updating it. All crucial metadata for the given resource, except for the actual content, are harvested and stored.

URL http://[server]/ikh/soa/$URI$
Method PUT
Returns
Content type
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$
This HTTP request is used for removing an informal LO from the repository. It is, however, only a logical removal: the resource is not physically removed from the repository. Instead, a triple informing about the removal is added to the repository. This is forced by the synchronization problems that would occur when more than one LMS uses IKHarvester.
URL http://[server]/ikh/soa/$URI$
Method DELETE
Returns
Content type
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$
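The logical removal described above can be modelled in a few lines. This is a simplified sketch with an in-memory set standing in for the triple store; the class and its names are illustrative only.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Simplified model of logical removal: a "removed" marker hides deleted LOs. */
public class TombstoneSketch {

    // Stand-in for the RDF repository: URIs of stored LOs and of removed LOs.
    private final Set<String> stored = new LinkedHashSet<String>();
    private final Set<String> removed = new HashSet<String>();

    public void add(String uri) {
        stored.add(uri);
    }

    /** DELETE does not erase the resource; it only records a removal marker. */
    public void delete(String uri) {
        removed.add(uri);
    }

    /** Listings filter out LOs that carry a removal marker. */
    public Set<String> visible() {
        Set<String> result = new LinkedHashSet<String>(stored);
        result.removeAll(removed);
        return result;
    }
}
```

Because the marker is just more data, every LMS that synchronizes with the repository sees the removal without any physical deletion having to propagate.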
The informal knowledge repository stores its data in the form of RDF triples. The connection to the storage and the realization of the queries are handled in the SesameDBFace class, the only class in the Didaskon DB module that has been prepared for that reason.

The SesameDBFace class has been implemented according to the singleton pattern, which allows only one instance of a class. Thus, there is a private constructor that can be used only from within the getInstance(...) method, which is invoked by the logic tier of the application. It is worth noticing that the Didaskon DB module can be used by any application. For that reason, there is a cache of SesameDBFace instances, one per repository, shown in List. 5.1.
Listing 5.1: Retrieving the connection to the data storage

/**
 * Cache of SesameDBFace instances, one per repository.
 */
private static Map<String, SoftReference<SesameDBFace>> SESAME_DB_FACE_CACHE = ...;

private SesameDBFace() {}

...
try {
    repository = repository1;
    graph = repository.getGraph();
    valueFactory = graph.getValueFactory();
} catch (AccessDeniedException e) {
    ...
}

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param repositoryId
 * @return
 */
public static SesameDBFace getInstance(String repositoryId) { ... }

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param repositoryId
 * @param login
 * @param password
 * @return
 */
public static SesameDBFace getInstance(String repositoryId, String login, String password) {
    SesameDBFace dbFace = null;
    synchronized (SESAME_DB_FACE_CACHE) {
        SoftReference<SesameDBFace> ref = SESAME_DB_FACE_CACHE.get(repositoryId);
        if (ref == null || ref.get() == null) {
            try {
                LocalRepository repository = ...;
                dbFace = SesameDBFace.getInstance(repository);
            } catch (UnknownRepositoryException e) {
                try {
                    LocalRepository repository = ...;
                    dbFace = SesameDBFace.getInstance(repository);
                } catch (ConfigurationException e1) {
                    throw new ...(... + repositoryId + ")", e1);
                }
            } catch (ConfigurationException e) {
                ...(... + repositoryId + ")", e);
            }
            SESAME_DB_FACE_CACHE.put(repositoryId, ...);
        } else {
            dbFace = ref.get();
            if (dbFace == null) {
                ...(... + repositoryId + ")");
            }
        }
    }
    return dbFace;
}

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param login
 * @param password
 * @return
 */
private static LocalService getService(String login, String password) {
    try {
        ...
        return service;
    } catch (AccessDeniedException e) {
        ...(String.format(ERR_ACCESS_DENIED, login), e);
        return service;
    }
}
All queries to the data storage are handled by the SesameDBFace class from the Didaskon DB module. For that reason, it provides the performGraphQuery(String query, Object... args) method, which takes two arguments (see List. 5.2):

• query the query to be performed

• args none, one or more arguments that are used in the query
Listing 5.2: Querying the data storage

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param query
 * @param args
 * @return
 */
public Graph performGraphQuery(String query, Object... args) {
    try {
        ...
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}
To make the code of IKHarvester cleaner and to separate the features related to storage issues, I have prepared the RDFQuery class, which contains all the queries used by the system (see List. 5.3).
private RDFQuery() {}

/** ... */
public static final String SELECT_ALL = ...;

/** ... */
public static final String SELECT_ALL_FOR_ALL = ...;

/** ... */
public static final String SELECT_ALL_FOR_SUBJECT = ...;

/** ... */
public static final String SELECT_ALL_FOR_PREDICATE = ...;

/** ... */
public static final String SELECT_OBJECT_FOR_SUBJECT_AND_PREDICATE = ...;

/** ... */
public static final String SELECT_SUBJECT_FOR_PREDICATE_AND_OBJECT = ...;

/** ... */
public static final String SELECT_ALL_FOR_BLANKNODESUBJECT = ...;

/** ... */
public static final String SELECT_OBJECT_FOR_BLANKNODESUBJECT_AND_PREDICATE = ...;

/** ... */
public static final String SELECT_ALL_FOR_SUBJECT_AND_PREDICATE_LIKE = ...;
5.5.3 Extending IKHarvester

One of the requirements for the IKHarvester system demands allowing IKHarvester to be extended to support new types of online resources (see Sec. 4.3.4). Writing new features should be facilitated. Therefore, I have decided that new classes for harvesting metadata should extend the DataHarvesterImpl class, and ones for providing metadata from the informal knowledge repository should extend the DataProviderImpl class. Both classes implement, respectively, the DataHarvester and DataProvider interfaces.

Since the idea behind both the harvesting and the providing features is the same, in the following listings (see List. 5.4 and List. 5.5) I present the mechanism only for the providing classes.
/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param uri
 * @return
 * @throws IOException
 */
public LOManifest getLOManifest();

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @throws IOException
 */
public HarvestingResults getLOContent(String resourceType, StringBuffer content);
The DataProviderImpl class implements the methods from the DataProvider interface: getLOManifest() and getLOContent(String resourceType, StringBuffer content). Based on the type of the resource, the former method creates an instance of the appropriate subclass. The name of the subclass has a suffix equal to the name of the resource type.
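The suffix-based lookup can be illustrated with plain Java reflection. The classes below are hypothetical stand-ins, not the actual IKHarvester providers; the point is only the Class.forName(...) construction of the subclass name.

```java
// Hypothetical stand-ins illustrating suffix-based instantiation; these are
// not the actual IKHarvester provider classes.
interface Provider {
    String describe();
}

class ProviderBlogPost implements Provider {
    public String describe() { return "blog post provider"; }
}

class ProviderWikiArticle implements Provider {
    public String describe() { return "wiki article provider"; }
}

public class ProviderFactorySketch {

    /** Builds the class name "Provider" + resType and instantiates it reflectively. */
    public static Provider forType(String resType) throws Exception {
        Class<?> c = Class.forName("Provider" + resType);
        return (Provider) c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(forType("BlogPost").describe());
    }
}
```

With this design, supporting a new resource type only requires writing one new class whose name ends with the type's name; no dispatch code has to change.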
/** ... */
protected String uri = null;

/** ... */
protected SesameDBFace dbFace = null;

/**
 * @param uri
 */
public DataProviderImpl(String uri1) {
    uri = uri1;
}

/* (non-Javadoc)
 * @see DataProvider#getLOManifest(java.lang.String)
 */
public LOManifest getLOManifest() {
    StatementIterator iter = dbFace.getGraphStatements(
            RDFQuery.SELECT_ALL_FOR_SUBJECT_AND_PREDICATE_LIKE,
            uri, NS.NOTITIOUS.resourceType);
    if (iter == null || !iter.hasNext()) {
        return null;
    }
    // there is maximum one entry
    String resType = ... .substring(NS.NOTITIOUS.resourceType.length());
    if (!Util.isStringSet(resType)) {
        return null;
    }
    try {
        Class providerClass = Class.forName(
                DataProviderImpl.class.getPackage(). ... );
        ...
        manifest = provider.getLOManifest();
    } catch (SecurityException e) {
        ...
    } catch (IllegalArgumentException e) {
        ...
    } catch (ClassNotFoundException e) {
        ...
    } catch (NoSuchMethodException e) {
        ...
    } catch (InstantiationException e) {
        logger.error("InstantiationException", e);
    } catch (IllegalAccessException e) {
        logger.error("IllegalAccessException", e);
    } catch (InvocationTargetException e) {
        ...
    }
    return manifest;
}

/* (non-Javadoc)
 * @see org.corrib.ikharvester.provider.DataProvider#getLOContent(...)
 */
public HarvestingResults getLOContent(String resourceType, StringBuffer content) { ... }

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @return
 */
public static List<LOJBean> getLOList() {
    ...
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_BLOG));
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_DL));
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_WIKI));
    return objects;
}

/**
 * @param type
 * @return
 */
public static List<LOJBean> getLOList(String type) {
    ... Constant.REPOSITORY_ID);
    ... RDFQuery.SELECT_SUBJECT_FOR_PREDICATE_AND_OBJECT, ...;
    if (iter == null) {
        return los;
    }
    while (iter.hasNext()) {
        ...
        StatementIterator it = dbFace.getGraphStatements(
                RDFQuery.SELECT_OBJECT_FOR_SUBJECT_AND_PREDICATE, ...);
        if (it == null) {
            continue;
        }
        if (it.hasNext()) {
            removed = true;
        }
        it.close();
        ...
    }
    return los;
}
Resources can be added to the repository either with a client application or by putting their URL into the form on the main page of the system. The latter method might be bothersome, since every time a user must go to the above-mentioned web page and return to the initial one after the adding. To meet users' needs, I have created an add-on for Firefox, one of the most popular web browsers. In general, an add-on adds some functionality to a piece of software. The one I have created works with the implementation of IKHarvester deployed in the notitio.us project (see Sec. 6.1.2); it adds a button with which the currently visited page can be posted to IKHarvester (see Fig. 5.3).
19 http://notitio.us/
Any time a user visits such a page, he/she can click the above-mentioned button. He/she is then redirected to one of the IKHarvester web pages, where the initial page can be tagged and saved to the informal knowledge repository. All the information in that repository is shared.
Chapter 6
Conclusions
Nowadays, a lot of valuable informal knowledge can be found in blogs, fora, digital libraries, wikis, etc. The amount of such data is growing rapidly.

In this thesis, I have included the results of my research in the field of the Semantic Web and eLearning. I have presented those two approaches and defined a way of harvesting informal learning content from a few types of SSIS and delivering it to eLearning 2.0 frameworks. Then, I have designed the architecture of such a system (IKHarvester) and developed it. Finally, I have successfully deployed IKHarvester in a real environment.
6.1 Achievements
6.1.1 Publications
This thesis is dedicated to the issue of collecting informal knowledge from Social Semantic Information Sources. The idea of how to capture data from online resources is quite innovative, and a lot of effort is put into research in that field.

The Semantic Infrastructure (SemInf) lab, the core part of the Corrib Cluster in the Digital Enterprise Research Institute, whose member I am, is also interested
1 http://corrib.org/
2 http://deri.org
in this area. During the last months, we have created a few publications.
Faculty of Engineering Research Day 2007, Galway, Ireland, April 16, 2007
6.1.2 IKHarvester
Although the current version of IKHarvester is a prototype, it works well and collects a significant amount of informal knowledge.

Benefits

To recap, there are a few solutions for capturing and managing semantic annotations: SemanticWeb.com, Piggy Bank, and Zotero (see Sec. 4.1.1). Although their goal is similar, they achieve it in different ways. Table 6.1 explicitly shows the differences between the above-mentioned solutions, indicating the level of support for each feature.
Table 6.1: Comparison of tools for collecting informal data

Feature | IKHarvester | SemanticWeb.com | Piggy Bank | Zotero
Integration with browsers | buttons: FF, Opera, IE; add-on for FF | buttons: FF, Opera, IE | FF add-on itself | FF add-on itself
Support for Wikipedia | some | none | weak | none
Support for JeromeDL | full | some | full | weak
Accessible with Web Services | yes | yes | no | no
Allows data sharing | yes | yes | partially (sharing with Semantic Bank) | no
Support for new document types (extensibility) | yes (writing new blades) | no (dependency on the authors of the tool) | yes (writing new screen scrapers) | no (dependency on the authors of the tool)
Integration with web browsers is crucial for such systems. The more web browsers the system supports, the better; such a tool should not demand using a specific browser. Since Piggy Bank and Zotero are Firefox add-ons, they are perfectly integrated with that browser but cannot work outside it. IKHarvester supports Firefox, Internet Explorer and Opera by providing special buttons for capturing data. Moreover, some features of IKHarvester can be invoked by using a special add-on for Firefox.

All compared tools, except for Zotero, are able to collect a sufficient amount of metadata for online resources available on web pages, by reading the RDF documents that those pages link to. By sufficient, we mean more information than the URL or the title of the resource; for instance, there should be some information about the author and the topic. IKHarvester additionally stands out in that it collects metadata also from non-semantic web pages, like Wikipedia, which is a valuable source of knowledge.

To make much more use of metadata for learning purposes, it should be shared and made available to all. For that reason, it is necessary to access it with Web Services, as this improves its accessibility and reusability. Also, tagging helps in managing resources. In this respect IKHarvester acquits itself well: all shared data can be retrieved, saved, and tagged by calling its Web Services. Moreover, IKHarvester treats online resources as learning material (informal Learning Objects), and uses the captured metadata to describe them in accordance with the LOM standard. This rich information can be used by eLearning LMSs.
Success stories

There are a lot of projects IKHarvester can be used in. At the moment, it is employed in the two described below.
Didaskon

IKHarvester has been designed as a SOA layer for Didaskon, a system designed for composing curricula out of formal and informal knowledge [53].

Based on some preconditions, Didaskon creates a learning path which best fits a specific learner. To achieve that, the system uses initial information (preconditions) like a student's needs, skills, learning history, etc., as well as the anticipated resulting skills.

Initially, IKHarvester was supposed to work with Didaskon, an eLearning 2.0 framework (see Sec. 3.3.2). However, during the development, I have found another application for it.
notitio.us

notitio.us is a service for collaborative knowledge aggregation and sharing; it employs IKHarvester for retrieving RDF information about Web resources bookmarked by the users. Therefore, it is capable of indexing rich metadata coming from various sources, and it keeps rich, semantically interconnected metadata shared by the users using Social Semantic Collaborative Filtering (SSCF).

The resources not only can be shared with the bookmarking interface (SSCF), but also, based on the rich metadata, they can be searched and browsed using TagsTreeMaps, a tag browser based on the treemaps rendering algorithm, and MultiBeeBrowse. Moreover, the metadata is exposed in the LOM standard, which turns notitio.us into a valuable source of learning objects.
3 http://didaskon.corrib.org/
4 http://didaskon.corrib.org/
5 http://notitio.us/
6 TagsTreeMaps: http://sf.net/projects/tagstreemaps/
Figure 6.1: IKHarvester in the notitio.us service
To learn more about IKHarvester and notitio.us, please visit its home page:
http://notitio.us/ikh/.
The system was designed in a manner that allows extending it so that it works with other sources of informal knowledge (see Fig. 4.6). In the future, it should support more types of online resources, among others: Bricks (another digital library), blogs hosted on Blogger, and other types of wiki engines.
7 http://www.blogger.com/
Bibliography
[3] L. Aroyo and D. Dicheva. The new challenges for e-learning: The educational
[4] U. Bojars, J. Breslin, and A. Passant. Sioc browser - towards a richer blog
[5] J. Brase, M. Painter, and W. Nejdl. Completing LOM - how additional axioms
increase the utility of learning object metadata. In ICALT, page 493. IEEE
[6] M. Cygan. Ubiquitous search service component gateway for heterogeneous l2l
[9] DublinCore Initiative, http://dublincore.org/documents/dces/. Dublin Core
[12] P. Graham. Web 2.0. Online; accessed December 18, 2006; http://www.
paulgraham.com/web20.html.
System. In TEHOSS'2005.
posium on Wikis, 2006, Odense, Denmark, August 21-23, 2006, pages 137138.
ACM, 2006.
[17] J. Hendler and O. Lassila. The semantic web. Scientific American Magazine, May 2001.
http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf, 2002.
[20] A. H. John Breslin, Stefan Decker. Sioc: an approach to connect web-based
[21] S. D. John Breslin. Semantic web 2.0: Creating social semantic information
spaces.
[22] D. R. Karger and D. Quan. What would it mean to blog on the semantic web?
[23] T. Karrer. elearning 2.0: Informal learning, communities, bottom-up vs. top-
[24] R. Khare. Microformats: The next (small) thing on the semantic web? IEEE
[27] S. R. Kruk. E-learning on semantic web 2.0, 2006. Online; accessed November 5,
2006; http://www.sebastiankruk.com/storage/presentation/elearning_
on_sw20/img0.html.
search and browsing for digital libraries. In Mobile Data Management, 2006.
tic Web - ASWC 2006, First Asian Semantic Web Conference, Beijing, China,
September 3-7, 2006, Proceedings, volume 4185 of Lecture Notes in Computer
ISWC, 2006.
[32] A. D. Learning. Scorm homepage. Online; accessed May 1st, 2007; http:
//www.adlnet.gov/scorm/.
[36] D. G. W. Mission. Beyond elearning: practical insights from the usa. Technical
blogosphere.
2002.
[40] M. of New Media. E-learning - m/cyclopedia of new media, 2006. Online; ac-
[41] S. O'Hear. E-learning 2.0 - how web technologies are shaping education. On-
[42] S. O'Hear. Seconds out, round two. The Guardian, 2005. Online; accessed
2006; http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/
what-is-web-20.html.
[45] P. Prescod. Rest and the real world, February 2002. Online; accessed April 7,
2007; http://webservices.xml.com/lpt/a/ws/2002/02/20/rest.html.
Feb. 07 2007.
September 2005.
database with the semantic web technologies. In Proceedings of the 16th In-
ternational Conference on Database and Expert Systems Applications. Copen-
2003.
[54] K. M. Uldis Bojars, John Breslin. Using semantics to enhance the blogging
Semantic Web Conference, ESWC 2006, Budva, Montenegro, June 11-14, 2006,
[56] W3C. Owl web ontology language guide. Online; accessed December 16, 2006;
http://www.w3.org/TR/owl-guide/.
[57] W3C. Owl web ontology language overview. Online; accessed March 21, 2007;
http://www.w3.org/TR/owl-features/.
[58] W3C. Primer: Getting into rdf & semantic web using n3. Online; accessed
[59] W3C. Rdf vocabulary description language 1.0: Rdf schema. Online; accessed
[60] Wikimedia. Learning object metadata - meta. Online; accessed April 12, 2007,
http://meta.wikimedia.org/wiki/Learning_object_metadata.
[61] Wikipedia. E-learning - wikipedia, the free encyclopedia, 2006. Online; accessed
[62] Wikipedia. Semantic web - wikipedia, the free encyclopedia, 2006. Online; ac-
[63] Wikipedia. World wide web - wikipedia, the free encyclopedia, 2006. Online;
[65] D. Zambonini. Is web 2.0 killing the semantic web? O'Reilly XML Blog,
List of Figures
List of Listings
List of Tables
Appendix A
Installation guide
A.1 Apache Tomcat

The web container can be downloaded from its home page: http://tomcat.apache.org/. Apache Tomcat should be installed according to the instructions available on that page.

Let us assume that TOMCAT_DIR/ is the Tomcat installation directory; this name will be used further in this chapter.
A.2 Sesame
Sesame, an RDF storage, plays the role of the informal knowledge repository. IKHarvester works with Sesame 1.2.6. The Sesame webapp must be put into TOMCAT_DIR/webapps/, and all jars moved from TOMCAT_DIR/webapps/sesame/WEB-INF/lib/ to TOMCAT_DIR/common/lib/.

Having installed Sesame, it should be configured. For that reason, put the following repository definition into Sesame's configuration. In the listing, STORAGE_FILENAME is the path to the file where RDF data will be stored.
<repository ...>
  <sailstack>
    <sail ...>
      ...
    </sail>
  </sailstack>
</repository>
A.3 IKHarvester

IKHarvester can be run in two ways, either by defining a listener in Apache Tomcat's configuration or by deploying the WAR file of the project.
A.3.2 Configuration

After downloading the application, put all jar files from IKHARVESTER_DIR/dist/TOMCAT_DIR/common/lib/ to the TOMCAT_DIR/common/lib/ directory. Also, the commons-fileupload-1.1.jar file (copied from the Sesame 1.2.6 distribution) must be deleted, because along with the IKHarvester files you have received a newer version of it.
Running the application

There are two ways of running IKHarvester: either by defining a listener or by deploying the WAR file.

Defining a listener

With a listener defined in the configuration of Apache Tomcat, the web container sees changes to the source files every time they are compiled. Consequently, there is no need to redeploy the war file and restart the container. The application is available at:

http://localhost:8080/ikh

Deploying WAR

After changes to the system source files, one must run the ant script that builds a new ikharvester.war file. The onerousness of this approach lies in the fact that every time the developer makes a change, he must create a new ikharvester.war file, deploy it, and restart the web container. That is why I suggest using the former approach.
Appendix B
Output examples
IKHarvester delivers descriptions of informal LOs retrieved from the informal knowledge repository (see Tab. 4.2.2 for details of that functional requirement) in the LOM standard.

In List. B.1, there is presented the description of a LO created out of information harvested from a blog post.
<lom>
  <general>
    <identifier>
      <entry>
        http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
      </entry>
    </identifier>
    <title>
      <langstring>...</langstring>
    </title>
    <language>en</language>
    <description>
      <langstring>...</langstring>
    </description>
    <keyword>
      <langstring>...</langstring>
      <langstring>...</langstring>
      <langstring>
        http://dobrzanski.net/category/web20/
      </langstring>
    </keyword>
    <structure>...</structure>
    <aggregationlevel>...</aggregationlevel>
  </general>
  <lifecycle>
    <version>
      <langstring>2007-04-23T22:43:15Z</langstring>
    </version>
    <contribute>
      <role>...</role>
      <date>
        <description>...</description>
      </date>
    </contribute>
    <status>...</status>
  </lifecycle>
  <metametadata>
    <identifier>
      <entry>
        http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
      </entry>
    </identifier>
    <contribute>
      <role>...</role>
      <date>
        <description>...</description>
      </date>
    </contribute>
  </metametadata>
  <technical>
    <location>
      http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
    </location>
    <requirement>
      <orcomposite>
        <type>...</type>
        <name>...</name>
      </orcomposite>
    </requirement>
    <requirement>
      <orcomposite>
        <type>...</type>
        <name>...</name>
      </orcomposite>
    </requirement>
  </technical>
  <educational>
    <learningresourcetype>...</learningresourcetype>
    <description>
      <langstring>...</langstring>
    </description>
    <interactivitytype>...</interactivitytype>
    <interactivitylevel>...</interactivitylevel>
    <semanticdensity>...</semanticdensity>
    <intendedenduserrole>...</intendedenduserrole>
    <context>...</context>
    <context>...</context>
    <context>...</context>
    <context>...</context>
    <difficulty>...</difficulty>
  </educational>
  <rights>
    <cost>...</cost>
  </rights>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>
        <entry>...</entry>
      </identifier>
      <description>...</description>
    </resource>
  </relation>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>...</identifier>
      <description>...</description>
    </resource>
  </relation>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>...</identifier>
      <description>...</description>
    </resource>
  </relation>
</lom>
B.2 LO content example
Apart from the description of a Learning Objcect in LOM (see List. B.1, IKHarvester
can also provide the content of such LO. The content is supposed to be used in the
In the List. B.2, there is presented the content of a LO created out of information
<LO>
  ...
  <content><![CDATA[
    ... <a href="http://dobrzanski.net/2007/04/22/...">
    ... <pre><code><script>
    Ajax.Responders.register({
      ...
    });
    </script></code></pre><p>Then, ...
  ]]></content>
</LO>
B.3 List of LOs example
IKHarvester can deliver to an LMS a list of the informal LOs it stores (see Tab. 4.2.2 for details of that functional requirement).
<LOList>
</LOList>