ID number: 93635/ETI (Nr albumu)
Master's Thesis (Praca magisterska)
Title: Social Semantic Information Sources for eLearning (Tytuł pracy)
(Supervisor / Kierujący pracą)
(Consultant / Konsultant)
Thesis domain (Zakres pracy): Make a thorough analysis of Social Semantic Information
Sources in the context of using them in eLearning. Identify the best fitting ontologies
used for their description. Define a common object model for them. Develop the framework
Gdańsk, 2007
Contents

1 Introduction
1.1 Problem description
1.3 Outline
2 Related work
2.1 eLearning
2.3.1 AJAX
2.3.2 Democracy
2.3.4 Tagging
2.3.5 Mashups
3.1.1 Semantic Blogs
3.2.1 SIOC
3.3.2 Didaskon
4.1.2 Limitations
4.3.3 Classes
5 System implementation
5.1 Implementation methodology
5.4.2 Documentation
5.5.1 Implementation of REST
6 Conclusions
6.1 Achievements
7.6.1 Publikacje (Publications)
Bibliography
Chapter 1
Introduction
This first chapter is the introduction to this Master's Thesis. Here, I formulate and
describe the main problems to be faced while developing the final system and form
the goals to achieve. I also describe the methodology of the developed system.
People have been learning for ages; to eat one had to hunt, to feel relaxed one had
to sleep, to stay alive one had to avoid danger. One was obliged to learn to make
his/her life easier. It seems nothing has changed. However, the learning process is
more organized now than it used to be. In general, learning can be divided into formal
and informal. Formal learning follows the old traditional approach. Courses are rigid,
made once and for all. Students are pushed to go through a course from beginning to
end without the possibility of changing it. Informal learning, in turn, is more
spontaneous and gives a user more flexibility in deciding when, where and what to learn.
We often learn that way unconsciously by chatting, video conferencing, observing others,
or reading blogs and wikis. It is definitely cheaper and, perhaps surprisingly, more
effective. In fact, most learning occurs as such unstructured processes and is not
formally organized; 75% of organizational learning is informal [36]. Informal learning
relates with collaboration.
eLearning is naturally suited for distance, flexible learning, but it can also be used
along with the traditional approach. Early courses were delivered on CD-ROMs and sent
across the country [31]. Nowadays, tools like web-based teaching materials and
hypermedia in general (web pages, discussion boards, simulations and games) are
commonly used. eLearning is so popular due to the expansion of the Internet. There
are a lot of online services that provide courses for
free or for a fee, e.g. Nuvvo, Berlin High School eLearning Online, or eTech Ohio.
In addition, there are a considerable number of Web 2.0 (see Sec. 2.3) services
like blogs, wikis, fora, digital libraries, online chats or video conferences. They allow
users to collaborate with peers and share their opinions. They are sources of informal
knowledge.
By applying the Semantic Web (see Sec. 7.2.2) to Web 2.0 services, we make their
content also machine readable. Thus, computers can assist and guide users so that
they are not lost in the sea of information [61]. The assumptions of the Semantic
Web are fulfilled by introducing semantic annotations of online resources. This way,
blogs, wikis, and digital libraries become Social Semantic Information Sources (see
Chapter 7.3.1).
1 http://www.nuvvo.com/
2 http://moodle.berlinwall.org/
3 http://www.etech.ohio.gov
In this thesis, I focus on informal learning based on Social Semantic Information
Sources (SSIS). I consider semantic blogs, semantic wikis and social semantic digital
libraries as an unfailing source of informal knowledge. Both Web 2.0 and the Semantic
Web, combined together, allow us to create new, better solutions that go beyond
current eLearning assumptions. They form a new learning standard, eLearning 2.0,
which aims at giving the ability to leverage the community as a part of a larger
learning environment.
Didaskon is a system designed according to eLearning 2.0 assumptions. This
is a project developed in the Digital Enterprise Research Institute at the National
University of Ireland, Galway; it was initiated as a working group project in
cooperation with Gdańsk University of Technology, Faculty of Electronics,
Telecommunications and Informatics. Didaskon delivers a framework for composing
on-demand curricula: a learning path which best fits a specific learner. To achieve
that, the system uses initial information (preconditions) like a student's needs,
skills, learning history etc., anticipated resulting skills and knowledge (goals),
and technical details of the client's platform.
The goal of this thesis is to deliver IKHarvester, an extension to the Didaskon
system. IKHarvester will be a Service Oriented Architecture (SOA) layer. Its goal
is to capture informal learning from Social Semantic Information Sources and store
metadata for this information. Stored data should be delivered to Didaskon in the
form of informal learning objects that support formal ones during learning path
composition.
4 http://didaskon.corrib.org/
5 http://deri.org
6 http://www.nuigalway.ie/
7 http://www.pg.gda.pl/
8 http://www.eti.pg.gda.pl/
Social Semantic Information Sources provide heterogeneous data. Hence, a dedicated
design is required to integrate them. The extension should provide data in a
consistent, common object model. Thus, Didaskon is able to perform more effective
reasoning during course composition.
1.3 Outline
This thesis is divided into six chapters. In Chapter 2, I present related work.
Chapter 3 explains Social Semantic Information Sources and eLearning 2.0. Chapter 4
is a summary of the designing stage; here I define the system requirements and use
cases. Chapter 5 describes the system implementation, including the tools I have used
during the system development and software helpful during writing the work.
Chapter 6 concludes the thesis.
Chapter 2
Related work
In this Chapter, I describe the state of the art in eLearning, Web 2.0, and the
Semantic Web. At first, I give a definition of eLearning and present how it has
changed over the years. I define a Learning Object, which is anything one can acquire,
manage and use. Then, I characterize the Semantic Web, a newer, better Web, where the
content of resources is used by both people and software agents. Afterwards, I
introduce Web 2.0, a Web dedicated to communities whose members share information
and collaborate.
2.1 eLearning
eLearning (Electronic Learning) [40] is the delivery of educational content through
any electronic media, including the Internet, intranets, extranets, satellite broadcast,
audio and video tapes, interactive TV, CD-ROMs, interactive CDs and computer-
based training. It is expected to squeeze out old-fashioned learning. In the old
approach, a student is passive, pushed to learn. He/she is obliged to obey rules
defining when and where the classes take place and what their actual content is [33].
In contrast, a learner should be given (to some extent) a free hand with regard
to selecting the course schedule. One should be allowed to learn just-in-time,
on-demand. Moreover, he/she should have influence on the contents of the classes.
Learning should be customized, initiated by user profiles and business demands.
There are two communication technologies used for eLearning: synchronous and
asynchronous. The first expects students to gather face-to-face or use chats,
videoconferences etc. The latter approach is characterized by using blogs, wikis or
discussion boards as tools for sharing opinions or gained experience. Long before the
Web, tools such as calculators, VCRs, radio and bulletin board systems were used for
distance education.
In the mid 1980s eLearning started to develop rapidly. It was the time when
the Multimedia Era began; Windows 3.1, PowerPoint, Macintosh and CD-ROMs
became popular and common all over the world [25]. Computers were supposed
to make training more transportable and visually engaging.
In the mid 1990s, the Internet was popular enough; hence, training providers
tried to incorporate it into the tuition process. They found emails, web browsers, web
pages, media players, streamed audio and video very helpful. A great number of
companies enriched courses with graphics and web-based training and made them
widely available.
In general, a Learning Object (LO) is something you can acquire, manage and use.
LOs [19] are reusable, modular, flexible, portable and compatible. Efficient management
of LOs requires metadata in order to work properly. The problem is how to organize
metadata so that it can be exchanged among different systems.
SCORM
SCORM (Shareable Content Object Reference Model) is a Web-oriented data model
for content aggregation. This is an XML-based framework used to define and access
information about LOs so they can be easily shared among different LMSs. SCORM
focuses on the structure, the runtime environment for LOs, and the description of
the learning process [53].
The part of SCORM related to this thesis is SCORM CAM (Content Aggregation
Model), which defines how to create and manage LOs. According to SCORM CAM,
the content of a learning object can be diverse: plain text, HTML code, short movies
or even a more complicated interactive course. Also, SCORM CAM includes the
metadata describing LOs.
LOM
LOM (Learning Object Metadata) defines the way to build metadata for LOs. There
are nine categories of this information [18], each of which focuses on different
aspects (see also Fig. 2.1):
• Lifecycle: features related to the history and current state of the LO
1 http://www.adlnet.gov/scorm/
Figure 2.1: LOM structure (from [60])
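As a sketch of how such a record looks in practice, the snippet below models a LOM-like description as a nested mapping over the nine categories; the field names and values are illustrative simplifications, not the full normative LOM element set:

```python
# A minimal LOM-like metadata record: nine top-level categories,
# with a few illustrative fields (values are invented).
lom = {
    "general": {"title": "Intro to RDF", "language": "en"},
    "lifecycle": {"version": "1.0", "status": "final"},
    "metaMetadata": {},
    "technical": {"format": "text/html"},
    "educational": {"typicalAgeRange": "18-"},
    "rights": {"cost": "no"},
    "relation": {"isBasedOn": "course-42"},
    "annotation": {},
    "classification": {},
}

assert len(lom) == 9   # LOM defines exactly nine categories
print(sorted(lom))
```

A real SCORM package would serialize such a record as XML inside the content manifest rather than keep it as an in-memory structure.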
• Relation: a group of features defining the relationship between the LO and
other LOs
The main advantage of eLearning is the convenience and flexibility a learner is given;
one learns when and where he/she wants. Learners can also communicate with peers,
which allows them to share opinions on learning material. However, eLearning also
has disadvantages.
Firstly, one course is prepared for all. Usually, courses are not personalized [27];
they are tailored for a generic student at one of the generic levels of skills or
knowledge. The main assumption is a desire to pass the course. Learning services
usually do not take into account a specific user's conditions, like wishing to broaden
knowledge in a wide range of domains at the same time [52]. Thus, a student is obliged
to attend different courses separately.
Secondly, current learning services treat students as single entities; they do not
support collaboration. A student goes through a course alone, without response from
other students who also attend it. Students are also limited to the provider's
methods; they are served only what is supplied by the provider. Currently, on the
Internet, there is a lot of relevant information which could support the learning
process, but current LMSs are not capable of understanding the content of web pages.
eLearning needs management support in order to define a vision and plan for
learning and to integrate learning into daily work. However, current Web-based
solutions do not meet the requirements mentioned above; they bring the problem of
content that is not machine-understandable. Only the course creators and students
can understand the content of the course.
In the following sections, I describe the Semantic Web and focus on its core
assumptions and solutions. Finally, I present the Semantic Web 2.0, which links the
Semantic Web platform with existing Web 2.0 features.
The World Wide Web is a system of interlinked hypertext documents (called web pages
or websites) that runs over the Internet. Web pages can contain text and multimedia
such as images, movies, music, etc. They are navigated by using hyperlinks and viewed
with a Web browser [63]. Web pages are written in the HTML language. Each page has
its own URL (Uniform Resource Locator). On the other hand, the information is not
machine readable. Take a sentence about Bob Marley, a famous reggae musician, as an
example. It is an established fact, understood by a human. However, it does not bring
any particulars for a machine. One can ask: why should machines understand web page
contents, when it is people who look for them? Nothing could be more misleading.
Simple scenario
Imagine Adam, a young man with broad music taste. He had listened to rock and
ska music for ages. Once, he heard a reggae song by Bob Marley on his favorite
Internet radio. He really liked the song and wanted to learn something about
the genre.
The perfect situation assumes he finds a reggae music fans community where
a great deal of information and useful links could be found. Even the previously
mentioned sentence about Bob Marley could, for a start, satisfy Adam as it brings
some knowledge. But this sentence can be hidden in the midst of an accumulation
of irrelevant content.
But, again, how can the computer find anything when it does not comprehend
it? Information must be somehow described, so that a machine can distinguish one
piece of data from another. This is the purpose of appropriate resource descriptions.
Without them, computers are not able to help users effectively.
"The Semantic Web will bring structure to the meaningful content of Web pages."
The word semantic stands for "the meaning of". The Semantic Web encompasses
efforts to build a new World Wide Web architecture that enhances content with
formal semantics.
One of the most important advantages of the Semantic Web is flexibility. Different
kinds of data can be used together and diverse types of analysis can be applied over
them [64]. For instance, a book can be described with Dublin Core [9] annotations,
whereas information about the author can be expressed by using the
FOAF (Friend-of-a-Friend) vocabulary [7]. Moreover, vocabularies can be easily
mixed and extended.
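As a sketch of that flexibility, the Python snippet below mixes Dublin Core and FOAF properties in one set of triples; the namespace URIs are the real ones, but the book and person identifiers are invented for illustration:

```python
# Describing one resource with two vocabularies (Dublin Core + FOAF).
# The book and person URIs under example.com are made up for illustration.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.com/book/1", DC + "title", "The Secret Life of Bees"),
    ("http://example.com/book/1", DC + "creator", "http://example.com/person/1"),
    ("http://example.com/person/1", FOAF + "name", "Sue Monk Kidd"),
]

# Collect every property used to describe the book resource.
book_props = {p for s, p, o in triples if s == "http://example.com/book/1"}
print(sorted(book_props))
```

Nothing stops a consumer that only understands Dublin Core from using the first two statements while ignoring the FOAF one; that is precisely the point of mixing vocabularies.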
Semantics entails description issues, so that artifacts are understood and efficiently
processed. The Resource Description Framework (RDF) is the foundation of
metadata description. It is a W3C standard for describing web resources which
have been assigned a URI by which they can be identified. It was designed to be
read by computers.
2 http://dublincore.org/
3 http://www.foaf-project.org/
4 http://www.w3.org/
Thanks to such descriptions, programs or automated scripts (crawlers) can efficiently
search, discover, collect and process web resources. An RDF description consists of a
subject, a predicate and an object; altogether they are called a triple (a
statement) [58]. A collection of RDF statements produces a directed graph in which
arrows point from subjects to objects.
Each statement can be split into three parts; for example, for the sentence
"Adam is a football player":
• a subject (resource): Adam
• a predicate (property): is a
• an object (value): football player
Supposing all three parts are attributed with URIs in the http://example.com
namespace, the above statement can be illustrated by the graph shown in Fig. 2.3:
Besides the graph, the RDF N3 representation can be used to show triples and the
relationships between them. See List. 2.1 to learn the structure of the N3
representation:
<http://example.com/profession#footbal_player> .
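A triple store over statements like the one in List. 2.1 can be sketched in a few lines of Python; the `people#adam` and `property#is_a` URIs are assumed for illustration, and `None` serves as a wildcard much like a variable in a graph query:

```python
# A minimal in-memory triple store with wildcard pattern matching.
# The URIs reuse the example.com namespace from the listing above.
EX = "http://example.com/"

store = [
    (EX + "people#adam", EX + "property#is_a", EX + "profession#footbal_player"),
    (EX + "people#adam", EX + "property#likes", EX + "genre#reggae"),
]

def match(pattern, triples):
    """Return the triples matching an (s, p, o) pattern; None is a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# What do we know about Adam?
for s, p, o in match((EX + "people#adam", None, None), store):
    print(p, "->", o)
```

Real RDF stores work on the same principle, only with indexes over the three positions so that pattern matching does not scan every statement.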
Using triples is effective and very popular. However, for representation reasons, an
XML serialization of RDF (RDF/XML) is often used:

<rdf:RDF ...>
  <rdf:Description ...>
    <property:is rdf:resource="..." />
  </rdf:Description>
</rdf:RDF>
According to the W3C [59], RDF aims at representing information on the Web so that
it is machine processable. RDF Schema additionally makes it possible to describe
groups of related resources (the domains and ranges of their properties) and the
relationships between them.
Ontologies
Ontology is a word with quite a handful of meanings. The term is borrowed from
philosophy, where it refers to the science of describing entities in the world and
the relationships between them.
Although RDF and RDF Schema are helpful in expressing simple statements,
they fall short when used in more complex cases. That is why the Web Ontology
Language (OWL) was developed. OWL is a markup language for publishing and sharing
data using ontologies on the Web. It has three increasingly expressive sub-languages:
OWL Lite, OWL DL and OWL Full. Each sub-language encapsulates the former ones.
An ontology describing a specific domain is a form of knowledge representation of
that domain. Below, I present a list of the most popular RDF Schema metadata
definitions for specific domains of interest:
• people and social networks: Friend Of A Friend (FOAF)
• online discussions: Semantically-Interlinked Online Communities (SIOC)
• career: Description Of A Career (DOAC)
• project: Description Of A Project (DOAP)
• taxonomies: Simple Knowledge Organization System (SKOS)
The most popular way to search the web is text searching. It is supported by
Google, Yahoo and other search engines. One just enters a query string and then
is given a set of possible answers. The list is huge and often consists of garbage
information, though.
The Semantic Web can improve searching through techniques such as latent semantic
indexing and query refinement. The former makes it possible to measure the distance
between terms; the latter improves an imprecise query string so that more adequate
results are found [29]. Using dictionaries, like WordNet, can boost searching.
5 http://www.foaf-project.org/
6 http://sioc-project.org/
7 http://ramonantonio.net/doac/
8 http://usefulinc.com/doap/
9 http://www.w3.org/2004/02/skos/
10 http://wordnet.princeton.edu/
Searching also benefits from recognizing words used in non-basic forms. As described
earlier, RDF provides a graph structure and literals. Thus, a search can be performed
by using both keywords and structured queries.
Traditional eLearning assumes a centralized authority (a teacher) who foists an
already defined course schedule on students. It is impossible to satisfy all students'
needs because they differ from one another.
The Semantic Web can be successfully employed for describing LOs which represent
learning material. Software agents can perform continuous scanning of semantically
annotated resources. Additionally, agents may use a commonly agreed service language,
which boosts interoperability; course composition becomes significantly simpler and
faster. Then, it is possible to use diverse types of learning objects.
So far, I have pointed out a great many virtues of the Semantic Web, especially its
assumptions and solutions. Nevertheless, practical experience has proved that the
Semantic Web is far from changing the vision of the Internet; it needs some help to
succeed. Some society-scale applications are required. The above mentioned agents
are not enough. To make shared data real, some more advanced collaborative
applications are required.
2.3 Web 2.0
Although Web 2.0 is currently a very popular term, it is difficult to give its precise
definition. Even Tim Berners-Lee, the inventor of the World Wide Web, has difficulty
in doing that:

Web 1.0 was all about connecting people. It was an interactive
space, and I think Web 2.0 is of course a piece of jargon, nobody even
knows what it means. If Web 2.0 for you is blogs and wikis, then that
is people to people. But that was what the Web was supposed to be all
along.
In short, Web 2.0 is the Web where people meet, collaborate and share anything
that is popular by using social software applications. The term refers to second
generation web services like del.icio.us, Flickr, Skype, Wikipedia, last.fm, and
Technorati.
Web 2.0 applications derive from new techniques such as rich internet applications
(RIA), Asynchronous JavaScript and XML (AJAX), semantically valid XHTML,
syndication and aggregation of data in RSS or Atom, and clean and meaningful URLs.
A user of Web 2.0 must feel as if he/she used a traditional desktop application.
In accordance with Tim O'Reilly [43], the meaning of Web 2.0 can be presented
by contrasting the traditional Web with the new Web 2.0, as in Table 2.1.
11 http://del.icio.us/
12 http://www.ickr.com/
13 http://www.skype.com/
14 http://en.wikipedia.org/
15 http://www.last.fm/
16 http://www.technorati.com/
Table 2.1: New trends in the Web (concept: [43]). The table contrasts traditional
Web examples (e.g. Internet Explorer, personal sites, Content Management Systems,
taxonomies) with their Web 2.0 counterparts.
2.3.1 AJAX
AJAX is a web development technique to create web applications that behave as if
they were desktop ones. The aim is to exchange only small amounts of data with a
server; this should be performed behind the scenes. No longer should the entire page
be (re)loaded.
One of the first Web 2.0 applications was Google Maps, a set of interactive
maps of the world. One can watch diverse views of the world, change the way the
views are displayed and personalize them. There is a constant dialog between the
browser and the server.
2.3.2 Democracy
Democracy in Web 2.0 is very important [12]. Users, often amateurs, collaborate
and share anything that is popular. Without users, many Web 2.0 applications would
not exist. Users develop the content themselves, adding resources
and sharing them with other users; users collaborate and share information. It is
similar with Wikipedia, a free encyclopedia. Wikipedians can write new articles and
edit existing ones. Yet, all Wikipedia users are anxious about the quality of their
encyclopedia. There are even Web 2.0 news services, like Reddit. It is a set of news
items and articles which were found interesting by other people, and consequently
added there.
17 http://maps.google.com/
The aforementioned examples expose the importance of Internet users. Web 2.0
exists and is becoming more and more popular since users try to evolve, expand and
improve it. One can share anything and in return is allowed to use others' products.
Users gather around common interests, which brings about online communities: social
networks (see Fig. 2.4). The main reason why a user belongs to social networks is
the desire to share and meet others.
Networks have diverse sizes. In a small, tight one, there are a few people who form
a kind of private area. However, there can also be a lot of participants with loose
connections (weak ties). From the collaboration point of view, the latter mode is
more valuable; it is better to have connections with other networks than with only
one. However, unlimited access to information exchange can involve some risk; there
is a possibility that a user reaches poor quality data. To limit the possibility of
reaching poor data, rating and annotating shared resources were introduced.
18 http://reddit.com/
Figure 2.4: An example of a social network
Scale-free network
In a scale-free network, there are some very connected nodes (hubs) which have a
great number of links to other nodes, such that the ratio of those well connected
hubs to the number of nodes in the rest of the network remains constant as the
network changes in size.
2.3.4 Tagging
A tag is a label associated with or assigned to a piece of information such as a
web page, a photo or a movie. It is a keyword which files and classifies resources.
Popular services that use tags are del.icio.us and Flickr. The former uses tags to
label favorite web pages, while the latter employs them to mark photos.
A tag cloud is a visual depiction of tags which distinguishes more popular tags from
less popular ones; the former are written in a bigger font than the latter. Popularity
is seen either by the number of items that have been given a tag (like at Flickr)
or the number of times the tag has been applied to a single item (like at last.fm).
Clicking on a tag from the cloud shows the list of resources which were labeled
with it.
The TagCommons project is aimed at creating ways to share and interoperate
over tagging data. The idea of the project is to benefit from rich social tagging
descriptions.
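The font scaling behind a tag cloud can be sketched as a linear interpolation between a minimum and a maximum size; the Python below uses invented counts, and real services often prefer logarithmic scaling to keep mid-range tags visible:

```python
# Map tag popularity counts to font sizes for a simple tag cloud.
counts = {"reggae": 42, "ska": 17, "rock": 80, "mydog": 1}

MIN_PT, MAX_PT = 10, 32  # smallest and biggest font size in points

def font_size(count, lo, hi):
    """Linearly interpolate a font size from a tag's usage count."""
    if hi == lo:
        return MAX_PT
    return round(MIN_PT + (MAX_PT - MIN_PT) * (count - lo) / (hi - lo))

lo, hi = min(counts.values()), max(counts.values())
for tag in sorted(counts, key=counts.get, reverse=True):
    print(f"{tag}: {font_size(counts[tag], lo, hi)}pt")
```

The most popular tag always gets the maximum size and the least popular one the minimum, which is exactly the visual cue a cloud is meant to provide.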
2.3.5 Mashups
A mashup is a web page which offers a number of online services from various
sources. It allows using existing applications like Google Maps, Google Calendar
or the Yahoo! UI Library (YUI). It is possible due to access to their public APIs
and Web services.
Both the Semantic Web and Web 2.0 have impacted the development of the Internet.
The former is a low-level solution whose power lies in machine-readable descriptions;
the latter focuses on people. Both these standards can be overlapped to make even
greater benefits. By involving Web 2.0 techniques in Semantic Web solutions, we get
Semantic Web 2.0 applications which not only act as desktop ones, with a fine looking
user interface, but also understand the meaning of their content.
19 http://tagcommons.org/
20 http://maps.googe.com/
21 http://www.google.com/calendar
22 http://developer.yahoo.com/yui/
Table 2.2: Metamorphosis of the Web (concept: [21]). The table traces how traditional
Web solutions (websites, Content Management Systems, search engines, portals,
networks) evolve through Web 2.0 and the Semantic Web into Semantic Web 2.0
counterparts such as Social Semantic Information Spaces.
There are three approaches to metadata creation. The most traditional states that
the metadata is created by dedicated professionals; it has the form of catalog records
created by complying with complex rules which are not understood by laymen. Moreover,
organizing and developing the catalogs is expensive and does not scale.
The author-created metadata approach assumes that authors are responsible for
supplying their work with metadata since they know it best. It helps with the
scalability problem, but still users are only the recipients and do not have
influence on the data.
User-defined metadata solves the scalability problem and involves users in the
process of metadata creation.
Folksonomies
Tags (see Sec. 2.3.4) arose along with Web 2.0; they played the role of taxonomies
created by the users themselves. The Semantic Web 2.0 has set users free from using
predefined vocabularies. It gave one more freedom in that field by introducing
folksonomies. Among the most popular services that involve folksonomies are
del.icio.us and Flickr. As the name (a blend of folk and taxonomy) suggests, a
folksonomy is an open-ended labeling system with low entry costs that enables
Internet users to categorize content using tags. Tags in a folksonomy are metadata
about the categorized resources. Folksonomies fit in the Semantic Web 2.0 depiction.
At the same time, they bring into the Semantic Web 2.0 the whole potential of
Web 2.0. That is why they also appear in Table 2.2.
As I stated earlier in this paper (see Sec. 7.2.2), the Semantic Web is about
describing resources and interlinking them. In a folksonomy, a tag can be interlinked
with other related tags. The relationship is established by analyzing the URLs.
Related tags can be used to broaden or narrow the range of found information and to
find information somehow associated with the current tag [35].
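The URL-based relationship between tags can be sketched as co-occurrence counting; the bookmark data in the Python below is invented:

```python
# Find tags related to a given tag by counting shared bookmarked URLs.
from collections import Counter

# tag -> set of URLs labeled with it (invented sample data)
tagged = {
    "reggae": {"u1", "u2", "u3"},
    "bob_marley": {"u2", "u3"},
    "ska": {"u3", "u4"},
    "cooking": {"u5"},
}

def related(tag):
    """Rank other tags by how many URLs they share with `tag`."""
    shared = Counter()
    for other, urls in tagged.items():
        if other != tag:
            overlap = len(tagged[tag] & urls)
            if overlap:
                shared[other] = overlap
    return shared.most_common()

print(related("reggae"))   # bob_marley shares 2 URLs, ska shares 1
```

Tags with no shared URLs (here, cooking) never appear in the result, which is the sense in which co-occurrence separates related from unrelated tags.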
The most important limitation of folksonomies is the fact that there is no scope
information and no systematic guidelines, which results in ambiguity. A tag can have
multiple meanings; for example, apple may denote the fruit or the computer company.
Then how to reach the most appropriate information about a Macintosh? Another problem
is handling capitalization and spaces, as usually multiple words are not allowed.
Finally, the problem of plural and singular forms and conjugated words appears. There
are no strict rules about which form should be chosen. As I said, a user is given a
free hand in the selection of a tag's name, so there is a risk of tags which are
senseless, or of a few versions of a tag that describe the same concept.
However, these problems can be solved in simple ways. One way is to educate
users to add better tags. They should be advised to use plurals in basic forms.
They should also be taught not to make spelling errors and to avoid personal tags
(e.g. mydog) that are meaningless to the community. Then, tagging systems should
catch misspelled and not recommended words and give users advice at run-time [34].
There are some initiatives that try to learn how to order tags in folksonomies,
such as taga.licio.us.
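Run-time advice on misspelled tags can be approximated with fuzzy string matching from the Python standard library; the list of known tags below is invented:

```python
# Suggest corrections for misspelled tags at submission time.
import difflib

known_tags = ["reggae", "semantic_web", "elearning", "wiki", "folksonomy"]

def advise(tag):
    """Return the tag itself if known, else the closest known spellings."""
    if tag in known_tags:
        return [tag]
    return difflib.get_close_matches(tag, known_tags, n=3, cutoff=0.6)

print(advise("regae"))        # close to "reggae"
print(advise("semanticweb"))  # close to "semantic_web"
```

A production tagging system would combine this with a stop list of discouraged personal tags and normalization of case and spacing before matching.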
Chapter 3
The goal of this Master's Thesis is to employ Social Semantic Information Sources
for eLearning; that is why it is necessary to understand what the Semantic Web 2.0
is and how it can be used for eLearning. So far, I have introduced those technologies
(see Chapter 2). In this Chapter, I explain the idea of Social Semantic Information
Sources (see Fig. 3.1) and make a review of their most popular examples (semantic
blogs, semantic wikis, and Social Semantic Digital Libraries). Using that information,
I define a common model of SSIS and propose a consistent way of its description.
Then, I present eLearning 2.0, a new approach which tracks informal learning sources.
A blog is a website usually led by one person to publish their opinions, thoughts
and web links [42]. Although most blogs are textual, some focus on photographs
(photoblog), videos (vlog) or audio (podcasting). In general, blogs are part of the
wide network of social media.
Figure 3.1: Location of SSIS in the Web (figure concept: [21])
A blogger can be an expert in some domain who shares his or her knowledge. He/she
can also be a beginning writer who looks for an audience. Finally, a blogger can be
anyone, from a student to a master of science. Anyway, whoever the blogger is, his
or her main reason to create a blog is to share thoughts with others.
Blogs are updated by habitually writing new entries (posts). They are usually
syndicated with headlines, hyperlinks and summaries using RSS or Atom formats. This
allows readers to easily track updates.
Visitors can read posts and annotate them. According to Technorati, blogs are
powerful since they allow millions of people to easily publish and share their ideas,
and millions more to read and respond. They engage the writer and readers in an open
conversation.
1 http://www.technorati.com/
Blog as a tool in eLearning
According to Technorati statistics from August 2006, there were fifty million blogs
on the Internet, and their number had been doubling every six months or so since
November 2002. At that stage, the number was one hundred times bigger than it
had been three years earlier. About 175,000 new blogs and about 1.6 million posts
were created each day. These numbers demonstrate the potential of
blogs.
Being so popular, blogs can support the learning process. Not only do they
remove the technical barriers to writing and publishing online, but, thanks to their
social character, they also engage readers in conversation.
According to O'Hear [41], Will Richardson was one of the pioneering educational
bloggers. By using Manila, a blogging software, he encouraged his English literature
students to publish a reader's guide to the book The Secret Life of Bees. The author
of the book responded to what the students had written. This way, a small community
of people interested in the book was formed.
Will Richardson succeeded since he relied on the main concept of weblogs, the
power of collaboration, which can be used in eLearning. Students can use weblogs
for exchanging their experience and publishing their notes or gained knowledge.
Other students or even teachers can write annotations to express their opinions
about the published content.
Blogs would not be so helpful in studying if it were not for exposing machine-readable
data about each blog. Syndication services generate feeds, which are portions of
information about changes to a blog. The most popular standards [22] are Really
Simple Syndication (RSS) 0.92, Atom, and RDF Site Summary (RSS) 1.0, which fully
supports RDF.
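Reading such a feed takes only a standard XML parser; the Python sketch below pulls headlines and links from a minimal invented RSS document (real feeds add further required channel fields and, for RSS 1.0, RDF namespaces):

```python
# Parse a minimal RSS feed: pull each item's headline and link.
import xml.etree.ElementTree as ET

rss = """<rss version="0.92"><channel>
  <title>Adam's music blog</title>
  <item>
    <title>Discovering reggae</title>
    <link>http://example.com/blog/1</link>
    <description>Notes after hearing Bob Marley.</description>
  </item>
</channel></rss>"""

root = ET.fromstring(rss)
for item in root.iter("item"):
    print(item.findtext("title"), "|", item.findtext("link"))
```

An aggregator repeats exactly this step for every subscribed blog and compares the items against those it has already seen, which is how changes to a blog are tracked.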
Semantics for blogs
There is a large number of blog publishing services available, such as Blogger or
WordPress. These services provide a wide range of tools for creating and managing
blogs. However, they lack a semantic description of the content: the topic of the
posts, their content, or connections with other posts, perhaps from other blogs.
To make a blog also machine readable, rich metadata for its content must be
provided. Metadata can describe the blog itself or its parts (posts, comments,
hyperlinks) and the relations between them. The data can be either mixed in a post
(seen by a reader) or added in a hidden, machine-readable form. Software agents can
then process the metadata; machines can find connections between one's blog posts
and other blogs, and quickly obtain information about a post's author or a
described event.
The Semantically-Interlinked Online Communities project (SIOC, see Sec. 3.2.1)
delivers a plug-in (SIOC Exporter) for a few of the most popular blogging platforms:

• WordPress: one of the most popular blogging tools
• DotClear: a blogging platform used mostly in France
• Drupal: a content management platform for blogs and fora
• b2evolution
4 http://www.blogger.com/
5 http://wordpress.org/
6 http://sioc-project.org/
7 http://wordpress.org
8 http://www.dotclear.net/
9 http://drupal.org/
10 http://b2evolution.net/
The SIOC plug-in adds additional information about the site: a hyperlink to an
extracted RDF document for the whole blog or its posts. These metadata describe the
blog and the site which hosts a post, and give some information specific to a blog
post: the author, the topic, external links, the date of creation, the content of
the post, etc. To learn more about SIOC, see Sec. 3.2.1.
A wiki is a website that allows visitors to easily add, remove and edit content.
The most popular wiki engines are MediaWiki (Wikipedia is based on it) and
MoinMoin Wiki.
The first wiki was created for a programming language pattern group. A wiki has a
simple text syntax for creating new pages. Users can easily create content (ad hoc)
and edit existing information using a web browser. They do not even have to be logged
in to do that [16]. A wiki provides easy and deep linking by using names. In other
words, if a wiki page contains a word or phrase which is the topic of another page
in that domain, it is automatically turned into a hyperlink. This provides easy
navigation; moreover, this works for pages which do not exist yet.
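This name-based linking can be sketched with a regular expression over CamelCase words, the convention used by early wiki engines; the set of existing page names below is invented:

```python
# Turn CamelCase words into wiki hyperlinks, a classic wiki convention.
import re

existing_pages = {"SemanticWeb", "WebOntologyLanguage"}
CAMEL = re.compile(r"\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b")

def linkify(text):
    """Replace CamelCase words with links; mark missing pages with '?'."""
    def repl(m):
        name = m.group(1)
        if name in existing_pages:
            return f'<a href="/wiki/{name}">{name}</a>'
        return name + "?"          # the page does not exist yet
    return CAMEL.sub(repl, text)

print(linkify("The SemanticWeb builds on RDF; see OntologyBasics."))
```

The trailing question mark on a missing page name is exactly how the first wikis invited readers to create the page, which is what makes deep linking work for pages that do not exist yet.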
As everyone is allowed to interfere with what others see, the contents must be
checked and corrected; only then is the information a wiki provides reliable. Each
community member can be a moderator. Reliability is achieved with versioning and diff
features. Each wiki page has a history of changes which can be easily tracked by
comparing differences between versions. Thus, in case of errors, changes can be
easily reverted. All the aforementioned features make wikis a powerful tool for
collaborative work.
There can be many reasons for creating a wiki. Wikipedia is the most popular
encyclopedia based on a wiki engine on the Internet. Wikis can also be used to
manage open source software documentation, as Jakarta does. It is convenient
11 http://www.mediawiki.org
12 http://wikipedia.org/
13 http://moinmoin.wikiwikiweb.de/
14 http://jakarta.apache.org/
33
to use a wiki as a personal information management system. Finally, wikis are commonly used as a discussion platform in companies' intranets (see TWiki).
Wikis seem to be a good way of making people cooperate and a powerful informal source of knowledge. To better use their potential, the structure and the content of wiki pages should be modeled using semantic descriptions. Semantic wikis allow users to add additional metadata (semantic descriptions) for described concepts. These data mark the place of their occurrence so that the system is capable of extracting relevant data without understanding the rest of the text. As a result, it helps to organize, search, browse, share, and annotate the wiki's content. Semantics enhance the searching process; it is not limited to keyword-based searching only. It introduces
For instance, a wiki with articles about rock songs could annotate these pages with little pieces of additional data (written in RDF), such as "this song was made by Red Hot Chili Peppers" or "this song was published in 2000". A user does not have to know RDF syntax to annotate. Thus, the wiki can reason on the annotations. Examples of semantic wiki engines include:
• Semantic MediaWiki, an extension of MediaWiki (see Sec. 3.1.2)
• IkeWiki, a web-based wiki (prototype)
• Makna, whose engine implementation is based on Janne Jalkanen's JSPWiki; it uses Jena, the Semantic Web framework developed by HP
• SemperWiki, a semantic personal wiki developed for the Gnome desktop [44]
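The idea that such annotations let a wiki reason can be illustrated with a toy triple store; the predicate names and example song titles below are made up for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy RDF-like triple store: each annotation is a
 *  (subject, predicate, object) statement that can later be queried. */
class TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new String[] { subject, predicate, object });
    }

    /** Returns all subjects with the given predicate/object pair,
     *  e.g. all songs made by a given band. */
    List<String> subjectsOf(String predicate, String object) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(predicate) && t[2].equals(object)) {
                result.add(t[0]);
            }
        }
        return result;
    }
}
```

Even this minimal model supports queries that go beyond keyword search, such as "list every song made by a given band".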
15 http://twiki.org/
16 http://meta.wikimedia.org/wiki/Semantic_MediaWiki
17 http://ikewiki.salzburgresearch.at/
18 http://www.apps.ag-nbi.de/makna/
19 http://www.jspwiki.org/
20 http://jena.sourceforge.net/
21 http://www.semperwiki.org/
34
There are three ontologies designed to deal with wikis:
• WikiOnt, which aims at integrating Wikipedia (and by extension other
• SWIFT
Semantic MediaWiki
MediaWiki is one of the most popular wiki engines. The best-known wiki, Wikipedia, is based on it. However, MediaWiki does not meet the Semantic Web demands. Although the HTML code is to some extent semantic, there is no
To make a MediaWiki-like wiki a semantic one, one can install the Semantic MediaWiki extension [16]. Its goal is to make important parts of MediaWiki's knowledge machine-processable with as little effort as possible. For that reason, it introduces typed links, attributes and types, and semantic templates.
Typed links are treated as semantic relations between the two concepts described by a link. Let us take the main page of Corrib Clan Wiki as an example. On this site, there is information about Corrib Clan, like projects developed by its members and the supervisors. There are a number of typed links on that page. The hyperlink to the article about Didaskon not only gives the page location but also indicates the relation between the two pages.
This template is built from two main parts. The first part (the expression before ::) describes the relation; the second part (after ::) is a hyperlink to the article within the wiki. So, this example says that Didaskon is a subproject of Corrib.
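The two-part structure of a typed link can be shown with a short parsing sketch. The [[relation::Target]] syntax is Semantic MediaWiki's; the parser class itself is only an illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for Semantic MediaWiki typed links of the form
 *  [[relation::Target]]; splits them into the relation and the linked page. */
class TypedLink {
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^:\\]]+)::([^\\]]+)\\]\\]");

    final String relation; // the expression before ::
    final String target;   // the wiki article after ::

    private TypedLink(String relation, String target) {
        this.relation = relation;
        this.target = target;
    }

    /** Returns the first typed link found in the text, or null if none. */
    static TypedLink parse(String wikiText) {
        Matcher m = LINK.matcher(wikiText);
        if (!m.find()) {
            return null;
        }
        return new TypedLink(m.group(1), m.group(2));
    }
}
```

Applied to the Corrib example, the text "[[subproject of::Corrib]]" yields the relation "subproject of" and the target page "Corrib".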
22 http://sw.deri.org/2005/04/wikipedia/wikiont.html
23 http://ontoware.org/projects/swift/
24 http://wiki.corrib.org/
35
Besides typed links, Semantic MediaWiki introduces a better way to manage attributes of concepts. Since each typed link connects two wiki pages, not all information can be stored as a relation. For that reason, one uses attributes. On the above-mentioned Corrib Clan Wiki main page there are a few attributes as well. For example, an attribute states that Corrib Clan is supervised by Sebastian Kruk. The difference between a typed link and an
cessable way. A set of relations and attributes is situated at the bottom of the article page. But machines are not obliged to scrape the content of the page.
Semantic MediaWiki allows extracting these annotations with an RDF feed. For
DBpedia.org
DBpedia.org is a project that aims at extracting structured information from Wikipedia and making this information available on the Web. The information
Also, DBpedia allows us to ask queries against Wikipedia and to link other datasets
them. Finally, I describe JeromeDL, the first Social Semantic Digital Library. All in all, I point out the importance of Social Semantic Digital Libraries to the learning process.
25 http://dbpedia.org/
36
Digital Libraries
ers and the Internet expansion, brought in digital libraries [6]. In a digital library,
resources are machine readable and a full-text index improves searching. Resources
There were some quite innovative methods adapted to digital library commu-
library would be impossible if they were not sufficiently described; electronic annotations play an important role since they bring more information about books. The most popular description formats are MARC21, BibTeX and Dublin Core.
Besides searching and reading, users are allowed to download resources for further
Digital libraries also handle access rights. Some resources can be hidden from users
Digital libraries already have controlled vocabularies and taxonomies. All of them even have metadata in place. In semantic digital libraries, rich and extensive semantic annotations (metadata) make resources accessible not only to humans but also to machines.
The metadata is modeled with RDF (see Sec. 2.2.2). Searching is more efficient
Artifacts from the cultural heritage domain are arranged in a hierarchical structure and can be stored internally or in any other place by keeping their references. It also supports various metadata schemas defined in OWL-DL.
Bibliographic resources are described with RDF. Again, records can be queried with SPARQL.
metadata, system metadata and behaviors, that are code objects providing
vironments) is developed by W3C, HP, MIT Libraries, and MIT's Lab for
Computer Science.
The SIMILE project provides tools for metadata managers and common end-users. They all deal with RDF: they allow extracting data from XML and HTML files, and inspecting and editing RDF files. SIMILE extends and leverages DSpace and makes library
So far, I have described innovative semantic digital libraries. I have presented how the Semantic Web improves their features. The potential of semantic digital libraries can be improved even further by applying Web 2.0 capabilities. A semantic digital library can give some space for collaboration. Users can leave a trace by making annotations and evaluations of the resources. By supporting Web 2.0 collaboration aspects (comments, blogs, shared bookmarks, tagging, etc.), a semantic digital library becomes
38
JeromeDL [49] is developed at the Digital Enterprise Research Institute, Galway (DERI) in collaboration with Gdańsk University of Technology by a group of MSc and PhD students, including myself. It has a two-layer metadata enrichment. The lower level, MarcOnt Mediation Service, supports legacy metadata (DublinCore [9], BibTeX and MARC21 [1]), which allows interoperability with al-
The upper level is community oriented [31]; a community of users can interact
Filtering (SSCF) [50]. Users can evaluate and annotate resources. Users' data
share this information with other users, based on their profile, which is managed by
search engine. Users can form queries even in natural language (NL) by using query
templates.
There are seven ontologies supported by JeromeDL and they can be grouped as
follows:
• MarcOnt Mediation Service [48, 47, 26]: metadata about bibliographic resources
DublinCore
BibTeX
MARC21
29 http://jeromedl.org/
30 http://www.deri.ie/
31 http://www.pg.gda.pl/
39
MarcOnt
Among other features, JeromeDL also allows exporting the description of its
analysis of SSIS convinced me that there are a few main concepts regarding SSIS:
The last point suggests the potential of SSIS. Being collaboration-minded, online community sites, like blogs, wikis, and bookmark-sharing systems, allow users to create a network where they can feel free to band together: share ideas and opinions, publish links and works and comment on them; any resource can be annotated. Consequently, plenty of relevant information can be extracted; these data can support the learning process. For instance, they can serve as additional material to read. All in all,
The main problem of online communities is that they are dispersed over the
solutions allow mainly text-based searching, so a user must browse many web pages
The Semantic Web assumes rich descriptions of resources; its main postulate says that semantic annotations make the content readable by machines, which allows
40
Figure 3.2: Online communities overview (from [4]).
3.2.1 SIOC
SIOC (Semantically-Interlinked Online Communities) is an initiative that is supposed to overcome the above-mentioned problem [20]; its goal is to interconnect
enclosed links, the creation time, connection with other web pages.
The core of the SIOC framework is the SIOC ontology which is based on RDF
32 http://sioc-project.org/
41
written by an author, has a topic, a content, external links, etc.
importing and exporting SIOC data in different vocabularies. In this manner, the amount of existing available data can be controlled. Also, SIOC makes cross-site queries and topic-related search on sites with SIOC metadata more efficient [4]. I have already written about the SIOC plug-in for a few blogging platforms (see Sec. 3.1.1). The SIOC ontology is still being developed; recently, its authors have been trying to apply it to modelling wikis, image galleries, event calendars, address books, audio and video
Learning Management Systems (LMSs) that provide online courses (see Sec. 2.1).
42
Blackboard, and Desire2Learn [8]. To recap, LMS organizes the learning content in
This approach suffers from many limitations, though. The main problem of current LMSs is that they deliver courses prepared for a generic student. They are
all. However, the learning path should be adaptable and created dynamically. Also, LMSs focus on a small group of students, for instance a group of students in a class; they do not allow a broader community. Moreover, students should benefit not only from their repository (formal learning), but also use collected learning material widely
eLearning 2.0 has emerged from Web 2.0 developments. According to DTI Global
• diversity of content and media: Web 2.0 services (blogs, wikis, multimedia
• informal learning
Blogs were one of the first Web 2.0 services used in the newer eLearning approach. Students' blog posts are often about something from their own range of interests, rather than on a course topic or assigned project. Students run blogs and read others' blogs; consequently, they create a social network with loads of useful data [36]. Then, wikis, RSS, podcasting services, and other Web 2.0 platforms have emerged. All in all, the number of available resources has increased, which becomes a problem for
is possible due to ontologies (see Sec. 7.2.2). Thus, machines can produce intelligent responses for unforeseen situations. But the real power of the Semantic Web can be realized when heterogeneous data from diverse environments are collected, processed and sent for further use [33]. Ontologies organize learning material around good semantic annotations of learning objects. Also, they can be used to describe user profiles in order to compose the best course for a student based on semantic queries.
of LOs. We have many LMSs and most of them describe LOs in their own specific way.
ing content used by other LMSs and create common searchable content and content
(see Sec. 2.1.2) which is a collection of standards and specifications adapted from
However, SCORM has introduced its own XML formats and methodologies [3]. One of the standards that underlie SCORM is LOM; its goal is to provide a rich description of learning material (see Sec. 2.1.2). Since LOM is very accurate, many LMSs support it. This way, exchanging LOs between them is, to some extent, facilitated.
learning content is still not very machine-readable. By bringing the Semantic Web to eLearning, it is easier to integrate learning material with other material and define
At the moment, considerable eort is put into research in the Semantic Web
and eLearning. There is a number of the Semantic Web educational services and
projects:
web pages that contain semantic content. AQUA uses ontologies for refining initial queries, a similarity algorithm, and a reasoning process [55].
essays, which facilitates writing essays that really answer the essay question.
tologies [38].
Elena defines a smart space for eLearning on top of Edutella [39] peer-to-peer learning resource repositories. It uses SOAP-based Web Services which are
work and all its resources are described in RDF. This allows running efficient mapping, mediation and clustering of resources and their metadata [39].
3.3.2 Didaskon
Didaskon is a project developed in the Digital Enterprise Research Institute
the eLearning field. Its main goal is to deliver a framework for assembling an on-
services [52].
LOM ontology. LOs are composed into a learning path for a specific student. Along
with formal Learning Objects, Didaskon also uses the potential of Social Semantic
34 http://didaskon.corrib.org/
35 http://deri.ie/
45
creates LOs from data harvested from SSIS. Consequently, a user gets a course path
conditions regarding a user. Each user is described with the FOAF ontology [7]. Basing
for his/her needs. Moreover, the system allows more scalable helper features for students' supervision.
Again, the ontologies used link user needs and the characteristics of the learning material. The produced curriculum not only reflects user requirements, but also introduces
46
Chapter 4
Each project is burdened with some risk; when things go wrong, it can end up failing to reach the initial assumptions. Therefore, the design process is crucial. Besides defining business goals, I must also identify possible problems and risks.
In this chapter, I introduce existing tools for capturing informal learning and describe the scope of my project. I define the functional and non-functional requirements for the system and the use cases. Then, I introduce its architecture: the main components, classes, and the Web Services specification. All information gathered
line resources or metadata for them. I describe their features and point out their limitations.
for RDF data either in the content of the resource with the specified URL or in documents this resource links to. If such data is found, it is saved to the shared repository.
1 http://pingthesemanticweb.com/
47
PingtheSemanticWeb.com supports FOAF, SIOC, and DOAP ontologies, and other
RDF documents.
The pinging feature is invoked either by typing a URL on the service's home page or automatically; the automatic way benefits from Semantic Radar, an add-on for the Firefox web browser. Whenever Semantic Radar discovers semantic metadata on a page, it informs the service about that fact so it can be added to the repository. Software agents can request the service for a list of stored RDF documents and use that information for crawling
SIMILE Project
SIMILE Project (Semantic Interoperability of Metadata and Information in unLike
Piggy Bank, an add-on for Firefox, changes the browser into a mashup platform by allowing users to capture metadata for online resources and mix them together. Collected data can be stored locally, tagged, searched, and browsed. Piggy Bank can capture RDF documents to which a web page links and from any web pages that are
metadata for non-semantic web pages as well. It is written in another SIMILE tool, Solvent.
Zotero
Zotero is an add-on for the Firefox web browser. It helps with collecting, managing, and citing research material, mainly bibliographic resources. Zotero extracts RDF injected into XHTML documents; it works with a few standards and microformats [24]: embedded RDF, COinS, Dublin Core [9], and MARC [1]. Zotero informs
2 http://sioc-project.org/firefox/
3 http://simile.mit.edu/
4 http://www.zotero.org/
48
a user that it has discovered some markup by showing a special button in the browser.
A user can easily edit the data saved by Zotero and append additional information, such as notes, tags, and related files. Moreover, Zotero can be integrated with Microsoft Word and WordPress. Captured data can be searched and browsed both
4.1.2 Limitations
All the above-mentioned tools are good metadata harvesters. However, they work
annotations for online resources in a shared space. This information can be used, for instance, by crawlers while searching for a specific piece of data. But PingtheSemanticWeb.com does not offer the possibility to browse stored data beyond viewing raw RDF documents, which is unacceptable for a common user. Also, it
Zotero is a powerful tool for researchers and students because it facilitates biblio-
about books and articles, search and cite them. However, it only reads embedded RDF; there is no support for pure RDF data, which could pass more knowledge.
Piggy Bank is capable of reading whole RDF documents that a web page links
to. Although it does not support non-semantic web pages itself, it is possible to
write screen scrapers that can do that. In spite of that, it has little support for
icant characteristics that such a tool must be distinguished by. Not only should it work with semantic sources of information, but it must also operate on non-semantic web pages, like Wikipedia. It must be easy to extend so that it supports more types of websites. Then, a user should be supplied with supportive tools for data
49
Also, I have discovered that captured data can considerably boost informal learning; it can be used in new eLearning frameworks that use both learning material
layer for the Didaskon system (see Sec. 3.3.2) which works as its extension. The system provides Web Services for harvesting data from SSIS and providing them in the form of informal Learning Objects (see Fig. 4.1). Data delivered by the system must be described with a common object model so that Didaskon can easily reason on it. Because the system is supposed to collect data, I have named it IKHarvester, from Informal Knowledge Harvester.
In the picture of the system scope (see Fig. 4.1), you can see SSIS that pro-
IKHarvester collects the metadata and stores it in the repository of informal knowledge (not shown in the figure). The collection of these metadata is well described
learning material, based on what it needs during the composition. The description
They gave me a view of what the developed system should do and how it should act.
readable and correct way. Table 4.1 is a template for describing the requirements.
50
Figure 4.1: System scope
51
Table 4.1: Requirement description template
Id XYY Priority
Title
Description
Source
Related req.
requirement
Functional requirements
system should do. Precisely described stakeholders' needs are the first step to finishing a
Description IKHarvester should be able to provide a list of all informal LOs stored in
Source Didaskon
52
Id F02 Priority Optional
Title Deliver a list of informal LOs that have changed since a given date
changed since a given date. This aims at avoiding the situation where
Source Didaskon
Description It is one of the basic features of IKHarvester. Metadata for SSIS resources
Source Didaskon
The content of informal LOs must not be stored in the repository; it must
Source Didaskon
53
Id F05 Priority Crucial
users) or by the administrator. Regarding SSIS, one must know the URL
• JeromeDL
Source Didaskon
Description If a SSIS resource from which the LO was created no longer exists, it shall
be removed from the informal knowledge repository. The data should not
Source Didaskon
54
Id F07 Priority Crucial
Using SOA assures efficiency and easy access to the system features. All
Source Didaskon
ject model suitable for eLearning. The structure of the model must be
Source Didaskon
Related req.
55
Id F10 Priority Crucial
Description IKHarvester must be able to collect data from semantic blogs which are
supported with SIOC plug-in. The data should be obtained by using the
SIOC exporter.
Source Didaskon
Description IKHarvester must be able to collect data from both semantic and non-
Source Didaskon
Description IKHarvester must be able to collect data from JeromeDL, the Social Se-
mantic Digital Library. The data shall be produced by the RDF exporter.
Source Didaskon
Description In case RDF extractors supply IKHarvester with irrelevant data, it must be filtered.
Source Didaskon
56
Non-functional requirements
Title Reliability
reasoning on them.
Title Interoperability
neous networks. Yet, it is supposed to collect data from SSIS which carry
diverse information.
57
Id N03 Priority Required
Title Extensibility
Description The system should be developed in a way that allows making im-
Also, it must be easily extended with plug-ins (see Fig. 4.6) that deal with other types of SSIS (wikis based on different engines, other digital
Source Didaskon
Title Efficiency
Services.
However, this is not crucial; the communication takes place over the Internet, so there might be periods of time when the services are
Related req.
Title Portability
Source Didaskon
58
Id N06 Priority Required
Title Stability
Source Didaskon
Title Safety
cidentally.
Title Security
tivities and efforts that aim at lowering its efficiency and the quality of work.
Description During the development, only open source software and tools should be
Source Didaskon
Related req.
59
Id N10 Priority Crucial
Description All the documents and other products (like software) created during the project will have a version number which will allow tracking changes in an easy way. There is a need for a tool like SVN for version control.
Source Didaskon
Source Didaskon
• collecting data from SSIS and storing it in the informal knowledge repository
More detailed use cases are depicted in Fig. 4.2; there are basic functionalities
Actors
Below, there is a description of the actor that uses the functionalities provided by IKHarvester.
60
Figure 4.2: Use Case diagram
61
Id A01
Title Client
Web Services, we expect more than one actor that can use it.
Related actors
Use cases
A use case is an occurrence that takes place while the system works. Each use case is initiated either by the actor's activity or by another use case. It is very important to provide use case scenarios created after the system requirements analysis. Use cases tell more precisely what can happen in the system while it works.
Id UC01
Actors A01
62
Id UC02
Actors A01
Id UC03
Description Providing the content of a LO. The content can be txt, HTML
Actors A01
Exceptional occurrence Unsuccessful connection to the resource with given URL; ei-
no longer online.
63
Id UC04
new LOs. The actor must hold the URL of the resource which
should be added.
Actors A01
Id UC05
Actors A01
Initial occurrence Claim for updating metadata of resource that has changed.
64
Id UC06
Actors A01
Id UC07
Actors A01
Exceptional occurrence
Related use cases UC01, UC02, UC03, UC04, UC05, UC06, UC08, UC09
Id UC08
Actors A01
Initial occurrence Claim for a list of all LOs. If an adding date is specified, the list will contain only those LOs which were added since then.
Exceptional occurrence
65
Id UC09
Actors A01
Exceptional occurrence
Id UC10
performed.
Actors A01
cic LO.
Id UC11
Description Semantic web pages allow extracting metadata for their re-
Actors A01
66
Id UC12
Actors A01
Id UC13
Actors A01
Exceptional occurrence
Id UC14
Actors A01
Exceptional occurrence
67
Id UC15
knowledge repository.
Actors A01
Id UC16
Actors A01
Exceptional occurrence
Id UC17
Actors A01
68
4.3 System design
By now, I have pointed out and described the system requirements. Also, I defined
report more precisely how it works, give some details on what is going on inside the
system.
work to fulfill the service consumer's needs. The services are independent; they do not rely on the context and state of other services. The architecture demands using interfaces based on Internet protocols like HTTP, FTP, SMTP; all messages, except for binary data attachments, must be described in XML. There are two
SOAP
SOAP (Simple Object Access Protocol) Web Services are very popular nowadays.
SOAP is a protocol for transferring data between the source and the destination
the core (stubs) of the client code which can call SOAP Web Services. Messages
sent with SOAP are wrapped by an envelope; within it, there is the content (body)
REST
REST (REpresentational State Transfer) Web Services are based on the concept of a resource.
In fact, it is used commonly nowadays, in the World Wide Web and Web 2.0 [10, 45].
69
• GET for obtaining a stateless representation of a resource
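The resource-oriented style can be sketched as a small dispatcher mapping HTTP methods onto operations of an informal-LO repository; the method and status strings here are illustrative, not part of IKHarvester's actual interface:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of REST semantics: each HTTP method maps to one
 *  operation on the resource identified by the URL. */
class RestDispatch {
    private final Map<String, String> repository = new HashMap<>();

    /** Dispatches an HTTP method to the corresponding repository action. */
    String handle(String method, String url, String body) {
        switch (method) {
            case "GET":    // stateless representation of a resource
                return repository.getOrDefault(url, "404 Not Found");
            case "PUT":    // create or replace the resource at the URL
                repository.put(url, body);
                return "201 Created";
            case "DELETE": // remove the resource
                repository.remove(url);
                return "204 No Content";
            default:
                return "405 Method Not Allowed";
        }
    }
}
```

The key design property is that all the state a request needs travels in the request itself (method, URL, body), so the server keeps no conversational state between calls.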
Employing SOA
developed a group of Web Services. Thus, IKHarvester is independent of the LMS,
Although both SOAP and REST have pros and cons, I have used REST since
diagram (see Fig. 4.3) depicts a high-level architecture of the system.
IKHarvester The core of the system. It is responsible for integrating its two
subcomponents:
Jericho HTML Parser is a Java library for web page scraping. It allows anal-
(LGPL).
70
Figure 4.3: Component diagram
manage data compatible with LOM standard; it allows creating and exporting
material.
71
interface for connection with RDF storages. Consequently, informal knowledge
4.3.3 Classes
Fig. 4.4 and Fig. 4.5 present the simplified class diagram for the IKHarvester system.
It covers a number of classes with their most important attributes and methods. The
those that harvest data from resources of different types. In the current ver-
posts that use the WordPress engine. The current version of IKHarvester tracks only
hosted on Blogger.
JeromeDL resources.
HarvestingResults, an enum class that defines how harvesting ends (for instance,
72
MediaWikiScraper, used for scraping web pages with wiki articles in order to find crucial metadata. Its methods employ the Jericho HTML Parser for that purpose.
BloggerScraper, used for scraping blog posts hosted on Blogger in order to find crucial metadata. Its methods employ the Jericho HTML Parser for that purpose.
DataProvider, an interface that defines two methods for providing data stored
DataProviderImpl implements the above-mentioned interface and calls its subclasses responsible for providing data that has been collected from different types of resources. Also, it delivers methods for obtaining the list of learning
frameworks.
knowledge repository metadata for wiki articles and providing them to eLearning frameworks.
NS is a set of a few classes that define namespaces for the ontologies used for describing blog posts, wiki articles and JeromeDL resources: NOTITIOUS, FOAF, XFOAF, MarcOnt, XMarcOnt, JeromeDL, and SIOC.
73
Util contains a set of helper methods
eters
query
new blades, i.e. modules for managing other types of resources (see Fig. 4.6). To do so, a programmer must study the class diagram (see Fig. 4.4 and Fig. 4.5).
The current version of IKHarvester captures metadata from SSIS with the following
classes:
• WordPressDataHarvester
74
Figure 4.4: Class diagram (part #1)
75
Figure 4.5: Class diagram (part #2)
76
• BlogerDataHarvester
• MediaWikiDataHarvester
• JeromeDLDataHarvester
Each of the above-mentioned classes works with a specific type of SSIS. It is important whether it captures information from, for example, a post hosted on Blogger or one that runs on the WordPress engine, because data is exposed differently. Consequently, to provide support, for example, for a new type of blog posts or wiki articles,
Currently, there are three classes for retrieving metadata for captured resources from
• BlogPostDataProvider
• WikiArticleDataProvider
• DLResourceDataProvider
All the above-mentioned classes extend DataProviderImpl, which implements the DataProvider interface. Those three classes support three types of SSIS: blogs, wikis
common object model, regardless of whether it is, for instance, a post from
ing IKHarvester with a module that captures data from another type of blogs, wikis and digital libraries does not require the implementation of a new providing module. However, if such a new module is required, a new class that extends DataProviderImpl must be added.
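The extension mechanism can be sketched in code; this is a simplified illustration with one method and invented return values, not the interface from the class diagrams:

```java
import java.util.List;

/** Simplified sketch of the provider hierarchy: adding support for a new
 *  SSIS type only requires a new subclass of the base implementation. */
interface DataProvider {
    List<String> listLearningObjects();
}

class DataProviderImpl implements DataProvider {
    @Override
    public List<String> listLearningObjects() {
        return List.of(); // base implementation: nothing collected yet
    }
}

/** Blade for blog posts; analogous subclasses exist for wiki articles
 *  and digital-library resources. */
class BlogPostDataProvider extends DataProviderImpl {
    @Override
    public List<String> listLearningObjects() {
        return List.of("BlogPost LO");
    }
}
```

Callers always work against the DataProvider interface, so a new blade is visible to the rest of the system without any further changes.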
data from SSIS (in general, online communities) and saving them to the informal knowledge repository.
Figure 4.6: Blades for different SSIS types
The repository stores these metadata as RDF triples from which Learning Objects described according to the LOM standard are created and delivered to Didaskon.
(predicates), and LOM attributes was crucial for further development. There are
plenty of properties that describe a resource. Semantic RDF feeds are very helpful
since they provide mapping from attributes to predicates. For they give a lot of
In this Section, I describe the attribute mappings for each resource type IKHarvester supports at the moment (blog posts, wiki articles, and JeromeDL resources).
Blog Posts
Metadata for blog posts is delivered by SIOC data exporters. A blog that supports SIOC contains some additional information in a meta tag (inside the head tag) in the HTML code. For my blog, which is available at http://dobrzanski.net, it looks
78
as follows:
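The original listing is not reproduced here; an illustrative head fragment and a sketch of extracting the exporter URL could look like this. The link layout and the example URL are assumptions based on common SIOC exporter output, not copied from my blog:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: finds the SIOC RDF autodiscovery URL in a blog page's
 *  HTML head. The rdf+xml link pattern is an assumption based on
 *  typical SIOC exporter output. */
class SiocDiscovery {
    private static final Pattern RDF_LINK = Pattern.compile(
            "<link[^>]*type=\"application/rdf\\+xml\"[^>]*href=\"([^\"]+)\"");

    /** Returns the href of the RDF autodiscovery link, or null if absent. */
    static String rdfUrl(String html) {
        Matcher m = RDF_LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Given a head fragment such as `<link rel="meta" type="application/rdf+xml" title="SIOC" href="..."/>`, the extracted href points at the exporter, which is then asked for the RDF document.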
The href attribute value is the URL of the RDF representation of the data on the current page. Its value changes while browsing the blog; it is always up to date, ready to produce RDF output. In general, the output consists of some information
Having the URL of SIOC data for a post, IKHarvester uses the exporter to obtain
When it is asked to deliver data, it collects the RDF statements from the repository and transforms them so that they describe the post in a way compatible with the LOM standard. Since some of the metadata is not crucial for eLearning purposes, it is
In the following Table, I present how post attributes (first column) are mapped to SIOC ontology predicates (second column) and then to LOM attributes (third column). Some of the LOM attributes are set to default values, as they cannot be collected from the SIOC exporter output. Attributes labeled with an asterisk (*) can
- sioc:Post Educational.LearningResourceType=BlogPost
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
79
creator sioc:has_creator Lifecycle.Contribute.Role=Author &
Meta-Metadata.Contribute.Role=Author &
Meta-Metadata.Contribute.Date=Date
Educational.Description &
Classification.Description
Classification.Keyword
Annotation.Date=Date &
Annotation.Description=Content
Relation.Resource.Identifier.Catalog=URI &
Relation.Resource.Identifier.Entry &
Relation.Resource.Description=references
Educational.Language &
Meta-Metadata.Language
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
- - Educational.Difficulty=easy
- - Rights.Cost=no
- - Rights.CopyrightAndOtherRestrictions=
no
- - General.Structure=atomic
80
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite. . .
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status=revised
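The constant part of this mapping — the LOM attributes set to default values for every blog post — can be expressed as a simple map. The keys and values below follow the table above (with LOM's standard element spellings); the multi-valued Educational.Context entries and the Technical.Requirement composite are omitted from this sketch, since a flat map cannot hold them:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Default LOM attribute values applied to every blog-post LO,
 *  taken from the mapping table (single-valued entries only). */
class BlogPostLomDefaults {
    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("Educational.InteractivityType", "expositive");
        d.put("Educational.InteractivityLevel", "medium");
        d.put("Educational.SemanticDensity", "medium");
        d.put("Educational.IntendedEndUserRole", "learner");
        d.put("Educational.Difficulty", "easy");
        d.put("Rights.Cost", "no");
        d.put("Rights.CopyrightAndOtherRestrictions", "no");
        d.put("General.Structure", "atomic");
        d.put("General.AggregationLevel", "1");
        d.put("MetaMetadata.MetadataSchema", "LOMv1.0");
        d.put("LifeCycle.Status", "revised");
        return d;
    }
}
```

Keeping the defaults in one place means a harvester only has to fill in the attributes that actually come from the SIOC exporter.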
Wiki Articles
IKHarvester must collect data from semantic and non-semantic wikis which are
the concept described in an article from a semantic wiki can be obtained by using an RDF feed. However, harvesting should also be performed for non-semantic wikis, like Wikipedia. It turns out there is quite a lot of semantics in the HTML code; different sections like titles, content and categories are put inside sections with formalized identifiers. Thus, scraping the page yields a lot of crucial information. In fact, I
In the following Table, I present the way of mapping the attributes of wiki articles (first column) to SIOC ontology predicates (second column) and then to LOM attributes (third column). Some of the LOM attributes are set to default
(*) can occur more than once; those with two asterisks (**) are served by the RDF
- sioc:WikiArticle Educational.LearningResourceType=
WikiArticle
81
URI - Technical.Location &
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
Educational.Description &
Classication.Description
Classication.Keyword
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=references
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=xxx
Relation.Resource.Identier.Catalog=URI &
Relation.Resource.Identier.Entry &
Relation.Resource.Description=has attribute
Educational.Language &
Meta-Metadata.Language
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
82
- - Educational.Diculty=medium
- - Rights.Cost=no
- - Rights.CopyrightAndOtherRestrictions=
no
- - General.Structure=atomic
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite. . .
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status=revised
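The HTML scraping used for non-semantic wikis, mentioned above, can be sketched in a few lines. This is an illustration only, not the actual IKHarvester code; the class name is hypothetical and real MediaWiki markup varies more than this single pattern assumes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch of scraping a MediaWiki page; not the actual IKHarvester code. */
public class WikiScraperSketch {

    // MediaWiki renders the article title inside <h1 class="firstHeading">.
    private static final Pattern TITLE =
            Pattern.compile("<h1 class=\"firstHeading\">([^<]+)</h1>");

    /** Returns the article title found in the HTML, or null if none is present. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><h1 class=\"firstHeading\">Semantic Web</h1></html>";
        System.out.println(extractTitle(page)); // prints: Semantic Web
    }
}
```

The same idea, with one pattern per formalized identifier (content block, category links), recovers the attributes listed in the table above.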
JeromeDL resources

JeromeDL provides extracted information about its resources in a few forms (see Sec. 3.1.3). The mapping rules for JeromeDL resources' attributes are presented in the following table, where they are translated to LOM.

- jeromedl:Book Educational.LearningResourceType=JeromeDLResource
General.Identifier.Catalog=URI &
General.Identifier.Entry &
Meta-Metadata.Identifier.Catalog=URI &
Meta-Metadata.Identifier.Entry
creator marcont:hasCreator Lifecycle.Contribute.Role=Author &
Meta-Metadata.Contribute.Role=Author &
Meta-Metadata.Contribute.Date=Date
Educational.Description &
Classification.Description
Classification.Keyword
Rights.Cost
Educational.Language &
Meta-Metadata.Language
Meta-Metadata.Contribute.Role=Supervisor &
Meta-Metadata.Contribute.Entity=Personal info.
Meta-Metadata.Contribute.Role=Consultant &
Meta-Metadata.Contribute.Entity=Personal info.
Meta-Metadata.Contribute.Role=Uploader &
Meta-Metadata.Contribute.Entity=Personal info.
- - Educational.InteractivityType=expositive
- - Educational.InteractivityLevel=medium
- - Educational.SemanticDensity=medium
- - Educational.IntendedEndUserRole=learner
- - Educational.Context=school &
Educational.Context=training &
Educational.Context=other
- - Educational.Difficulty=medium
- - General.Structure=atomic
- - General.AggregationLevel=1
- - MetaMetadata.MetadataSchema=LOMv1.0
- - Technical.Requirement.OrComposite ...
.Type=operating system
.Name=multi-os
.Type=browser
.Name=any
- - LifeCycle.Status
Chapter 5
System implementation
In this chapter, I describe the development of the IKHarvester system. Then, I present the tools I used while writing this Thesis, giving a brief description of the software that helped me in both writing and development. The development process consisted of the following steps:

Requirements This is the initial stage of the system development; it is the time of gathering what the system is supposed to do.

Design Having the requirements, the designers create the architecture of the system.

Testing (validation) When the system works, it is time to validate it and check whether it meets the requirements.

Integration After fixing the bugs and deficiencies discovered during the testing stage, the system is integrated with its target environment.

Maintenance This is the stage when the system is deployed, works in the production environment, and is put into maintenance. Often, this is the time when some previously uncovered errors occur. Also, the application can still be improved; new features can be added as well.
Logic Tier Contains the logic of the application. Based on the input arguments, it processes the data and communicates with the database.

Data Tier Handles the connection and queries to the database in order to get and store data.

Each tier is related to a different aspect of the application (presentation, logic and data). The general idea is that the Logic Tier is the middleware between the Data Tier and the clients: either external applications or a user with a web browser. The second approach introduces the usage of web pages. In the picture below (see Fig. 5.2) you can see the main web page with the menu and a form for adding metadata for online resources to the informal knowledge repository.
A user can get metadata for informal Learning Objects (see Sec. B.1), get the list of informal Learning Objects (see Sec. B.3), get information and support facilitating the usage of IKHarvester with web browsers, and learn about the system.
5.4 Environment and necessary tools
5.4.1 Implementation environment
Java Platform

The Java Platform enables building applications in the Java programming language, which is supposed to be "write once, run anywhere". It was created and is managed by Sun Microsystems. The Java Platform consists of a great many technologies. It has an execution machine, called the Java Virtual Machine, and several editions it is composed of. I will shortly describe two of them, which I decided to use in the system I created.

Java Standard Edition

Java SE is the edition used by typical Java platform programs. According to Sun, Java SE allows one to develop and deploy applications on desktops and servers. Java SE also includes classes that support the development of Java Web Services, and provides the foundation for Java EE. It is distributed in two packagings:

• Java Runtime Environment (JRE) provides the Java APIs, the Java Virtual Machine, and some more components required to run applications and applets.

• Java Development Kit (JDK) encapsulates the JRE and useful tools for developers.
1 http://java.sun.com/j2se/
2 http://www.sun.com/
Java Enterprise Edition

Java EE provides more classes than Java SE; they are dedicated to server-side programs. For running the system, I use Apache Tomcat, an independent servlet container used in the official Reference Implementation for the Java Servlet and JavaServer Pages technologies.

RDF storage Sesame

Sesame is an open-source framework for storing and querying RDF data. Sesame's benefits are: good scalability, high query performance, and support for RDF Schema inferencing.
IDE Eclipse

Eclipse is one of the most popular Integrated Development Environments. It is an open-source platform. The platform has been designed to be plug-in-able; its power and abilities can be extended by installing plug-ins.
3 http://java.sun.com/j2ee/
4 http://jakarta.apache.org/tomcat/
5 http://java.sun.com/products/servlets
6 http://java.sun.com/products/jsp
7 http://openrdf.org/
8 http://eclipse.org
Building the project Apache Ant

Apache Ant is an open-source build tool implemented in the Java programming language. It describes project build processes in XML-based files.
Testing/logging log4j

I have put a lot of effort into testing during the development. I tried to create new components and test them at once, so that the risk of bugs was limited. All in all, I have used log4j, a logging mechanism, with two levels of logs: errors and information. Logs are saved to a special file. Each occurrence of an error is richly described: there is some information about the error itself, the reason why it occurred and the place where it happened.
Version control Subversion

Subversion (SVN) is a version control system designed to be a replacement for the Concurrent Versions System (CVS). It has a number of features: atomic commits, versioning of symbolic links, native support for binary files, and full versioning of directories.
5.4.2 Documentation

Thesis environment LaTeX

LaTeX is a document markup language and document preparation system for the TeX typesetting program.
9 http://ant.apache.org
10 http://logging.apache.org/log4j/
11 http://subversion.tigris.org/
12 http://www.nongnu.org/cvs/
13 http://www.latex-project.org/
It allows an author to focus on the content and meaning of the document he/she writes instead of on how it looks; the visual presentation is defined by using styles. Since one specifies the logical structure of the document (chapters, sections, paragraphs, etc.), he/she can easily change the way it looks by applying another style.

The editor I used facilitates writing TeX documents by highlighting the syntax and outlining the document structure.
UML diagrams JUDE

All the UML diagrams used in this document have been created in a free (Community) version of JUDE (Java and UML Developers' Environment). Among other features, it can generate code templates from diagrams, include Java source files, and automatically generate diagrams from code.
Figures Inkscape

Inkscape is an open-source editor for creating vector graphics using the W3C-standard Scalable Vector Graphics (SVG) format. It is designed to fully support the XML, SVG, and CSS standards.
5.5 Main problems and solution details
5.5.1 Implementation of REST
As stated before (see Sec. 4.3.1), IKHarvester is a SOA layer for Didaskon. Thus, Didaskon is a client that uses the Web Services provided by my system. All available Web Services must be specified, so that the relevant client code can be implemented.

All requests handled by IKHarvester have URIs conforming to the following template:

http://notitio.us/ikh/

Below, there are definitions of all the requests that can be sent to IKHarvester. The following tables define the usage from the client's point of view. When some features of IKHarvester are invoked from a web page (provided along with the system), a query is built and sent automatically.

This HTTP request is used for obtaining all the LOs that can be created from the data stored in the informal knowledge repository.

Definitions:

type if not set, the Web Service delivers a list of LOs of all types; if set (BlogPost, MediaWiki, JeromeDL), only resources of that type are returned
Table 5.1: REST get LO list
URL http://[server]/ikh/soa/[type]
Method GET
Examples:
http://notitio.us/ikh/soa
http://notitio.us/ikh/soa/BlogPost
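For illustration, the list request from Table 5.1 can be assembled on the client side as follows. This is only a sketch; the helper class is hypothetical and not part of IKHarvester.

```java
/** Hypothetical client-side helper building the "get LO list" URL from Table 5.1. */
public class LoListUrl {

    /** Builds http://[server]/ikh/soa/[type]; a null type requests LOs of all types. */
    public static String build(String server, String type) {
        String base = "http://" + server + "/ikh/soa";
        return (type == null) ? base : base + "/" + type;
    }

    public static void main(String[] args) {
        System.out.println(build("notitio.us", null));       // http://notitio.us/ikh/soa
        System.out.println(build("notitio.us", "BlogPost")); // http://notitio.us/ikh/soa/BlogPost
    }
}
```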
This HTTP request is used for retrieving from the informal knowledge repository the manifest of the specific LO in an XML form, compatible with the LOM standard.

URL http://[server]/ikh/soa/$URI$manifest
Method GET
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$manifest

This HTTP request is used for obtaining the content of the specific LO in an XML form.

Definitions:
Table 5.3: REST get LO content
URL http://[server]/ikh/soa/$URI$content
Method GET
Returns LO content
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$content
This HTTP request is used for adding an informal LO to the repository or updating it. All crucial metadata for the given resource, except for the actual content, are harvested and stored.

URL http://[server]/ikh/soa/$URI$
Method PUT
Returns
Content type
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$
This HTTP request is used for removing an informal LO from the repository. It is, however, only a logical removal: the resource is not physically removed from the repository. Instead, a triple informing about the removal is added to the repository. This is forced by the synchronization problems that would occur when more than one LMS uses IKHarvester.
URL http://[server]/ikh/soa/$URI$
Method DELETE
Returns
Content type
Definitions:
Examples:
http://notitio.us/ikh/soa/$http://dobrzanski.net/2007/03/15/pandora/$
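The logical removal described above can be modelled in a few lines. This is a simplified sketch with an in-memory set standing in for the triple store; the class and its names are illustrative only.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Simplified model of logical removal: a "removed" marker hides deleted LOs. */
public class TombstoneSketch {

    // Stand-in for the RDF repository: URIs of stored LOs and of removed LOs.
    private final Set<String> stored = new LinkedHashSet<String>();
    private final Set<String> removed = new HashSet<String>();

    public void add(String uri) {
        stored.add(uri);
    }

    /** DELETE does not erase the resource; it only records a removal marker. */
    public void delete(String uri) {
        removed.add(uri);
    }

    /** Listings filter out LOs that carry a removal marker. */
    public Set<String> visible() {
        Set<String> result = new LinkedHashSet<String>(stored);
        result.removeAll(removed);
        return result;
    }
}
```

Because the marker is just more data, every LMS that synchronizes with the repository sees the removal without any physical deletion having to propagate.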
The informal knowledge repository stores its data in the form of RDF triples. The connection to the storage and the realization of the queries are handled in the SesameDBFace class, the only class in the Didaskon DB module that has been prepared for that reason.

The SesameDBFace class has been implemented according to the singleton pattern, which allows only one instance of a class. Thus, there is a private constructor that can be used only from within the getInstance(...) method, which is invoked by the logic tier of the application. It is worth noticing that the Didaskon DB module can be used by any application. For that reason, there is a cache of SesameDBFace instances, one per repository, shown in List. 5.1.
Listing 5.1: Retrieving the connection to the data storage

/**
 * Cache of SesameDBFace instances, one per repository.
 */
private static Map<String, SoftReference<SesameDBFace>> SESAME_DB_FACE_CACHE = ...;

private SesameDBFace() {}

...
try {
    repository = repository1;
    graph = repository.getGraph();
    valueFactory = graph.getValueFactory();
} catch (AccessDeniedException e) {
    ...
}

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param repositoryId
 * @return
 */
public static SesameDBFace getInstance(String repositoryId) { ... }

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param repositoryId
 * @param login
 * @param password
 * @return
 */
public static SesameDBFace getInstance(String repositoryId, String login, String password) {
    SesameDBFace dbFace = null;
    synchronized (SESAME_DB_FACE_CACHE) {
        SoftReference<SesameDBFace> ref = SESAME_DB_FACE_CACHE.get(repositoryId);
        if (ref == null || ref.get() == null) {
            try {
                LocalRepository repository = ...;
                dbFace = SesameDBFace.getInstance(repository);
            } catch (UnknownRepositoryException e) {
                try {
                    LocalRepository repository = ...;
                    dbFace = SesameDBFace.getInstance(repository);
                } catch (ConfigurationException e1) {
                    throw new ...(... + repositoryId + ")", e1);
                }
            } catch (ConfigurationException e) {
                ...(... + repositoryId + ")", e);
            }
            SESAME_DB_FACE_CACHE.put(repositoryId, ...);
        } else {
            dbFace = ref.get();
            if (dbFace == null) {
                ...(... + repositoryId + ")");
            }
        }
    }
    return dbFace;
}

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param login
 * @param password
 * @return
 */
private static LocalService getService(String login, String password) {
    try {
        ...
        return service;
    } catch (AccessDeniedException e) {
        ...(String.format(ERR_ACCESS_DENIED, login), e);
        return service;
    }
}
All queries to the data storage are handled by the SesameDBFace class from the Didaskon DB module. For that reason, it provides the performGraphQuery(String query, Object... args) method, which takes two arguments (see List. 5.2):

• query the query to be performed

• args none, one or more arguments that are used in the query
Listing 5.2: Querying the data storage

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param query
 * @param args
 * @return
 */
public Graph performGraphQuery(String query, Object... args) {
    try {
        ...
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}
To make the code of IKHarvester cleaner and to separate the features related to storage issues, I have prepared the RDFQuery class, which contains all the queries used by the system (see List. 5.3).
private RDFQuery() {}

/** ... */
public static final String SELECT_ALL = ...;

/** ... */
public static final String SELECT_ALL_FOR_ALL = ...;

/** ... */
public static final String SELECT_ALL_FOR_SUBJECT = ...;

/** ... */
public static final String SELECT_ALL_FOR_PREDICATE = ...;

/** ... */
public static final String SELECT_OBJECT_FOR_SUBJECT_AND_PREDICATE = ...;

/** ... */
public static final String SELECT_SUBJECT_FOR_PREDICATE_AND_OBJECT = ...;

/** ... */
public static final String SELECT_ALL_FOR_BLANKNODESUBJECT = ...;

/** ... */
public static final String SELECT_OBJECT_FOR_BLANKNODESUBJECT_AND_PREDICATE = ...;

/** ... */
public static final String SELECT_ALL_FOR_SUBJECT_AND_PREDICATE_LIKE = ...;
5.5.3 Extending IKHarvester

One of the requirements for the IKHarvester system demands allowing IKHarvester to be extended to support new types of online resources (see Sec. 4.3.4). Writing new features should be facilitated. Therefore, I have decided that new classes for harvesting metadata should extend the DataHarvesterImpl class, and ones for providing metadata from the informal knowledge repository should extend the DataProviderImpl class. Both classes implement, respectively, the DataHarvester and DataProvider interfaces.

Since the idea behind both the harvesting and the providing features is the same, in the following listings (see List. 5.4 and List. 5.5) I present the mechanism only for the providing classes.
/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @param uri
 * @return
 * @throws IOException
 */
public LOManifest getLOManifest();

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @throws IOException
 */
public HarvestingResults getLOContent(String resourceType, StringBuffer content);
The DataProviderImpl class implements the methods from the DataProvider interface: getLOManifest() and getLOContent(String resourceType, StringBuffer content). Based on the type of the resource, the former method creates an instance of the appropriate subclass. The name of the subclass has a suffix equal to the name of the resource type.
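The suffix-based lookup can be illustrated with plain Java reflection. The classes below are hypothetical stand-ins, not the actual IKHarvester providers; the point is only the Class.forName(...) construction of the subclass name.

```java
// Hypothetical stand-ins illustrating suffix-based instantiation; these are
// not the actual IKHarvester provider classes.
interface Provider {
    String describe();
}

class ProviderBlogPost implements Provider {
    public String describe() { return "blog post provider"; }
}

class ProviderWikiArticle implements Provider {
    public String describe() { return "wiki article provider"; }
}

public class ProviderFactorySketch {

    /** Builds the class name "Provider" + resType and instantiates it reflectively. */
    public static Provider forType(String resType) throws Exception {
        Class<?> c = Class.forName("Provider" + resType);
        return (Provider) c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(forType("BlogPost").describe());
    }
}
```

With this design, supporting a new resource type only requires writing one new class whose name ends with the type's name; no dispatch code has to change.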
/** ... */
protected String uri = null;

/** ... */
protected SesameDBFace dbFace = null;

/**
 * @param uri
 */
public DataProviderImpl(String uri1) {
    uri = uri1;
}

/* (non-Javadoc)
 * @see DataProvider#getLOManifest(java.lang.String)
 */
public LOManifest getLOManifest() {
    StatementIterator iter = dbFace.getGraphStatements(
            RDFQuery.SELECT_ALL_FOR_SUBJECT_AND_PREDICATE_LIKE,
            uri, NS.NOTITIOUS.resourceType);
    if (iter == null || !iter.hasNext()) {
        return null;
    }
    // there is maximum one entry
    String resType = ... .substring(NS.NOTITIOUS.resourceType.length());
    if (!Util.isStringSet(resType)) {
        return null;
    }
    try {
        Class providerClass = Class.forName(
                DataProviderImpl.class.getPackage(). ... );
        ...
        manifest = provider.getLOManifest();
    } catch (SecurityException e) {
        ...
    } catch (IllegalArgumentException e) {
        ...
    } catch (ClassNotFoundException e) {
        ...
    } catch (NoSuchMethodException e) {
        ...
    } catch (InstantiationException e) {
        logger.error("InstantiationException", e);
    } catch (IllegalAccessException e) {
        logger.error("IllegalAccessException", e);
    } catch (InvocationTargetException e) {
        ...
    }
    return manifest;
}

/* (non-Javadoc)
 * @see org.corrib.ikharvester.provider.DataProvider#getLOContent(...)
 */
public HarvestingResults getLOContent(String resourceType, StringBuffer content) { ... }

/**
 * @author Jaroslaw Dobrzanski <jaroslaw@dobrzanski.net>
 * @return
 */
public static List<LOJBean> getLOList() {
    ...
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_BLOG));
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_DL));
    objects.addAll(getLOList(Constant.RESOURCE_TYPE_WIKI));
    return objects;
}

/**
 * @param type
 * @return
 */
public static List<LOJBean> getLOList(String type) {
    ... Constant.REPOSITORY_ID);
    ... RDFQuery.SELECT_SUBJECT_FOR_PREDICATE_AND_OBJECT, ...;
    if (iter == null) {
        return los;
    }
    while (iter.hasNext()) {
        ...
        StatementIterator it = dbFace.getGraphStatements(
                RDFQuery.SELECT_OBJECT_FOR_SUBJECT_AND_PREDICATE, ...);
        if (it == null) {
            continue;
        }
        if (it.hasNext()) {
            removed = true;
        }
        it.close();
        ...
    }
    return los;
}
Resources can be added to the repository either with a client application or by putting their URL into the form on the main page of the system. The latter method might be bothersome, since every time a user must go to the above-mentioned web page and return to the initial one after the adding. To meet users' needs, I have created an add-on for Firefox, one of the most popular web browsers. In general, an add-on adds some functionality to a piece of software. The one I have created works with the implementation of IKHarvester deployed in the notitio.us project (see Sec. 6.1.2); it adds a button with which the currently visited page can be posted to IKHarvester (see Fig. 5.3).
19 http://notitio.us/
Any time a user visits such a page, he/she can click the above-mentioned button. He/she is then redirected to one of the IKHarvester web pages, where the initial page can be tagged and saved to the informal knowledge repository. All the information in that repository is shared.
Chapter 6
Conclusions
Nowadays, a lot of valuable informal knowledge can be found in blogs, fora, digital libraries, wikis, etc. The amount of such data is growing rapidly.

In this thesis, I have included the results of my research in the field of the Semantic Web and eLearning. I have presented those two approaches and defined a way of harvesting informal learning content from a few types of SSIS and delivering it to eLearning 2.0 frameworks. Then, I have designed the architecture of such a system (IKHarvester) and developed it. Finally, I have successfully deployed IKHarvester in a real environment.
6.1 Achievements
6.1.1 Publications
This thesis is dedicated to the issue of collecting informal knowledge from Social Semantic Information Sources. The idea of how to capture data from online resources is quite innovative, and a lot of effort is put into research in that field.

The Semantic Infrastructure (SemInf) lab, the core part of the Corrib Cluster in the Digital Enterprise Research Institute, whose member I am, is also interested
1 http://corrib.org/
2 http://deri.org
in this area. During the last months, we have created a few publications.
Faculty of Engineering Research Day 2007, Galway, Ireland, April 16, 2007
6.1.2 IKHarvester
Although the current version of IKHarvester is a prototype, it works well and collects a significant amount of informal knowledge.

Benefits

To recap, there are a few solutions for capturing and managing semantic annotations: SemanticWeb.com, Piggy Bank, and Zotero (see Sec. 4.1.1). Although their goal is similar, they achieve it in different ways. Table 6.1 explicitly shows the differences between the above-mentioned solutions, indicating the level of support for each feature.
Table 6.1: Comparison of tools for collecting informal data

Feature | IKHarvester | SemanticWeb.com | Piggy Bank | Zotero
Integration with browsers | buttons: FF, Opera, IE; add-on for FF | buttons: FF, Opera, IE | FF add-on itself | FF add-on itself
Support for Wikipedia | some | none | weak | none
Support for JeromeDL | full | some | full | weak
Accessible with Web Services | yes | yes | no | no
Allows data sharing | yes | yes | partially (sharing with Semantic Bank) | no
Support for new document types (extensibility) | yes (writing new blades) | no (dependency on the authors of the tool) | yes (writing new screen scrapers) | no (dependency on the authors of the tool)
Integration with web browsers is crucial for such systems. The more web browsers the system supports, the better; such a tool should not demand using a specific browser. Since Piggy Bank and Zotero are Firefox add-ons, they are perfectly integrated with that browser but cannot work outside it. IKHarvester supports Firefox, Internet Explorer and Opera by providing special buttons for capturing data. Moreover, some features of IKHarvester can be invoked by using a special add-on for Firefox.

All compared tools, except for Zotero, are able to collect a sufficient amount of metadata for online resources available on web pages, by reading the RDF documents that those pages link to. By sufficient, we mean more information than the URL or the title of the resource; for instance, there should be some information about the author and the topic. IKHarvester additionally stands out in that it collects metadata also from non-semantic web pages, like Wikipedia, which is a valuable source of knowledge.

To make much more use of metadata for learning purposes, it should be shared and made available to all. For that reason, it is necessary to access it with Web Services, as this improves its accessibility and reusability. Also, tagging helps in managing resources. In this respect IKHarvester acquits itself well: all shared data can be retrieved, saved, and tagged by calling its Web Services. Moreover, IKHarvester treats online resources as learning material (informal Learning Objects), and uses the captured metadata to describe them in accordance with the LOM standard. This rich information can be used by eLearning LMSs.
Success stories

There are a lot of projects IKHarvester can be used in. At the moment, it is employed in the two described below.
Didaskon

IKHarvester has been designed as a SOA layer for Didaskon, a system designed for composing curricula out of formal and informal knowledge [53].

Based on some preconditions, Didaskon creates a learning path which best fits a specific learner. To achieve that, the system uses initial information (preconditions) like a student's needs, skills, learning history, etc., as well as the anticipated resulting skills.

Initially, IKHarvester was supposed to work with Didaskon, an eLearning 2.0 framework (see Sec. 3.3.2). However, during the development, I have found another application for it.
notitio.us

notitio.us is a service for collaborative knowledge aggregation and sharing; it employs IKHarvester for retrieving RDF information about Web resources bookmarked by the users. Therefore, it is capable of indexing rich metadata coming from various sources, and it keeps rich, semantically interconnected metadata shared by the users using Social Semantic Collaborative Filtering (SSCF).

The resources not only can be shared with the bookmarking interface (SSCF), but also, based on the rich metadata, they can be searched and browsed using TagsTreeMaps, a tag browser based on the treemaps rendering algorithm, and MultiBeeBrowse. Moreover, the metadata is exposed in the LOM standard, which turns notitio.us into a valuable source of learning objects.
3 http://didaskon.corrib.org/
4 http://didaskon.corrib.org/
5 http://notitio.us/
6 TagsTreeMaps: http://sf.net/projects/tagstreemaps/
Figure 6.1: IKHarvester in the notitio.us service
To learn more about IKHarvester and notitio.us, please visit its home page:
http://notitio.us/ikh/.
The system was designed in a manner that allows extending it so that it works with other sources of informal knowledge (see Fig. 4.6). In the future, it should support more types of online resources, among others: Bricks (another digital library), blogs hosted on Blogger, and other types of wiki engines.
7 http://www.blogger.com/
Bibliography
[3] L. Aroyo and D. Dicheva. The new challenges for e-learning: The educational
[4] U. Bojars, J. Breslin, and A. Passant. Sioc browser - towards a richer blog
[5] J. Brase, M. Painter, and W. Nejdl. Completing LOM - how additional axioms
increase the utility of learning object metadata. In ICALT, page 493. IEEE
[6] M. Cygan. Ubiquitous search service component gateway for heterogeneous l2l
[9] DublinCore Initiative, http://dublincore.org/documents/dces/. Dublin Core
[12] P. Graham. Web 2.0. Online; accessed December 18, 2006; http://www.
paulgraham.com/web20.html.
System. In TEHOSS'2005.
posium on Wikis, 2006, Odense, Denmark, August 21-23, 2006, pages 137138.
ACM, 2006.
[17] J. Hendler and O. Lassila. The semantic web. Scientific American Magazine, May 2001.
http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf, 2002.
[20] A. H. John Breslin, Stefan Decker. Sioc: an approach to connect web-based
[21] S. D. John Breslin. Semantic web 2.0: Creating social semantic information
spaces.
[22] D. R. Karger and D. Quan. What would it mean to blog on the semantic web?
[23] T. Karrer. elearning 2.0: Informal learning, communities, bottom-up vs. top-
[24] R. Khare. Microformats: The next (small) thing on the semantic web? IEEE
[27] S. R. Kruk. E-learning on semantic web 2.0, 2006. Online; accessed November 5,
2006; http://www.sebastiankruk.com/storage/presentation/elearning_
on_sw20/img0.html.
search and browsing for digital libraries. In Mobile Data Management, 2006.
tic Web - ASWC 2006, First Asian Semantic Web Conference, Beijing, China,
September 3-7, 2006, Proceedings, volume 4185 of Lecture Notes in Computer
ISWC, 2006.
[32] A. D. Learning. Scorm homepage. Online; accessed May 1st, 2007; http:
//www.adlnet.gov/scorm/.
[36] D. G. W. Mission. Beyond elearning: practical insights from the usa. Technical
blogosphere.
2002.
[40] M. of New Media. E-learning - m/cyclopedia of new media, 2006. Online; ac-
[41] S. O'Hear. E-learning 2.0 - how web technologies are shaping education. On-
[42] S. O'Hear. Seconds out, round two. The Guardian, 2005. Online; accessed
2006; http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/
what-is-web-20.html.
[45] P. Prescod. Rest and the real world, February 2002. Online; accessed April 7,
2007; http://webservices.xml.com/lpt/a/ws/2002/02/20/rest.html.
Feb. 07 2007.
September 2005.
database with the semantic web technologies. In Proceedings of the 16th In-
ternational Conference on Database and Expert Systems Applications. Copen-
2003.
[54] K. M. Uldis Bojars, John Breslin. Using semantics to enhance the blogging
Semantic Web Conference, ESWC 2006, Budva, Montenegro, June 11-14, 2006,
[56] W3C. Owl web ontology language guide. Online; accessed December 16, 2006;
http://www.w3.org/TR/owl-guide/.
[57] W3C. Owl web ontology language overview. Online; accessed March 21, 2007;
http://www.w3.org/TR/owl-features/.
[58] W3C. Primer: Getting into rdf & semantic web using n3. Online; accessed
[59] W3C. Rdf vocabulary description language 1.0: Rdf schema. Online; accessed
[60] Wikimedia. Learning object metadata - meta. Online; accessed April 12, 2007,
http://meta.wikimedia.org/wiki/Learning_object_metadata.
[61] Wikipedia. E-learning - wikipedia, the free encyclopedia, 2006. Online; accessed
[62] Wikipedia. Semantic web - wikipedia, the free encyclopedia, 2006. Online; ac-
[63] Wikipedia. World wide web - wikipedia, the free encyclopedia, 2006. Online;
[65] D. Zambonini. Is web 2.0 killing the semantic web? O'Reilly XML Blog,
List of Figures
List of Listings
List of Tables
Appendix A
Installation guide
A.1 Apache Tomcat

The web container can be downloaded from its home page: http://tomcat.apache.org/. Apache Tomcat should be installed according to the instructions available on that page.

Let us assume that TOMCAT_DIR/ is the Tomcat installation directory; this name will be used further in this chapter.
A.2 Sesame
Sesame, an RDF storage, plays the role of the informal knowledge repository. IKHarvester works with Sesame 1.2.6. The Sesame webapp must be put into TOMCAT_DIR/webapps/, and all jars moved from TOMCAT_DIR/webapps/sesame/WEB-INF/lib/ to TOMCAT_DIR/common/lib/.

Having installed Sesame, it should be configured. For that reason, put the following repository definition into Sesame's configuration. In the listing, STORAGE_FILENAME is the path to the file where RDF data will be stored.
<repository ...>
  <sailstack>
    <sail ...>
      ...
    </sail>
  </sailstack>
</repository>
A.3 IKHarvester

IKHarvester can be run in two ways, either by defining a listener in Apache Tomcat's configuration or by deploying the WAR file of the project.
A.3.2 Configuration

After downloading the application, put all jar files from IKHARVESTER_DIR/dist/TOMCAT_DIR/common/lib/ to the TOMCAT_DIR/common/lib/ directory. Also, the commons-fileupload-1.1.jar file (copied from the Sesame 1.2.6 distribution) must be deleted, because along with the IKHarvester files you have received a newer version of it.
Running the application

There are two ways of running IKHarvester: either by defining a listener or by deploying the WAR file.

Defining a listener

With a listener defined in the configuration of Apache Tomcat, the web container sees changes to the source files every time they are compiled. Consequently, there is no need to redeploy the war file and restart the container. The application is available at:

http://localhost:8080/ikh

Deploying WAR

After changes to the system source files, one must run the ant script that builds a new ikharvester.war file. The onerousness of this approach lies in the fact that every time the developer makes a change, he must create a new ikharvester.war file, deploy it, and restart the web container. That is why I suggest using the former approach.
Appendix B
Output examples
IKHarvester delivers descriptions of informal LOs retrieved from the informal knowledge repository (see Tab. 4.2.2 for details of that functional requirement) in the LOM standard.

In List. B.1, there is presented the description of a LO created out of information harvested from a blog post.
<lom>
  <general>
    <identifier>
      <entry>
        http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
      </entry>
    </identifier>
    <title>
      <langstring>...</langstring>
    </title>
    <language>en</language>
    <description>
      <langstring>...</langstring>
    </description>
    <keyword>
      <langstring>...</langstring>
      <langstring>...</langstring>
      <langstring>
        http://dobrzanski.net/category/web20/
      </langstring>
    </keyword>
    <structure>...</structure>
    <aggregationlevel>...</aggregationlevel>
  </general>
  <lifecycle>
    <version>
      <langstring>2007-04-23T22:43:15Z</langstring>
    </version>
    <contribute>
      <role>...</role>
      <date>
        <description>...</description>
      </date>
    </contribute>
    <status>...</status>
  </lifecycle>
  <metametadata>
    <identifier>
      <entry>
        http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
      </entry>
    </identifier>
    <contribute>
      <role>...</role>
      <date>
        <description>...</description>
      </date>
    </contribute>
  </metametadata>
  <technical>
    <location>
      http://dobrzanski.net/2007/04/23/ajax-activity-indicator/
    </location>
    <requirement>
      <orcomposite>
        <type>...</type>
        <name>...</name>
      </orcomposite>
    </requirement>
    <requirement>
      <orcomposite>
        <type>...</type>
        <name>...</name>
      </orcomposite>
    </requirement>
  </technical>
  <educational>
    <learningresourcetype>...</learningresourcetype>
    <description>
      <langstring>...</langstring>
    </description>
    <interactivitytype>...</interactivitytype>
    <interactivitylevel>...</interactivitylevel>
    <semanticdensity>...</semanticdensity>
    <intendedenduserrole>...</intendedenduserrole>
    <context>...</context>
    <context>...</context>
    <context>...</context>
    <context>...</context>
    <difficulty>...</difficulty>
  </educational>
  <rights>
    <cost>...</cost>
  </rights>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>
        <entry>...</entry>
      </identifier>
      <description>...</description>
    </resource>
  </relation>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>...</identifier>
      <description>...</description>
    </resource>
  </relation>
  <relation>
    <kind>...</kind>
    <resource>
      <identifier>...</identifier>
      <description>...</description>
    </resource>
  </relation>
</lom>
B.2 LO content example
Apart from the description of a Learning Objcect in LOM (see List. B.1, IKHarvester
can also provide the content of such LO. The content is supposed to be used in the
In the List. B.2, there is presented the content of a LO created out of information
<LO>
  ...
  <content><![CDATA[
    ... <a href="http://dobrzanski.net/2007/04/22/...">
    ... <pre><code><script>
    Ajax.Responders.register({
      ...
    });
    </script></code></pre><p>Then, ...
  ]]></content>
</LO>
B.3 List of LOs example
IKHarvester can deliver to an LMS a list of the informal LOs it stores (see Tab. 4.2.2 for details of that functional requirement).
<LOList>
</LOList>