
Towards a Commercial Adoption of Linked Open Data for

Online Content Providers



Wolfgang Halb Alexander Stocker Harald Mayer

Helmut Mülner Ilir Ademi

JOANNEUM RESEARCH
Institute of Information Systems
Graz, Austria
firstname.lastname@joanneum.at

ABSTRACT

Descriptive case studies of a successful commercial adoption of Linked Data are still rare. Little has been said by the Linked Data community concerning the business potential of Linked Data and the challenges lying ahead for enterprises that intend to use Linked Data in their professional production environments. Motivated by this lack of research and driven by the requirements of commercial content providers, we present a first version of our prototype, which aims to support professional online editors in producing valuable online content enriched with (multimedia) information from Linked Data sources. We also include a preliminary evaluation of our prototype and a comparison of the prototype against other systems for semantic content enrichment. Though presenting ongoing research, this paper contributes to the current discussion on business aspects related to the provision or consumption of Linked Data.

Categories and Subject Descriptors
H.4.m [Information Systems]: Miscellaneous

General Terms
Design, Theory, Economics, Performance

Keywords
Linked Data, commercial adoption, business aspects

∗ Alexander Stocker also works for Know-Center, Inffeldgasse 21a, Graz, Austria.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
I-SEMANTICS 2010, September 1-3, 2010 Graz, Austria
Copyright 2010 ACM 978-1-4503-0014-8/10/09 ...$10.00.

1. INTRODUCTION AND MOTIVATION

This section provides a short introduction to the online content industry and motivates the needs of online editors for a new technology supporting their online editorial process.

The current business model of commercial content providers, e.g. online newspapers, is comparatively transparent: Content providers aim to produce useful content, publish it on their portals and indirectly monetize it by serving advertisements. Hence, revenues of online providers largely depend on the number of users consuming online content (their reach) and indirectly also on their session length. The two most common advertising pricing models are the cost per impression (CPI) and the cost per click (CPC) model. Advertisements are either served directly by the content provider or by third party services. For a more detailed discussion on business models, cf. e.g. [12] or [14].

Valuable online content, which is suitable for monetarization through advertisement, is in practice created by a specialist, the online editor. An online editor usually investigates upcoming topics, aggregates (online) content from various sources, including user generated content taken from blogs or professional content from press agencies, and merges all these small chunks into a fascinating story that is capable of drawing the attention of the user. Needless to say, online editors require a “good nose” for how to create content that people on the Web prefer to consume. More than any other role in the online content industry, professional online editors depend on Web technology. Research has shown that people on the Web scan content rather than read everything in detail, in contrast to readers of classical newspapers [7]. However, very little is known about people's needs regarding the nature, structure and presentation of online content, and we can only guess what distinguishes good online content from bad. In practice, the amount of revenue generated from advertisements may serve as an indicator of the quality and appropriateness of online content.

Human resources are always scarce, which implies that they have to be utilized efficiently and effectively. From talks with commercial online content providers we learned that professional online editors spend most of their time searching for interesting and suitable material that may make their editorial content more fascinating and valuable to their audience and/or enable them to produce editorial content more quickly, enriched with multimedia objects
and hyperlinks to advanced material. The latter is where third party content will play an important role. Carefully considering these aspects may increase the attention of existing content consumers, keeping them on site for a longer period, as well as attract new consumers, positively affecting the revenue gained from ads embedded in the content. However, proving these hypotheses is not the goal of this paper.

Our research problem may be outlined as follows: Professional online editors are facing at least two business challenges:

• They need to be very efficient in their core business, i.e. developing their editorial content and getting it published instantly on the Web.

• They have to produce up-to-date (multimedia) content that is particularly capable of drawing the attention of humans.

We investigated current Web technology, foremost Linked Data, to support online editors in creating professional online content. The next section takes a detailed look at our approach, discussing the potential of Linked Data for business and applying it to our specific situation.

2. TOWARDS A TECHNOLOGY-ORIENTED SOLUTION

In this section we motivate Linked Data as a suitable technology-oriented solution for business, capable of dealing with both challenges introduced above. As a subtopic of the Semantic Web, Linked Data, based on four simple rules, has gained much attention in research. In a nutshell, the vision of the Linked Data community is to first facilitate the generation of semantically enriched data (Linked Data); as a result of this data supply, others will come and build intelligent applications on top of it. Such a strategy is supposed to be a very pragmatic solution for the well-known chicken-and-egg problem of the Semantic Web.

We find that, as a current Semantic Web technology, Linked Data is capable of generating benefits in at least three different business scenarios:

1. Enterprises may adopt Linked Data to interlink their own content, increasing its accessibility for humans and machines.

2. Enterprises may adopt Linked Data to integrate third party content into their own portals, as the Web may “know” more than they do.

3. Enterprises may adopt Linked Data to prepare their own content for third party adoption, enhancing its reusability and visibility.

These three general benefits will especially affect commercial content providers, as they are dealing with huge amounts of data usually stored in data silos. A data store is called a “data silo” if it is so tightly dependent on a specific environment that it is impossible to reciprocally use and share information with other information management systems within or across organizational boundaries (cf. [8]). The so-called “silo effect” is also currently popular in the business and organizational communities to describe a lack of communication and common goals between departments in an organization. Experts agree that these silos (whether on an organizational or information technology level) greatly reduce efficiency and should be broken down, whereas cooperation should instead be fostered wherever possible (cf. [3]).

As a result of this siloing, content providers usually find themselves in at least one of the following scenarios:

1. They may operate more than one content portal, facing the need to better integrate and interlink their data to achieve better accessibility, i.e. to allow enhanced search and retrieval across portals.

2. They may want to integrate third party data to enrich their own editorial content with open structured data, choose open instead of licensed content to reduce costs, and develop valuable intelligent applications based on open data.

3. They may provide their own content to be used by third parties, thereby achieving a significant increase in visibility and reach, raising the general reusability of their own content, and leading to third party adoption that comes along with search-engine related benefits.

Figure 1 depicts how online content providers may use Linked (Open) Data and benefit from it.

[Figure 1 content, by scenario (columns: data provider, data consumer, benefit for enterprise):
1. Enterprise (Portal): increased visibility of own content; increased reusability of own content due to consistent data structure; search engine optimization.
2. Enterprise (Portal): enriching own content with open structured data; utilizing free data instead of fee-based data; developing intelligent applications based on open data.
3. Enterprise (Portal 1) and Enterprise (Portal X): increased accessibility of own content; structured exchange of data between isolated portals; search and retrieval across different domains.]

Figure 1: Three scenarios for the use of Linked Data.

From this it follows that adopting Linked Data may well result in a paradigm shift for the professional editing business: Online editors may benefit much from Linked Data as this new technology will enable them to perform better and to create more valuable content. Specifically, Linked Data delivers the following benefits to online editors:

• With Linked Data it is easy to integrate own or third party data to generate appropriate and up-to-date content on the one hand, and

• there is already a plethora of different Linked Data sources available, providing information that ranges from geographical to statistical data and is ready for integration.

Whenever an enterprise is dealing with Linked Data, the question arises whether Linked Data has more value to the human end user than “ordinary” data. The Linked Data Value Chain as pictured in Figure 2 provides a lightweight
model to conceptualize the business perspective of Linked Data and to make the process of adding value to the data transparent.

Figure 2: The Linked Data Value Chain [11]

The Linked Data Value Chain [11] comprises three different concepts: Participating Entities, Linked Data Roles, and Types of Data. Participating entities are persons, enterprises, associations, and research institutes holding at least one of the following roles:

• Raw Data Providers provide any kind of data.

• Linked Data Providers provide any kind of data in a Linked Data format. They consume Raw Data provided by Raw Data Providers and transform it into Linked Data.

• Linked Data Application Providers provide Linked Data Applications. They consume Linked Data provided by Linked Data Providers, process it within their applications and transform it into Human-Readable Data.

• End users are humans who (like to) consume Human-Readable Data, which is a human-readable presentation of Linked Data provided by Linked Data Application Providers.

The Linked Data Value Chain distinguishes between three types of data: Raw Data, Linked Data and Human-Readable Data. According to this concept, the value of the data increases with every data transformation step, and Human-Readable Data has the highest value for the human end user.

• Raw Data is any kind of structured or unstructured data that has not yet been converted into Linked Data.

• Linked Data is any data in a Resource Description Framework (RDF)¹ format interlinked with other RDF data.

• Human-Readable Data is any kind of data which is prepared for consumption by humans.

With respect to the Linked Data Value Chain introduced above, online content providers currently act almost exclusively as Raw Data Providers. Notable exceptions are the BBC [9] and the New York Times [10]. Linked Data technology will enable enterprises to act as Linked Data Providers, providing Linked Data for third parties, and to act as Linked Data Application Providers, consuming data from third parties and providing more valuable Human-Readable Data on top of it for the human end user.

¹ http://www.w3.org/RDF/

Based on the initial situation of commercial content providers illustrated above, the current business challenges of online editors and the concept of the Linked Data Value Chain, we developed a first prototype serving professional content providers.

3. PROTOTYPE

Motivated by the findings stated in Section 2, a prototype has been developed that is intended to support editors at online content providers but can also be used as a general-purpose tool in other domains. It enriches editorial content with further information from Linked Data sources and also generates a Linked Data version of the editorial content. A distinguishing feature of the tool is that it supports multilingual content, which poses additional challenges in the context of the Web of Data, where English is still the predominant language. Currently the prototype is being tested at a cooperating content provider.

3.1 Use Case

The primary use case of the developed prototype is to support the editorial process at online content providers. In order to meet the needs of the industry we collaborated closely with one of the major content providers in Austria. The tool can be integrated in a provider’s content management system. While an editor creates an article, the text is analyzed for interesting terms and the tool automatically suggests further information from the Web of Data. This includes data from various Linked Data sources such as textual content, links, images, or video. The editor instantaneously receives additional information about the article she is writing and can then decide which external content should be included in the article. This manual decision step has been introduced following feedback from editors and content providers, as they prefer to have more control over the published content. It further makes it possible to improve content suggestions based on previous selections and preferences. As an added benefit the final article is enriched with more exciting content and provides readers with further information without their having to leave the provider’s pages. This increases the attractiveness of the content provider, with further potential positive effects on visit durations, returning users, as well as page and ad impressions, leading to higher ad revenues for the content provider.

3.2 Modules

The tool consists of three different modules that work together as a unified solution, which can be integrated into an existing content management system as a service or be used as a stand-alone web application. The term extraction and classification module identifies interesting and relevant terms, which act as an input for the Linked Data consumption & interlinking module, where additional information from Linked Data sources is collected. The Linked Data provision module makes the discovered information available as Linked Data to the public.
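The data flow between the three modules can be illustrated with a minimal Python sketch. It is not the prototype's implementation: the function names, the capitalized-token heuristic, and the hard-coded lookup table standing in for queries to Linked Data sources are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Concept:
    label: str
    uri: str


# Hypothetical stand-in for a lookup against Linked Data sources.
KNOWN_CONCEPTS: Dict[str, str] = {
    "Graz": "http://dbpedia.org/resource/Graz",
    "Larnaca": "http://dbpedia.org/resource/Larnaca",
}

DCTERMS_RELATION = "http://purl.org/dc/terms/relation"


def extract_terms(text: str) -> List[str]:
    """Module 1 (toy version): keep capitalized tokens as candidate terms."""
    terms: List[str] = []
    for token in text.split():
        token = token.strip(".,:;!?()")
        if token[:1].isupper() and token not in terms:
            terms.append(token)
    return terms


def interlink(terms: List[str]) -> List[Concept]:
    """Module 2 (toy version): map terms to concept URIs where one is known."""
    return [Concept(t, KNOWN_CONCEPTS[t]) for t in terms if t in KNOWN_CONCEPTS]


def provide(article_uri: str, concepts: List[Concept]) -> List[str]:
    """Module 3 (toy version): publish the links as N-Triples-style statements."""
    return [f"<{article_uri}> <{DCTERMS_RELATION}> <{c.uri}> ." for c in concepts]


def enrich(article_uri: str, text: str) -> List[str]:
    """Run the three modules in sequence on one article."""
    return provide(article_uri, interlink(extract_terms(text)))


if __name__ == "__main__":
    for triple in enrich("http://test.joanneum.at/news/xyz",
                         "Von Graz aus nach Larnaca."):
        print(triple)
```

Each stage only depends on the output of the previous one, which is what allows a module (for example the term extraction) to be swapped for a third-party component.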
3.2.1 Term extraction and classification

The term extraction and classification module that has been developed is targeted at German-language content, as the cooperating content provider produces the majority of its content in German. Even though several term extraction solutions exist for the English language (which can also be plugged in and used in the prototype), there is still a lack of well-performing, generally available tools for German. Through its modular structure the prototype can support term extraction solutions for any language that are available from third-party providers.

In the current stage the term extraction and classification module combines the following approaches:

• Blacklist: All terms and possible word formations that occur in a general dictionary are removed from the text. As a result this step delivers “interesting” words such as names, toponyms, and domain-specific terms. All possible morphological formations have to be considered, and compounds are a special challenge in the German language that may lead to false positives and negatives. A detailed discussion of the approach is out of scope of this paper.

• Whitelist: In addition, a whitelist with relevant terms from Linked Data sources is applied, with the most important datasets being DBpedia² and GeoNames³ for general news articles.

• Rules: With a rule-based approach, person and company names are also identified (e.g. first name followed by noun → person name).

3.2.2 Linked Data consumption & interlinking

The Linked Data consumption & interlinking module retrieves additional information about the identified terms and creates interlinks to Linked Data sources. Disambiguation between different concepts that share the same label is one of the biggest challenges. The term “Obama” can for instance refer to the American president, his wife, or a city in Japan.

The general workflow of the module starts with a search for potentially relevant information in external data sources. In the first phase this is done via a query to DBpedia and GeoNames, two datasets that already contain a lot of general knowledge and geographic information. Even though the use of further datasets is easily possible, this first step aims at identifying relevant concepts for the found terms, for which these two Linked Data hubs already provide sufficient information in the case of general news articles. The query is based on the extracted term and takes the original document’s language into account as well as textual similarity measures.

In many cases it is not possible to find only one distinct concept; rather, many potential matches are found and thus disambiguation needs to be done. In the current use case there is almost no information about the term available directly, as it is simply part of a news article. The information about the category of the article (e.g. politics, local news, etc.) can aid in narrowing potential concepts, though. Consequently, link discovery and resolution approaches that have been proposed for Linked Data sources such as SILK [17], the Co-reference Resolution System (CRS) [5], KnoFuss [13], or LD-Mapper [16] cannot be used in this context. These approaches rely on further information about a resource that can be compared in two different datasets, i.e. there needs to be some overlap between the source and target datasets, which is not the case in our scenario, where the source dataset contains no more information about a concept than its label.

² http://dbpedia.org/About
³ http://www.geonames.org/

Information that is available and can be exploited is the co-occurrence of terms in one article. By looking at all potential concepts for all extracted terms contained in one document it is possible to construct what we call a Linked Data Entity Space. All relations between the concepts are modeled, which makes it possible to identify the concepts that are most closely related, i.e. that have the shortest conceptual distance. For toponyms geographic distance metrics are also taken into account. Toponym disambiguation based on geographic distances is extremely powerful for certain categories such as local news.

Once the concept has been identified it is possible to retrieve further information from different data sources. The information gained from Linked Data sources can also be extremely useful for queries to other repositories containing, for instance, user contributed multimedia content. More appropriate matches can be found because the query can be enhanced with more metadata for finding relevant images and videos. Especially for concepts related to geographical locations it is possible to supply coordinates that have been retrieved from DBpedia or GeoNames in the query when searching for images on Flickr or videos on Youtube.

3.2.3 Linked Data provision

The final article contains related information from Linked Data sources as well as image and video content from Flickr⁴ and Youtube⁵. It also includes an RDF representation that is embedded in the webpage via RDFa [1], a practice for building linked data for both humans and machines as described in [6]. Some exemplary snippets of this output are shown in Listing 1, where a machine-processable description of the article, its relations with further information sources and additional information is provided along with the human-targeted output as shown in Figure 3. The recently added support of RDFa content by important global search engine providers such as Yahoo!⁶ and Google⁷ [2] underlines the industry’s appreciation of this approach. For further processing by applications an RDF/XML version is also provided.

    <body about="http://test.joanneum.at/news/xyz">
      <h1 property="dc:title">Von Graz aus ...</h1>
      ...
      <div resource="http://dbpedia.org/resource/Larnaca"
           rel="dcterms:relation">
        <span property="rdfs:label"
              xml:lang="de">Larnaka</span>
        <span property="dbpprop:abstract"
              xml:lang="de">Larnaka, auch Larnaca,
          griechisch Larnaka ...</span>
        ...
      </div>
      <div resource="http://sws.geonames.org/146400/"
           rel="dcterms:relation">
        ...
      </div>

Listing 1: RDFa snippets of an exemplary Linked Data output

⁴ http://www.flickr.com/
⁵ http://www.youtube.com/
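To make the toponym disambiguation of Section 3.2.2 more concrete, the following Python sketch picks, among several candidate concepts for an ambiguous place name, the one that is geographically closest to the other places mentioned in the same article. The paper does not specify the distance metric or scoring; the haversine great-circle distance, the summed-distance score, the function names, and the approximate coordinates are all assumptions of this sketch, and the conceptual-distance part of the Linked Data Entity Space is not modeled here.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disambiguate(candidates: Dict[str, Coord], context: List[Coord]) -> str:
    """Return the candidate URI with the smallest total distance to the
    already identified places of the same article."""
    return min(
        candidates,
        key=lambda uri: sum(haversine_km(candidates[uri], c) for c in context),
    )


# Approximate, illustrative coordinates (not retrieved from any dataset).
GRAZ: Coord = (47.07, 15.44)
BERLIN: Coord = (52.52, 13.40)

LONDON_CANDIDATES: Dict[str, Coord] = {
    "http://dbpedia.org/resource/London": (51.51, -0.13),
    "http://dbpedia.org/resource/London,_Ontario": (42.98, -81.25),
}

if __name__ == "__main__":
    print(disambiguate(LONDON_CANDIDATES, [GRAZ, BERLIN]))
```

For an article that also mentions Graz and Berlin, the European candidate wins because its summed distance to the context places is far smaller. This kind of score is most informative when the context places cluster in one region, which fits the observation above that the technique works particularly well for local news.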
3.3 In Use

The prototype can be used as a standalone web application or it can be integrated into a content management system. Figure 3 shows a screenshot of the demonstrator web application. The original text is inserted in the text editor area; the panel on the right side then automatically suggests further information from Linked Data sources, which the editor can select with a single click. The information will be integrated in the final online news article, and RDF describing the news article as well as links to other resources is also supplied.

[Bar chart: recall and precision in percent (axis from 0% to 100%) for our prototype and Zemanta.]

Figure 4: Recall and precision for all terms

4. EVALUATION

Although our intention was to investigate paradigms of the commercial adoption of Linked Data, and even though our prototype is still under development, we have conducted a preliminary evaluation and included a benchmark against other systems for semantic content enrichment. These early tests already indicate that the performance of our prototype is superior to that of similar systems that aim at enhancing (editorial) content. The most distinguishing features of our solution are the support of the German language, the recommendation of multimedia content, and the relationship of the recommended objects to concepts extracted from the text.

One goal of our evaluation was to benchmark our prototype against three popular applications for semantic enrichment, listed in Table 1, for which demonstrators are publicly available on the Web: Calais⁸, Ontos API⁹, and Zemanta¹⁰. However, as can be seen from the comparison table, some tools have restrictions which resulted in a limited evaluation: We found that Calais currently does not support German-language content at all. Using a translation service as an intermediary step and processing the translated output (an English version of the German-language article) might at first glance solve the language problem. As this would indirectly result in an evaluation of the translation service within our evaluation, we decided not to include Calais at all, as its functionality does not match our requirements. We furthermore learned that Ontos API is currently capable of extracting terms but does not provide any content enrichment, which resulted in Ontos API being dropped as well. As only Zemanta provides all functionality required for a valid comparison, we are limited to benchmarking our prototype against Zemanta.

⁶ http://www.yahoo.com
⁷ http://www.google.com
⁸ http://viewer.opencalais.com/
⁹ http://try.ontos.com/
¹⁰ http://www.zemanta.com/demo/

We developed a six-step procedure for our preliminary evaluation:

1. In the first step we selected a set of German-language articles from a local commercial news provider. Our preliminary evaluation contained only ten articles, including articles in the domains of politics, business, sports, and culture. For a more comprehensive future evaluation we have access to more than 3,000 articles from the newspaper.

2. We manually extracted persons, places, and organizations from each article. Furthermore, we extracted terms which are important for a particular article as they describe the content.

3. Thereafter, we processed the content of each article with our prototype as well as with Zemanta. This resulted in an automatic extraction of terms which we evaluate in the next step.

4. We did a quantitative comparison between manually and automatically extracted terms for both applications, our prototype and Zemanta.

5. We manually analyzed the quantity and quality of the content recommended by both applications. In particular, we wanted to explore whether and how recommended objects relate to the terms (concepts) extracted from the articles.

6. We compared the recommendation ability of our prototype to the results of Zemanta. As the ability to suggest content for enrichment differs in various aspects, this evaluation is mainly done qualitatively.

In the following, we present the lessons learned from our preliminary evaluation. Figures 4 and 5 present the quantitative results of the term extraction evaluation (steps 1 to 4): From our investigation we learned that our prototype generated satisfactory results. The recall of all relevant terms (cf. Figure 4) is considerably higher for our prototype, and even though Zemanta identified fewer relevant terms, the precision values are comparable. This shows that our prototype is capable of retrieving more correct relevant terms, which allows online editors to be more flexible when enriching their content with content from third party sources.

Figure 5 shows the recall values for places, people, and organizations. Our prototype outperforms Zemanta when extracting people and places: In this context it is important to mention that Zemanta can extract internationally known
[Screenshot of the "SemTex Editor" web application ("Semantic Text Analysis and Interlinking"). The text editor area holds a German-language news article, "Von Graz aus zu über 60 Destinationen", about the summer flight schedule of Graz airport. The recommendation panel lists the extracted entities by category, Place (8): Berlin, Bonn, Graz, Larnaca, London Stansted, Mallorca, Paphos, Stansted; Organisation (1): Ryanair. For the selected place Larnaca the panel offers content from Wikipedia (name, description, references), Geonames, Flickr and YouTube.]

Figure 3: Screenshot of the prototype

System          English language   German language   Categorizations   Integration of external content
                support            support
Our prototype   via extensions     yes               yes               yes (Linked Data, multimedia content)
Calais          yes                -                 yes               -
Zemanta         yes                limited           -                 yes (select sources)
Ontos API       yes                yes               yes               -

Table 1: Comparison of different systems for semantic enrichment
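The quantitative comparison in step 4 of the evaluation procedure amounts to computing recall and precision of the automatically extracted terms against the manually extracted ones. A minimal Python sketch of that computation follows; the function name and the example term sets are hypothetical and do not reproduce any data or results from the evaluation.

```python
from typing import Set, Tuple


def precision_recall(gold: Set[str], extracted: Set[str]) -> Tuple[float, float]:
    """Compare automatically extracted terms with a manually built gold set.

    precision = correctly extracted terms / all extracted terms
    recall    = correctly extracted terms / all gold terms
    """
    true_positives = len(gold & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical term sets for a single article (illustration only).
    manual = {"Graz", "Larnaca", "Paphos", "Ryanair"}
    automatic = {"Graz", "Larnaca", "Sommer"}
    p, r = precision_recall(manual, automatic)
    print(f"precision={p:.2f} recall={r:.2f}")
```

A system that extracts few terms can reach a precision similar to one that extracts many while finding far fewer of the relevant terms, which is why both measures are reported side by side.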


 

100% the Web without having to care about own infrastructure


the downside is that one becomes dependent on the data
90%
providers’ infrastructure. Response times for queries depend
80% on server load and it could happen that a resource is unavail-
70% able for any (technical) reason that cannot be influenced by
60%
the data consumer. One of the solutions to this issue is host-
ing a copy of relevant Linked Data on infrastructure that is
50% Our prototype under one’s own control - which in turn creates the need for
40% Zemanta getting informed about dataset changes where different ap-
30%
proaches exist and a commonly agreed strategy still needs to
evolve12 . From a business perspective one could also identify
20%
the need for Linked Data providers/consolidators that offer
10% Service Level Agreements that contractually guarantee the
0% availability of relevant Linked Data for professional users.
Places People Organizations Data quality: For professional users the quality of data
can be an important issue that should be taken into account
by data publishers if they want their data to be reused.
Legal issues: Dataset publishers should be explicit about
Figure 5: Recall of entities by category licensing terms of their data to protect their rights or make
the data legally safely useable by others (cf. [4]).
Even though further challenges in the context of Linked
persons very well, but struggles when trying to extract local Data (such as provenance, trust, archiving, etc.) would be
politicians or businessmen. When it comes to extracting worthwhile dealing with it seems that solving the issues pre-
organizations, our prototype is on a comparable level with sented above could foster an even greater acceptance of so-
Zemanta. lutions based on Linked Data for professional use.
Evaluating the ability of our prototype to suggest text, pictures and videos for content enrichment (steps 5 - 6) is more challenging, especially as the prototype differs from Zemanta in various aspects: While our prototype links suitable data for any of the selected terms using Wikipedia, Geonames, Flickr, and Youtube, Zemanta only recommends content for the entire article. Therefore, data recommended by our prototype is of finer granularity and semantically richer. This aspect makes it almost impossible to compare the quality of content linked by our prototype against the content Zemanta recommends. From the requirements of online editors, we learned that content should preferably be linked to the terms (concepts) in the text and not to the entire article. Evaluating the quantity of content linked, we found that our prototype suggests considerably more media and text objects than Zemanta. However, both quantity and quality of content can be further improved: We noticed that mainly pictures from Flickr and videos from Youtube did not always match the extracted concepts.

5. CHALLENGES
Over the past months Linked Data technology has matured, various applications have been created, and new data is constantly being added. However, during the development of the prototype we have identified three important challenges that still need attention and should be addressed to make Linked Data applications competitive and allow their use in professional production environments. It has to be noted that most of the issues apply not only to Linked Data but to the Web in general.
Distributed infrastructure: One of the advantages of Linked Data is that it can be accessed via HTTP URIs and queries on remote SPARQL [15] endpoints that are made available by the data providers themselves or by Linked Data consolidators (such as the Virtuoso instance hosting Linked Open Data; see footnote 11). Even though it is appealing to use data from

6. CONCLUSIONS AND FUTURE WORK
From recent discussions with online editors we learned much about the nature of their business: the quick but professional production of valuable and up-to-date online content, as outlined in Section 1. We derived our motivation to investigate the adoption of Linked Data for commercial content providers from the requirements of online editors. The business opportunities underlying Linked Data may fulfill their requirements as discussed in Section 2. Our prototype, which we comprehensively presented in Section 3, is aimed at supporting online editors during their editorial process and beyond. Results of a preliminary evaluation are presented in Section 4 and showed that our prototype is already able to outperform other solutions. When implementing our first prototype, we learned much about challenges impairing a successful adoption of Linked Data for professional production environments, which we briefly outlined in Section 5.
Our paper already includes a preliminary evaluation of our prototype as well as a comparison against other systems for semantic content enrichment. In a next step, we will rigorously evaluate our first prototype with professional online editors as test participants, deriving further requirements for technical improvement. Second, we aim to integrate the next version of our prototype into the content management system of a major Austrian content provider, enabling us to test the tentative hypothesis shown in Figure 6 that embedding Linked Data positively affects the number of page visits and session length and thus has a positive effect on content providers’ revenues.

7. ACKNOWLEDGMENTS
The research reported in this paper was partially supported by the Austrian Federal Ministry for Transport, Innovation and Technology (bmvit).
11 http://lod.openlinksw.com/
12 see http://esw.w3.org/topic/DatasetDynamic for a list of proposals to describe Linked Data updates and changes
[Figure 6: Tentative hypothesis of positive Linked Data effects. Enriched content has a positive (+) effect on page visits and session length, which in turn have a positive (+) effect on revenues.]

8. REFERENCES
[1] B. Adida, M. Birbeck, S. McCarron, and S. Pemberton. RDFa in XHTML: Syntax and processing. W3C recommendation. http://www.w3.org/TR/rdfa-syntax/, Oct 2008.
[2] D. Connolly. Search engines take on structured data. W3C blog. http://www.w3.org/QA/2009/05/structured_data_and_search_eng.html, May 2009.
[3] M. Côté. A matter of trust and respect. CA magazine, March, 2002.
[4] I. Davis. Linked data and the public domain. http://blogs.talis.com/nodalities/2009/07/linked-data-public-domain.php, July 2009.
[5] H. Glaser, A. Jaffri, and I. C. Millard. Managing co-reference on the semantic web. In Proceedings of the Linked Data on the Web Workshop (LDOW2009), Madrid, Spain, 2009.
[6] W. Halb, Y. Raimond, and M. Hausenblas. Building linked data for both humans and machines. In Proceedings of the Linked Data on the Web Workshop (LDOW2008), Beijing, China, 2008.
[7] K. Holmqvist, J. Holsanova, M. Barthelson, and D. Lundqvist. Reading or scanning? A study of newspaper and net paper reading. Elsevier Science Ltd., 2003.
[8] J. Hurwitz, C. Baroud, R. Bloor, and M. Kaufman. Service Oriented Architecture for Dummies, chapter 13: Where’s the data?, pages 153–166. Wiley Publishing Inc., Hoboken, NJ, USA, 2007.
[9] G. Kobilarov, T. Scott, Y. Raimond, S. Oliver, C. Sizemore, M. Smethurst, and R. Lee. Media meets semantic web - how the BBC uses DBpedia and linked data to make connections. In European Semantic Web Conference, Semantic Web in Use Track, 2009.
[10] R. Larson and E. Sandhaus. NYT to release thesaurus and enter linked data cloud. http://open.blogs.nytimes.com/2009/06/26/nyt-to-release-thesaurus-and-enter-linked-data-cloud/, June 2009.
[11] A. Latif, A. U. Saeed, P. Hoefler, A. Stocker, and C. Wagner. The linked data value chain: A lightweight model for business engineers. In Proceedings of I-SEMANTICS ’09 International Conference on Semantic Systems, pages 568–575, Graz, Austria, 2009.
[12] K. Lyons, C. Playford, P. R. Messinger, R. H. Niu, and E. Stroulia. Business models in emerging online services. In M. L. Nelson, M. J. Shaw, and T. J. Strader, editors, Value Creation in E-Business Management. 15th Americas Conference on Information Systems, AMCIS 2009, SIGeBIZ track. Selected Papers, volume 36 of Lecture Notes in Business Information Processing, pages 44–55, San Francisco, CA, USA, 2009. Springer Berlin Heidelberg.
[13] A. Nikolov, V. Uren, E. Motta, and A. de Roeck. Integration of semantically annotated data by the KnoFuss architecture. In 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2008), Acitrezza, Italy, 2008.
[14] A. Osterwalder and Y. Pigneur. An e-business model ontology for modeling e-business. In Proceedings of 15th Bled Electronic Commerce Conference. e-Reality: Constructing the e-Economy, Bled, Slovenia, 2002.
[15] E. Prud’hommeaux and A. Seaborne. SPARQL query language for RDF. W3C recommendation. http://www.w3.org/TR/rdf-sparql-query/, Jan 2008.
[16] Y. Raimond, C. Sutton, and M. Sandler. Automatic interlinking of music datasets on the semantic web. In Proceedings of the Linked Data on the Web Workshop (LDOW2008), Beijing, China, 2008.
[17] J. Volz, C. Bizer, M. Gaedke, and G. Kobilarov. Silk – a link discovery framework for the web of data. In Proceedings of the Linked Data on the Web Workshop (LDOW2009), Madrid, Spain, 2009.
