
JID: ESWA

ARTICLE IN PRESS [m5G;November 2, 2016;11:3]

Expert Systems With Applications 000 (2016) 1–14

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

A recommendation approach for consuming linked open data


Jonice Oliveira1,∗, Carla Delgado, Ana Carolina Assaife
Universidade Federal do Rio de Janeiro, Brazil

article info

Article history:
Received 6 April 2016
Revised 16 October 2016
Accepted 16 October 2016
Available online xxx

Keywords:
Information retrieval
Linked open data
Recommender systems

abstract

Most linked open data (LOD) applications focus on the search and visualization of information, and do not make efficient use of the links among objects in different data sources or of the semantics of their relations. This work aims to create a LOD-consuming approach that uses recommendation techniques based on items’ descriptions, their relations, users’ interests and social networks. The proposed approach was instantiated by an application that uses movie-related LOD. The results obtained in our experiments were promising: the accuracy of the recommendations generated was equal to or better than that of other recommender algorithms used in conventional (non-LOD) scenarios.
© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

The World Wide Web (WWW, or Web for short) changed the way people access and share knowledge (Berners-Lee, Cailliau, Groff, & Pollermann, 1992). Barriers to document publication and sharing have been considerably reduced, causing the global information space to grow. In spite of the efforts to improve data consumption, the data published on the Web still form knowledge islands. Also, document search is still based on keywords, heterogeneous data models and data types, and no pattern for resource identification has been established. As documents were primarily made to be understood by humans, document integration used to be a manual process. This makes computational processing and information inference on published documents very difficult.

Because of the difficulties in automatically extracting meaning from documents, the Web evolved into the so-called “Web of Data”, incorporating mechanisms for a better semantic representation and integration of Web publications (Bizer, Heath, & Berners-Lee, 2009). The concept of Linked Open Data (LOD) establishes the principles for data publication and data relation, and motivates the creation of links with associated meaning among data from several sources (Bizer, Cyganiak, & Heath, 2007). In LOD, the information is represented in RDF (Resource Description Framework) triple graphs (Klyne & Carroll, 2004), where world elements are represented by unique identifiers called URIs (Uniform Resource Identifier) (Berners-Lee, Fielding, & Masinter, 1998). What makes LOD particularly interesting is the huge volume of structured data available, which can be consumed by humans or artificial agents (computer programs), and also the links among items of different sorts, for example links from person databases to movie databases.

With this mobilization from the community and with the growing volume of published data, research was conducted and applications were developed to improve the means to consume and explore the Web of Data. According to Bizer et al. (2009), the main advantage that linked data brings to the user is the possibility of accessing related or combined data from distributed and even heterogeneous data sources. Also, according to the same authors, using the Web as a single global database poses a number of challenges, for example the construction of applications that ease the user's interaction with the data. New contributions to the fields of data navigation, data search and data consumption are therefore welcome.

The support for users who consume information published as LOD is still very limited: most such applications focus on the search and visualization of the retrieved information. They are able to show related vocabulary and exhibit the data in facets, but they do not suggest related information to catch the user's interest (Franz, Koch, Dividino, & Staab, 2010). It is still a challenge to make the consumption of LOD attractive to the end user. Our goal is to tackle the problem of easing the consumption of LOD for the end user, as well as to support the construction of applications that would motivate the user to consume LOD. We believe that this is an important step towards using the Web as a single global data repository to serve all sorts of end users’ applications.

After studying the related works in the field of end users’ LOD consumption and navigation, we noticed that recommenda-

∗ Corresponding author.
E-mail addresses: jonice@dcc.ufrj.br, jonice@gmail.com (J. Oliveira), carla@dcc.ufrj.br (C. Delgado), ana.assaife@ppgi.ufrj.br (A.C. Assaife).
1 Authors would like to thank CNPq, CAPES and FAPERJ for partially supporting this research work.

http://dx.doi.org/10.1016/j.eswa.2016.10.037
0957-4174/© 2016 Elsevier Ltd. All rights reserved.

Please cite this article as: J. Oliveira et al., A recommendation approach for consuming linked open data, Expert Systems With Applica-
tions (2016), http://dx.doi.org/10.1016/j.eswa.2016.10.037

tion mechanisms are applied to ease and motivate the consumption of such data (Heitmann & Hayes, 2010; Di Noia, Cantador, & Ostuni, 2014; Musto, Lops, Basile, de Gemmis, & Semeraro, 2016, to cite some). In this scenario of recommendation systems using LOD, previous works combined LOD with other closed systems, for instance Facebook, MySpace and LastFM (Mirizzi, Noia, Ragone, Ostuni, & Sciascio, 2012; Heitmann & Hayes, 2010; Passant, 2010), aiming to improve the recommendations through the use of more structured and semantically richer data about the user. Improving recommendations using only LOD data (that is, working only with open data in a self-sustainable LOD universe, without requiring user information) was an open problem. The presented approach is based on this challenge: recommending LOD using one or more datasets, and only LOD datasets, so that end users can navigate LOD through the recommendations.

This paper describes a recommendation approach to LOD, using one or more datasets. This is the main contribution, considering that previous works are limited to the use of only one dataset. The approach presented here is based on three recommendation types:

(1) Recommendation of resources of the same type as the input resource, considering that the input resource is a resource that interests the user, or that the user likes;
(2) Recommendation of social networks, which recommends people from the same domain as the input resource;
(3) Recommendation of resources of any type that are related to the input resource, using the textual description of the resources to find related items. This enables the identification of related resources of different types. For instance, when the user searches for a movie, it is possible to recommend people, books, soundtracks or any other resource that the algorithm considers similar to the movie given as the starting point for the search.

As we said previously, our goal is to tackle the problem of easing the consumption of LOD for the end user. Consequently, we would like to analyze whether our approach is feasible and efficient from the user’s point of view. An experiment, divided into two stages, was conducted to validate this work. First, we conducted a quasi-experiment aimed at evaluating the end user's perception of the recommendation technique. In this stage, a group used our system and we collected information about the execution and their perception. These results were used in the second phase of the evaluation, which was a quantitative off-line analysis. In this stage, we measured a set of recommendation metrics for the proposed algorithms (Ricci et al., 2011): accuracy, prediction accuracy by ranking, novelty and utility. The analyses based on the users’ feedback and on the four metrics were all positive and, consequently, we could affirm that this approach contributes to the scenario of consuming Linked Open Data.

This paper is organized as follows. Section 2 presents the works related to the use of LOD in recommendations. Our proposal is detailed in Section 3. In Section 4 our proposal is applied to the movie scenario, and the validation experiments are described in Section 5, as well as the results obtained. Conclusions and future work come in Section 6.

2. Related work

Considerations on how to take advantage of LOD have been raised since real LOD bases became available (Heath, 2008). Due to the absence of works in the area, Passant (2010) proposed a set of measures to compute the semantic distance in LOD in order to measure the similarity between two resources, and discussed how such measures could be used in applications such as resource recommendation. The measures proposed were based on the abundance of links among the resources and do not consider the ontology hierarchy describing this data. Passant conducted an experiment to decide which of the distances (direct, indirect, balanced and combined distances) would be more appropriate for a recommender system. The combined balanced distance was the one that performed best.

Later, the same author (Passant & Decker, 2010; Passant, 2010) applied this methodology to develop a system for recommending songs, called dbrec (Passant, 2010). The output of this system was validated by comparison with Last.fm, an established music recommender system. As we will see later, the main difference from our work is the restriction on the items to be compared. In (Passant, 2010) only resources that have the properties dbpedia:Band and/or dbpedia:MusicalArtist are taken into account. In our approach, broader criteria are used (correlated properties, similarities in the descriptions of a resource). Another aspect observed in this related work is that it uses just one database, thus not benefiting from the links found in several bases and also DBPedia.

Still in the music domain, the Seevl application is worth mentioning (Passant, 2011). It is a Chrome plugin that adds music recommendations, biographies and complementary information to YouTube videos. Once extracted and converted to RDF, the data about artists is translated into a common vocabulary using the Music Ontology (Raimond et al., 2007), forming a graph of musical entities such as artists, bands, genres, labels, and others. Seevl also provides combined search of properties, making it possible to search for “all artists and bands of rock genre that played in a particular festival”. Although it presents the architecture used and the steps in the process, this work did not provide any information about the implementation of the recommendation algorithms used. But as it is from the same author as the previous works described in this section, and uses data from the same domain, we believe that Seevl is a mature application developed after other more theoretical works from the same author.

Also in the field of media files, MORE (Mirizzi et al., 2012) is an application developed to complement Facebook by recommending movies. MORE uses the data from triple bases in the domain and computes movie similarities. What is special about this application is that, besides looking at the content, it also considers the user’s Facebook profile. By using as input the movies that the user marked as interesting in his or her Facebook profile, MORE intends to overcome the “cold start” problem, i.e. to have some information even about users that have not yet rated any movie. The main contribution of this work is that it provides a semantic recommendation algorithm, called Semantic Vector Space Model. The algorithm is an adaptation of the Vector Space Model (Salton, Wong, & Yang, 1975). Although it explores the same domain as our work (movies), this work is not strongly related to ours, as it was not our goal to use user profile information, which is not always available. Our goal was to generate recommendations from RDF databases.

Another project worth mentioning is RKBExplorer.com (Glaser, Millard, & Jaffri, 2008). It is a semantic web application that aims to present unified views of a considerable number of different data sources. Up to now, the application works in the domain of people and publications in Computer Science. The data about people, publications and institutions are in RDF and have been provided by the project partners in ReSIST (Glaser & Millard, 2007). This application provides an interface where the user specifies two resources of his or her interest and the type of the relation between them, and gets as output the properties that link these resources. This application requires that the user know both the input resources and the type of each resource. This is a limitation in the sense that there is no way to surprise the user by recommending novel items. In our approach the user does not need to know anything about the resources or the datasets.


Fig. 1. Architecture of the proposed recommendation system.

Heitmann and Hayes (2010) proposed an architecture for open recommender systems based on collaborative filtering. Their work focuses on using collaborative filtering to exploit simple, binary connections between users and items. They report very good results on using LOD bases to mitigate the “cold start” problem, a known problem of collaborative filtering recommenders: for users that have not yet rated many things, or items that have not yet been sufficiently rated, it is very hard to produce recommendations. Basically, the LOD resources are used to solve the data acquisition problem, filling in the gaps when user data is missing for a specific domain. But unlike in our approach, the LOD bases are not the source of the items being recommended.

Ostuni, Di Noia, Di Sciascio, and Mirizzi (2013) present SPrank (Semantic Path-based ranking), a hybrid recommendation algorithm to compute top-N item recommendations from implicit feedback, which combines ontological knowledge extracted from DBpedia with collaborative user preferences in a graph-based setting. SPrank’s strength is that it addresses the top-N item recommendation task and, at the same time, deals with implicit feedback datasets. The main idea is to explore paths in a semantic graph in order to find items that are related in some sense to the ones the user is interested in. From the analysis of these paths, path-based features are extracted from DBpedia and a learning algorithm is applied to obtain a ranking function able to recommend the most relevant items to the user. The whole proposal of SPrank is very different from what we are proposing, but two meaningful differences between our system and SPrank are that SPrank uses just one source of LOD (DBpedia), and that it sticks to recommending items of the same type (in the paper, they mention recommendation of either movies or songs, but not a combination of both).

In 2014, the ESWC 2014 Challenge on Linked Open Data-enabled Recommender Systems was launched. The primary goal of the Challenge was both to create a link between the Semantic Web and the Recommender Systems communities and to show how Linked Open Data and semantic technologies can boost the creation of a new breed of knowledge-enabled and content-based recommender systems (Di Noia et al., 2014). The challenge focused on the particular scenario of book recommendation, and had three tasks: rating prediction in cold-start situations, top-N recommendations from binary user feedback, and diversity in content-based recommendations. Though our example targets the movie scenario, we also addressed some of these challenges in our experiments, and our results converge with the ones that emerged from an analysis of the different approaches proposed by the participants of the ESWC 2014 challenge: the best performing techniques, with respect to the provided dataset, for rating prediction and top-N recommendations use an ensemble of several different recommendation methods, while post-processing results is very effective in increasing the diversity of the recommendation list. Also, we believe that our work goes beyond the tasks proposed, as we explore the recommendation of items that are not only movies, but belong to the universe of movies. In this way, we invest in improving metrics such as diversity, novelty and serendipity.

Piao and Breslin (2016) proposed various distance measures, built on the basic concept of Linked Data Semantic Distance (LDSD) (Passant, 2010), for calculating the Linked Data semantic distance between resources, which can be used in a LOD-enabled recommender system. Their results show that performance can be significantly improved beyond that of LDSD-based metrics. We did not use the Piao and Breslin (2016) measures in our system, as we wanted to stick to the basics so that we could better compare it with other systems. But this opens possibilities for further improvements that we plan to investigate.

Musto et al. (2016) report the implementation of a graph-based recommendation methodology based on the personalized PageRank algorithm. The graph used represents both users and items as nodes of a bipartite graph, and the edges between users’ nodes and items’ nodes represent the feedback information on how the item interests the user. This information is used to implement the collaborative filtering approach, which is not the goal of our work. But the proposal from Musto et al. (2016) and ours have in common the usage of LOD. The results presented in Musto et al. (2016) confirm that knowledge coming from the LOD cloud can have a positive impact on the recommendation algorithm.

3. A model for recommendations using linked open data

According to Burke (2002), the architecture of traditional recommender systems has three basic components: i) the data provided by the user (for which the system should find something related), ii) the data the system already has about other items or other users or both, and iii) the recommendation algorithm. As pointed out by Heitmann and Hayes (2010), when using LOD as input data, this general architecture should be extended with an abstraction layer for accessing the data. We found it necessary to detail the data abstraction layer even further, by creating a data treatment layer. This was important because LOD are mainly raw data, and when more than one base is used it is essential to work on the data so that interlinking data across the bases is possible. So adaptations were made to Heitmann and Hayes’ proposal. Fig. 1 shows the architecture used, with the phases to be followed and their relations.


The first phase of the architecture is data treatment, which has three subtasks:

(1) data access
(2) data interlinking
(3) treated data storage

The goal of data treatment is to leave the data ready to be consumed by the recommendation system. The domain and the datasets to
be used have to be chosen before this phase begins.
To explore the possibility of using several triple datasets, it is necessary to build an access layer to the data. The goal of the access
layer is to make any available data retrievable as a triple by a query. So, after the data is extracted from the data cloud, it is treated and
stored locally. Storing the data is not enough, however: it also needs to be integrated. This is an important phase because each dataset has
its own ontology describing its terms, and the vocabulary adopted is not always the same. Besides, each base uses its own URI to identify an
element, and that element rarely has the same URI in another base, even when both bases refer to the same thing. Because of that, a
mapping is made during the data treatment phase so that data in one base is associated with equivalent data in another base.
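As a rough illustration of the access layer's goal (any stored data retrievable as a triple by a query), the sketch below keeps triples from two datasets in one small in-memory store and answers pattern queries. It is only a toy under stated assumptions: the class name `TripleStore`, the `match` method and the prefixed identifiers (`lmdb:`, `dbp:`, `dbo:`) are hypothetical and do not come from the system described here.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Tiny in-memory store; a stand-in for a real local RDF store."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()
        self._by_subject: dict[str, set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self._by_subject.get(s, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )


# Hypothetical data from two datasets loaded into the same local store.
store = TripleStore()
store.add("lmdb:film/1", "rdfs:label", "Billy Elliot")
store.add("dbp:Billy_Elliot", "rdfs:label", "Billy Elliot")
store.add("dbp:Billy_Elliot", "rdf:type", "dbo:Film")

print(store.match(p="rdfs:label"))
```

The two label triples describe the same movie under different URIs, which is exactly the situation the mapping step has to resolve.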
In order to create a semantic link between data items, it is necessary to specify the source and the destination bases, as well as the
possible link types (sameAs, differentFrom, equivalentProperty, inverseOf). A similarity measure between the items is then used to decide
whether a link is created. There are different approaches to measuring the similarity between two data items, for example the string distance
between property values, which is the one used here. The threshold above which similarity is considered high can be chosen according to the
specific domain. A high-level description of the data interlinking procedure follows:
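The interlinking steps described above (choose source and destination bases, compare property values with a string measure, create a link when the similarity passes a threshold) can be sketched as follows. This is a minimal illustration, not the procedure listing of the paper: the Jaro similarity is used because Section 4.2 names it, while the function names, the 0.95 threshold, the lower-casing of labels and the `lmdb:`/`dbp:` identifiers are assumptions made for the example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def interlink(source_labels: dict[str, str], target_labels: dict[str, str],
              threshold: float = 0.95,
              link_type: str = "owl:sameAs") -> list[tuple[str, str, str]]:
    """Link each source resource to its most similar target, if similar enough."""
    links = []
    for s_uri, s_label in source_labels.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target_labels.items():
            score = jaro(s_label.lower(), t_label.lower())
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, link_type, best_uri))
    return links


# Hypothetical labels; the URIs are placeholders, not real dataset entries.
source = {"lmdb:film/1": "Billy Elliot"}
target = {"dbp:Billy_Elliot": "Billy Elliot", "dbp:Billy_Madison": "Billy Madison"}
links = interlink(source, target)
print(links)
```

Keeping only the best-scoring candidate per source resource is a design choice of this sketch; a threshold chosen per domain, as the text notes, controls how strict the matching is.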

The data interlinking phase leaves the data ready to be used by the recommendation algorithms. At this point the open recommendation
system turns, with some adaptations, into a closed recommendation system. The treated data, the recommendation algorithms, and
the input URI become the system’s three basic components. The input URI identifies the resource for which a recommendation
should be computed, by comparing it with the data that was prepared and stored in the previous phases. Given these two inputs (the treated
data and the input URI), the recommendation algorithms can recommend items from the integrated datasets to the user. Next we explain how
the recommendations are made by the recommendation module of our architecture.
The recommendation module is divided into three parts: recommendation of resources of the same type as the input resource, recommendation
of people based on social networks, and recommendation of resources based on textual description. According to the desired
recommendation type, the system switches among three different algorithms, as seen in the pseudo-code below.
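The switch among the three algorithms can be pictured as a simple dispatch on the requested recommendation type. The sketch below is only an illustration of that idea: the keys `same_type`, `social_network` and `description`, the `recommend` function and the stub algorithms are all hypothetical names, not taken from the system.

```python
from typing import Callable

Recommender = Callable[[str], list[str]]


def recommend(kind: str, input_uri: str,
              algorithms: dict[str, Recommender]) -> list[str]:
    """Run the algorithm registered for the requested recommendation type."""
    if kind not in algorithms:
        raise ValueError(f"unknown recommendation type: {kind!r}")
    return algorithms[kind](input_uri)


# Stub algorithms standing in for the three recommendation types.
algorithms: dict[str, Recommender] = {
    "same_type": lambda uri: [f"{uri}#similar-resource"],
    "social_network": lambda uri: [f"{uri}#related-person"],
    "description": lambda uri: [f"{uri}#related-by-text"],
}

print(recommend("social_network", "ex:movie", algorithms))
```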

The recommendation of resources of the same type as the input resource provides the user with recommendations of resources
of the same type, and belonging to the same dataset, as the resource (URI) given as input. This is the simplest recommendation,
as only one information source is used. The user provides the URI of a resource of a specific type, belonging to a dataset of his or her
choice. This URI feeds the recommendation module, which compares the input with all other resources of the same type in the
same dataset. The output is thus a set of resources from the same dataset as the input, ordered by similarity. The guidelines for this
recommendation process can be seen in Procedure Rec-ResourcesSameType, given below.
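A minimal sketch of this process, assuming a single dataset held as a list of triples and a dictionary of resource types, is given below. The distance used is a simplified, unweighted link count in the spirit of the Linked Data Semantic Distance (direct links plus shared incoming and outgoing neighbours); it does not reproduce the weighting of the LDSD variant adopted later in the paper. All names and the toy data are hypothetical.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def link_distance(triples: list[Triple], a: str, b: str) -> float:
    """Simplified link-based distance: 1 / (1 + direct links + shared neighbours)."""
    direct = sum(1 for s, _, o in triples if (s, o) in ((a, b), (b, a)))
    out_a = {(p, o) for s, p, o in triples if s == a}
    out_b = {(p, o) for s, p, o in triples if s == b}
    in_a = {(p, s) for s, p, o in triples if o == a}
    in_b = {(p, s) for s, p, o in triples if o == b}
    shared = len(out_a & out_b) + len(in_a & in_b)
    return 1.0 / (1.0 + direct + shared)


def rec_resources_same_type(triples: list[Triple], types: dict[str, str],
                            input_uri: str,
                            top_n: int = 10) -> list[tuple[str, float]]:
    """Rank resources of the input's type by increasing distance to the input."""
    wanted = types[input_uri]
    candidates = [u for u, t in types.items() if t == wanted and u != input_uri]
    scored = sorted((link_distance(triples, input_uri, u), u) for u in candidates)
    return [(u, d) for d, u in scored[:top_n]]


# Toy dataset: m2 shares a director and an actor with m1, m3 only an actor.
triples = [
    ("m1", "director", "d1"), ("m2", "director", "d1"),
    ("m1", "starring", "a1"), ("m2", "starring", "a1"),
    ("m3", "starring", "a1"), ("m4", "starring", "a2"),
]
types = {"m1": "movie", "m2": "movie", "m3": "movie", "m4": "movie",
         "d1": "person", "a1": "person", "a2": "person"}

result = rec_resources_same_type(triples, types, "m1")
print(result)
```

A smaller distance means a more similar resource, so the list is ordered from most to least similar.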

The second recommendation proposed is the recommendation of the social network around a resource, in other words, a recommendation
based on the network of people around the input resource. Given an input resource (i) from a source dataset (parameter ds)
chosen by the user, the mapping (previously prepared in the interlinking phase) finds the corresponding resource in the other LOD datasets
(parameter dd). Our main intention is for this approach to be domain independent, so that it can be applied in several scenarios.
For this reason, we base our search on the property “type” (in our example, as we are looking for movies, we search for all items whose
type property is defined as “movie”).


The next step is to identify all people related to the corresponding resource. To find them, we search for resources having the type
“person”. For each person, we then search for other people related to him or her, who are considered collaborators, creating a (social)
network connected to the original resource. In our example, for each movie, we search for the resources linked to it whose
type property has the value “person”. Resources of type “person” related to resources of type “movie” could represent actors, directors,
screenwriters, and others. After that, we identify the social network related to that movie, that is, the people who worked with or have
other relationships with each person related to this movie. The people with the largest number of collaborations with the people related to
the input resource compose the output of this recommendation. The threshold, that is, the degree of relationships, is a variable defined by
the user. This step can be described as:
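A minimal sketch of these steps is shown below. It rests on several assumptions that the text does not fix: a collaboration is counted whenever two people are linked to the same movie, people already attached to the input movie are left out of the output, and the user-defined threshold is read as a minimum number of collaborations. The function and parameter names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object)


def rec_social_network(triples: list[Triple], types: dict[str, str],
                       same_as: dict[str, str], input_uri: str,
                       min_collaborations: int = 1) -> list[str]:
    """Rank people by how often they worked with the people of the input movie."""
    # Step 1: map the input resource to its counterpart in the other dataset.
    target = same_as.get(input_uri, input_uri)

    # Step 2: collect the people linked to each movie.
    people_of: dict[str, set[str]] = defaultdict(set)
    for s, _, o in triples:
        if types.get(s) == "movie" and types.get(o) == "person":
            people_of[s].add(o)
    core = people_of.get(target, set())

    # Step 3: count collaborations between outside people and the core people.
    counts: Counter = Counter()
    for movie, people in people_of.items():
        if movie == target:
            continue
        shared = people & core
        for person in people - core:
            counts[person] += len(shared)

    # Step 4: keep people above the threshold, most collaborations first.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [person for person, n in ranked if n >= min_collaborations]


# Toy data: X is the input movie with people A and B.
triples = [
    ("X", "starring", "A"), ("X", "director", "B"),
    ("Y", "starring", "A"), ("Y", "director", "B"), ("Y", "starring", "C"),
    ("Z", "starring", "A"), ("Z", "starring", "D"),
    ("W", "starring", "E"),
]
types = {m: "movie" for m in "XYZW"}
types.update({p: "person" for p in "ABCDE"})
same_as = {"src:X": "X"}  # mapping produced by the interlinking phase

print(rec_social_network(triples, types, same_as, "src:X"))
```

Raising `min_collaborations` narrows the recommended network to the closest collaborators.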

The third and last type of recommendation developed is the recommendation of resources using textual descriptions. Here again more than
one dataset is used as a data source for the recommendation, and the resource mapping among datasets is important. The user provides
an input resource (URI) from a specific dataset; the resources corresponding to the input URI are then found in the other datasets, and similar
resources are computed by analyzing the texts found in the description properties of the resources. The resources whose description
texts are most similar to the description text of the input resource form the output set of this recommendation. Here, resources of
any type can be recommended (not necessarily of the same type as the input resource). The procedure
Rec-Description illustrates this activity.
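The following sketch illustrates the idea of ranking resources of any type by the similarity of their description texts. The text similarity used here, cosine similarity over plain word counts, is an assumption chosen for brevity and is not stated to be the measure used in this work; the function names, the `ex:` identifiers and the abstracts are invented for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case word counts of a description text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rec_description(abstracts: dict[str, str], same_as: dict[str, str],
                    input_uri: str, top_n: int = 5) -> list[str]:
    """Rank resources of any type by description similarity to the input."""
    target = same_as.get(input_uri, input_uri)  # mapped resource
    query = bag_of_words(abstracts[target])
    scored = [(cosine(query, bag_of_words(text)), uri)
              for uri, text in abstracts.items() if uri != target]
    scored = [(s, u) for s, u in scored if s > 0]
    scored.sort(key=lambda su: (-su[0], su[1]))
    return [uri for _, uri in scored[:top_n]]


# Invented abstracts; none of this text comes from a real dataset.
abstracts = {
    "ex:film": "A musical film built around songs by the Beatles",
    "ex:soundtrack": "Soundtrack album with songs by the Beatles",
    "ex:city": "A port city in northern France",
}
same_as = {"src:film": "ex:film"}

result = rec_description(abstracts, same_as, "src:film")
print(result)
```

Because no type filter is applied, a soundtrack can be recommended for a movie, which is the cross-type behaviour the paragraph describes.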

To clarify the use of this approach, in the next section we describe an example, detailing all the steps of this architecture.

4. Application example: movies domain

This section details the architecture components Data Access layer, Data Interlinking layer, and Recommendation Algorithms, contex-
tualized in the chosen domain: movies. We discuss in detail how each recommendation type is computed, in order to highlight relevant
implementation aspects and tools used.

4.1. Data access layer

As discussed before, this layer prepares the data for the next layers. The goal is to extract and load the data so that it can be used
by the Recommendation Algorithms. Two data sources were selected to generate recommendations: Linked Movie Database and DBPedia.
Both sources allow remote access through SPARQL endpoints (Prud’hommeaux et al., 2007). Even though the developed algorithms can be
easily translated to the SPARQL language, this approach has shown several limitations: it is highly dependent on the servers’ availability, and
the amount of data accessed during the execution of the recommendation algorithms requires a very large number of requests to the
endpoints, making the process very slow. Some public endpoints, such as DBPedia's, limit the number of results returned for each query,
forcing the use of nested subqueries to obtain the data needed (Passant & Decker, 2010).
Because of these restrictions, we followed the approach of some related works (Mirizzi et al., 2012; Passant, 2010) and opted for
local storage of the data. Both selected datasets provide up-to-date dumps, so the recommendations can be computed locally. The
Linked Movie Database provides a single dump, which was used in full. DBPedia, on the other hand, provides segmented files, from
which the following were used:


– Ontology infobox types: This file contains triples in the form <rdf object:object type class>. This data makes it possible to identify, for example, whether the resource corresponds to a person, to a city, to a movie, and so on. For the recommendation of persons, it is necessary to query this group of triples in order to filter resources of the desired type.
– Ontology infobox properties: This is the biggest file, considered the center of the dataset. The resource properties used by the recommendation algorithms are found there.
– Short abstracts: This is an essential file for the computation of the text-based recommendation. It contains the abstracts that are extracted from Wikipedia articles. The objects of these triples are abstracts that describe the resources in texts of up to 500 characters.
– Links to LinkedMDB: This file holds the equivalence links from DBPedia to the Linked Movie Database. It contains triples stating that a specific resource in DBPedia corresponds to (represents the same object as) another resource in Linked Movie Database, even if the URIs of the two resources are not the same.

With the data ready and locally stored, the next step is to construct a data interlinking mechanism so that the different data sources can be cross-referenced.

4.2. Data interlinking

Two of the three recommendation algorithms proposed in this work use data from more than one data source. If the chosen data sources do not have a common data schema, a tool to map resources from one source to the other is necessary. In other words, we need a mechanism to find out whether a resource in one dataset is equivalent to a resource in another dataset.

In the movies domain, we should obtain the equivalence among resources that represent the same movie in LinkedMovieDB and DBPedia. LinkedMovieDB has some links to DBPedia, but in spite of that the number of owl:sameAs relations among movies in both datasets is still small in relation to the total number of movies. As shown in Fig. 2, the movie “Billy Elliot” is an example of those that do not have an owl:sameAs property that points to DBpedia. The same movie is represented by different URIs in each data source, as can be seen in Fig. 3, and there is no connection between them. As described in Procedure Interlinking in Section 3, we need to measure the similarity between items and link them. We used Silk (Volz, Bizer, Gaedke, & Kobilarov, 2009) to do that, and a similarity relation of the kind owl:sameAs was created, as represented by the dotted line in Fig. 3.

A configuration file must be provided to Silk, describing the type of link that should be created, the data sources that should be used, how the data will be accessed, and some other parameters. Fig. 4 shows the part of this configuration file which deals with the data access to the movie recommendation system. In this part, we [...] be compared using their labels (represented by the property rdf-schema:label). In this step, we used the Jaro metric (Volz et al., 2009), provided by Silk, because its usage is very common. It is a measure of similarity between two strings.

It is important to emphasize that the whole configuration process (represented in Figs. 4–6) is manually specified.

Silk execution (with the aforementioned configurations and settings) results in a set of RDF triples. The subjects are the source resources from LinkedMDB, the properties created are owl:sameAs and the objects correspond to destination resources from DBPedia. Fig. 7 shows the generated result for the movie “Billy Elliot”. Using this mapping, the recommendation algorithms that use more than one dataset can, from a given resource in LinkedMDB, consider resources from DBPedia.

After Data Interlinking, the generated triples are stored locally together with the data extracted from the data sources. The resulting dataset is then ready to be accessed and consumed by the recommendation algorithms.

4.3. Recommendation of resources from the same type of the input resource: movies

In this case, when the input of the recommendation algorithm has “movie” as its type property, the recommended output resources will necessarily have this same type property. Besides, all output URIs come from the same data source as the input resource. Of the three recommendation types developed in this work, this is the only one that can be found in the literature (Mirizzi et al., 2012; Passant & Decker, 2010; Passant, 2010). As described in Procedure Rec-ResourcesSameType in Section 3, we need to measure the similarity between items. We opted for the use of the Linked Data Semantic Distance (LDSD) (Passant, 2010), more specifically the Balanced Combined Distance (LDSDcw). An output file is generated after the computation of the recommendation using LDSDcw. This file has the URI of the input resource, the URI of the movie being compared, the similarity value found and the recommendation type (F, from film).

4.4. Recommendation of the social network around a resource

The movie “Across the Universe”1 will be used to explain the steps of the algorithm used to recommend people.

The idea behind this process is to recommend the social network in which the input resource is inserted. For our example, we consider as input resource the movie “Across the Universe”. The first step is to discover the corresponding URI of the movie in DBPedia, using the mapping done with the Silk tool, previously explained. For the next steps, the algorithm uses the URI from DBPedia, http://dbpedia.org/resource/Across_the_Universe_(film). This movie is related to different resources, by several properties. The next step is to find out which of these properties are linked to re-
have to specify the addresses of the access points where will be sources of the type “person”, as for example actors, writers and
connected and used. In this example, DBPedia’s and LinkedMDB’s movie directors, which we consider the “collaborators” regarding
access points are configured. the movie. After identifying all people related to the movie, for
The type of the new links (that will be created) must also each person found, the algorithm searches for the movies in which
be specified. For this movie recommender, a link of the type the person took part (navigating by linking properties of interest).
owl:SameAs should be created among equal items from different This process is then repeated for all people related to the movie
datasets. The source resources are from LinkedMDB and are re- “Across the Universe”, and the movies found are gathered in a list.
stricted to the type http://data.linkedmdb.org/resource/movie/film. With the generated list of movies, a new list is generated with
These resources will be linked to destination resources from DBPe- all people that worked or are somehow related to these movies (so,
dia limited to the type http://dbpedia.org/ontology/Film. This con- the collaborators of these movies). The persons that appear more
figuration is illustrated in Fig. 5. times in the list are the ones to be recommended. The output of
To define how the source and destination resources should
be compared, another file configuration is needed. Fig. 6 illus- 1
corresponding URI in LinkedMDB is http://data.linkedmdb.org/resource/film/
trates this process, where the URIs of the different datasets will 6345

Please cite this article as: J. Oliveira et al., A recommendation approach for consuming linked open data, Expert Systems With Applica-
tions (2016), http://dx.doi.org/10.1016/j.eswa.2016.10.037
JID: ESWA
ARTICLE IN PRESS [m5G;November 2, 2016;11:3]

J. Oliveira et al. / Expert Systems With Applications 000 (2016) 1–14 7

Fig. 2. Properties of the movie Billy Elliot in LinkedMDB.

the algorithm is a ranked list of persons. We say that the list is


ranked by the most number of collaborations. The social network
recommendation delivers an output text file containing the URI of
the input resource, the corresponding URI for each recommended
person, the rank number computed by the algorithm, and the rec-
ommendation type (P, from people).
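The steps of the social network recommendation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the dictionary `cast_by_movie` is a hypothetical in-memory stand-in for the locally stored DBPedia triples, the identifiers such as `movie:A` and `person:1` are invented, related movies are deduplicated, and the collaborators of the input movie are kept in the ranking.

```python
from collections import Counter


def recommend_people(input_movie, cast_by_movie, top_n=10):
    """Rank people by how many related movies they appear in."""
    # Step 1: collaborators (actors, writers, directors...) of the input movie.
    seed_people = cast_by_movie.get(input_movie, set())

    # Step 2: every movie in which at least one of those collaborators took part.
    related_movies = {
        movie for movie, people in cast_by_movie.items() if people & seed_people
    }

    # Step 3: collect the collaborators of the related movies and count them.
    counts = Counter()
    for movie in related_movies:
        counts.update(cast_by_movie[movie])

    # Most frequent first; ties are broken by identifier for a stable ranking.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy data: movie identifier -> set of person identifiers.
cast_by_movie = {
    "movie:A": {"person:1", "person:2"},
    "movie:B": {"person:1", "person:3"},
    "movie:C": {"person:2", "person:3", "person:4"},
    "movie:D": {"person:5"},
}

ranking = recommend_people("movie:A", cast_by_movie)
for person, collaborations in ranking:
    print(person, collaborations)
```

In this toy data, `person:5` never shares a movie with a collaborator of `movie:A`, so it does not enter the ranking. Whether the collaborators of the input movie themselves should be filtered out is a design choice the text leaves open.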

4.5. Recommendation from resources using textual description

When instantiating this recommendation type to the movie domain, the descriptions or synopses of the movie resources were used as the texts. DBPedia offers two types of text descriptions: short ones, limited to 500 characters, and complete ones, of free size. The short texts were chosen because of text indexing limitations.

Fig. 3. Representation schema of the movie Billy Elliot in LinkedMDB and DBPedia.

Fig. 4. Data source configuration in Silk.


Fig. 5. Configuration of the type of link, source and target in Silk.

Fig. 6. Configuration of the type of comparison in Silk.

Fig. 7. Silk result for the movie “Billy Elliot”.

The main goal of this recommendation type is to recommend resources of any type to the user, based on how similar their text description is to the text description of the input resource. The proposed method takes as input a resource from LinkedMovieDB. Using the mapping built by the integration layer, the corresponding resource in DBPedia is found and the algorithm uses the URI from DBPedia for the next steps. It is important to remember that the short descriptions of DBPedia resources (among other data) were stored locally during the Data Integration phase. Through the analysis of these short descriptions, the recommendation algorithm delivers 10 resources considered similar to the input resource.

As seen in Procedure Rec-Description in Section 2, this recommendation makes use of text mining techniques to recognize the most relevant terms in the description texts. For this, the Lucene library2 was used to compute the similarity search among the texts. The first step is the creation of an inverted index (Frakes and Baeza-Yates, 1992) to locate terms from the description texts in the local copy of DBPedia data. Lucene’s StandardAnalyzer class was used to process the short descriptions and define the keywords that appear in the index. Among other tasks, this analyzer removes stop words and converts all letters to lowercase.

After locating the short description of the input resource, it is possible to query the index in order to obtain the resources whose associated texts are most similar to that of the input resource. The output of the algorithm is a text file containing: the URI of the input resource, the URIs of the recommended resources, the similarity score and the recommendation type (A, for all).

2 Apache Lucene, http://lucene.apache.org/

5. Experiments and evaluation

This section details the experiment that we conducted to evaluate our approach. Specifically, it is important to find out its feasibility and its efficacy from the user’s point of view.

First, a quasi-experiment was conducted to let a group of users validate the application. Second, the users’ feedback obtained in the quasi-experiment was used to analyze accuracy, prediction accuracy by ranking, novelty and utility.

For the quasi-experiment, 50 famous movies were selected and, for each of them, the three recommendation types were used. This experiment generated 150 text files, 3 for each movie, with the recommendations of other movies, people and resources. So that users could browse the data and receive and evaluate recommendations, a graphical interface was also implemented.

The experiment was done from the perspective of a recommender system user, so the participants should evaluate whether the provided recommendations were useful in relation to the movie provided as input. The desired profile for a participant was someone familiar with the use of recommender systems and with an interest in movies. The experiment participants were 35 Computer Science undergraduate students from UFRJ3 in their third year or later. Participants received a two-hour talk about the motivation of the application and the general idea of the work, as well as instructions on how to use the system. They were asked to choose one to five movies that they had already seen and liked, and then to evaluate the recommendations generated when giving each of these movies as input. Participants’ impressions of the recommendations were collected by asking them to answer a questionnaire in which, for each recommendation, they had to choose one of the following options:

– The user knows the recommended item and finds it interesting;
– The user knows the recommended item and finds it uninteresting;
– The user does not know the recommended item, but is interested in knowing it; or
– The user does not know the recommended item and is not interested in knowing about it.

3 Universidade Federal do Rio de Janeiro

A binary scale was adopted: either the user is interested or not.

In total, 97 questionnaires were answered. All participants started at the same lab and at the same time, and the last participants left the lab within two hours. The accuracy of the recommendations was used to estimate how close the generated recommendations were to the real preferences of the users. This criterion was also used to compare the three recommendation types with each other, and to investigate which recommendation type most surprised the user.

The null hypothesis of this experiment is: “The system is not able to provide good quality recommendations”. This means that the system cannot recommend items that might interest the user.

The alternative hypothesis is: “The system is able to provide good quality recommendations”. A “good quality recommendation” brings correct results that give some benefit to the user (new acquisitions, knowledge, entertainment, etc.). Some of the properties are traded off against each other; the most obvious example is perhaps the decline in accuracy when other properties (e.g. diversity) are improved. Based on Ricci et al. (2011), we consider the following as the main properties of a “good quality recommendation”:

• Accuracy average – This is one of the most important dimensions to be analyzed. It was obtained for each of the three types of recommendation. This metric is computed by dividing the number of correct recommendations (items that the user was interested in) by the total number of recommendations provided.
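The accuracy computation on the binary scale can be illustrated with a short Python sketch. The label names and the example answers below are hypothetical and only serve as an illustration; averaging the per-list accuracies with a plain arithmetic mean is an assumption about how the accuracy average is obtained.

```python
# Hypothetical labels for the four questionnaire options described in the text.
KNOWN_INTERESTING = "known_interesting"
KNOWN_UNINTERESTING = "known_uninteresting"
UNKNOWN_INTERESTED = "unknown_interested"
UNKNOWN_NOT_INTERESTED = "unknown_not_interested"

# Binary scale: a recommendation is "correct" when the user is interested,
# regardless of whether the item was already known.
INTERESTED = {KNOWN_INTERESTING, UNKNOWN_INTERESTED}


def accuracy(feedback):
    """Fraction of recommendations that the user marked as interesting."""
    if not feedback:
        raise ValueError("no recommendations were evaluated")
    correct = sum(1 for answer in feedback if answer in INTERESTED)
    return correct / len(feedback)


def accuracy_average(feedback_lists):
    """Arithmetic mean of the accuracies of several feedback lists
    (one list per user and input movie; the mean is an assumption)."""
    scores = [accuracy(feedback) for feedback in feedback_lists]
    return sum(scores) / len(scores)


# Illustrative answers for ten recommendations given for one input movie.
example = (
    [KNOWN_INTERESTING] * 4
    + [UNKNOWN_INTERESTED] * 3
    + [KNOWN_UNINTERESTING] * 1
    + [UNKNOWN_NOT_INTERESTED] * 2
)
print(accuracy(example))  # 7 interesting answers out of 10
```

Keeping the mapping from questionnaire options to the binary scale in a single set (`INTERESTED`) makes it easy to reuse the same answers for other metrics that split the options differently, such as known versus unknown items.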


Here, we compare our results with Passant (2011), which represents the state of the art. If the results show that the accuracy of the proposed algorithms is higher than or equal to this reference, our alternative hypothesis will be accepted.

• Prediction accuracy by ranking – It may predict user opinions over items. In our case, we are interested not so much in whether the system properly predicts the ratings of these items, but rather in whether the system properly predicts that the user will add these items to the queue (for future use). We calculate it through correlations between the algorithm ranking and the user preference ordering. If all the correlations are positive and all of them are higher than 0.70, our alternative hypothesis will be accepted. In statistics, the correlation coefficient (r) measures the strength and direction of a linear relationship between two variables on a scatterplot. The value of r is always between –1 and +1. In Rumsey (2016), a correlation of +0.70 is considered a strong uphill (positive) linear relationship. That is the reason why we chose this value.
• Novelty – Some recommender systems produce recommendations that are highly accurate and have reasonable coverage, and yet are useless for practical purposes (Herlocker, 2004). We need new dimensions for analyzing recommender systems that consider the “nonobviousness” of the recommendation. One such dimension is novelty (Herlocker, 2004). Novel recommendations are recommendations for items that the user did not know about (Ricci et al., 2011), i.e. the discovery of new items. Designing metrics to measure novelty is difficult, because novelty is a measure of the degree to which the recommendations present items that are both attractive to users and surprising to them. In fact, the usual methods for measuring quality are directly antithetical to novelty. We considered the score successful when at least 1/3 of the recommended items are new (higher than 33%); in this case, our alternative hypothesis will be accepted. To analyze this metric, we use as a reference the studies of Zhang (2013) and Vargas (2011). Zhang (2013) compared different novelty metrics on two larger movie datasets (MovieLens and Netflix), using different recommender methods: random, topN popularity recommendation, probability of items being liked (PL), probability of items being liked and dissimilarity (LD), and probability of items being liked and dissimilarity and satisfaction (LDS). In this solid study, the novelty scores were extremely small for all methods (<= 0.0431). The second article (Vargas, 2011) presented a formal framework for the definition of novelty and diversity metrics that unifies and generalizes several state-of-the-art metrics. Based on the combination of ground elements, the authors defined a set of methods to measure novelty and diversity. In their approach, the metrics related to novelty (EPC and EPD) of a relevance-aware variant of the content-based algorithm (recall that our approach is also content-based) were lower than 33%. Consequently, we used this score as our threshold.
• Utility – Based on the participants’ feedback about the recommendations generated by each of the three algorithms. In this case, we analyze the rate of the answer “I do not know the item, but I am interested in knowing it” among all items unknown to the user. We consider that the most useful type of recommendation is the one that shows the user an item that he or she is interested in and did not know about. In a study presented in McNee et al. (2002), several recommendation algorithms were evaluated regarding the user-perceived utility of the recommended items. The percentage of “useful” recommended items reported in that study ranged from approximately 25% to 60%, depending on the recommendation algorithm used, but both known and unknown recommended items were classified as useful whenever the user perceived them as interesting. Another work that investigated the recommendation of interesting items (items that the user is interested in) was Duan et al. (2012). In Duan et al. (2012), the number of unknown items in the top-10 list ranged (roughly) from 20% to 70% across different algorithms, whereas the percentage of interesting items (among known and unknown items) ranged from 20% to 40%. The evaluation results reported in that work show the percentage of the recommended items that were unknown to the user, and also the percentage of the recommended items that were interesting to the user, but not the percentage of items that are both unknown and interesting at the same time. None of the related works cited in Section 2 showed evaluation experiments that assess the utility of the recommendations in a way that could be compared to our results. Considering the results reported in the aforementioned studies, and that the order of items that match the user’s preference is extremely subjective and varies from person to person, we believe that the utility metric can be improved with personalized recommendation. Knowing that this work did not include any data from the user’s profile in the calculation of the recommendations, we considered the score successful when 50% of the new recommended items are useful. In this case (>= 0.50) our alternative hypothesis will be accepted.

In this evaluation, we consider the statement “the system is able to provide good quality recommendations” to be true when all the properties are validated.

5.1. Accuracy average

In order to reject the null hypothesis and support the alternative hypothesis, the accuracy of the three types of recommendation was computed. Considering that the algorithm for recommending resources of the same type as the input resource is the same one used to recommend songs in Passant (2011), the accuracy of the other two types of recommendation will be compared to the values previously obtained in the music domain. If the results show that the accuracy of the proposed algorithms is higher than or equal to this reference, our alternative hypothesis is accepted.

For each movie selected by the user, each recommendation type provided, in general, 10 recommended items. In total, the recommendations generated for 97 input movies were evaluated. This should have generated 970 recommended items for each recommendation type, but as can be seen in Table 1 the line “Total recommendations” is lower than that. This happens because, in spite of all the care taken to select and prepare the datasets, some links were broken and in some rare cases the algorithm could not find 10 related items to recommend.

Table 1
Quasi-experiment participants’ feedback regarding the recommendations generated by each of the three algorithms.

Classification / Type                                                Movie recommendation   People recommendation   Resource recommendation
I know the item and find it interesting                              398                    352                     443
I know the item and find it uninteresting                            80                     44                      59
I do not know the item, but I am interested in knowing it            224                    321                     264
I do not know the item and I have no interest in knowing about it    199                    232                     173
Total recommendations                                                960                    949                     939

In order to compute the accuracy average for each recommendation type, we first computed the accuracy per user, type, and movie. This means that each recommendation feedback of each type, for each user and each input, was considered individually. For example, assume that a user has seen and liked the movie “ET The Extra-Terrestrial” and that this user is going to evaluate the recommendations given for this input movie. The feedback from the user for the first type of recommendation regarding this input can be seen in Table 2.

Table 2
User feedback for movie recommendations given “ET The Extra-Terrestrial” as input.

Movie / Classification: I have seen and liked it; I have seen and did not like it; I have not seen, but would be interested; I have not seen and I am not interested

Close Encounters of the Third Kind    X
Twilight Zone: The Movie              X
Jurassic Park                         X
Schindler’s List                      x
Hook                                  x
The Lost World: Jurassic Park         x
A.I                                   x
Always                                x
Empire of the Sun                     x
War of the Worlds                     x

This metric is computed by dividing the number of correct recommendations by the total number of recommendations provided, so it is necessary to define what we consider a “correct” recommendation. We assumed that a recommendation is correct when the user classifies it as “I have seen and liked it” or “I have not seen but would be interested”. For the example in Table 2, the total number of recommendations given for the input “ET The Extra-Terrestrial” is ten. From this set, seven are considered correct, as seven is the number of “X” marks in the two categories that denote user interest. This way, the accuracy for the recommendations in the example is 0.7, which means that 7 out of 10 suggestions interested the user.

The same computation was done for the 97 movie inputs, and the corresponding feedback was obtained for each user and grouped by type of recommendation. After the individual calculation, the accuracy average was obtained for each of the three types of recommendation.

– Recommendation of resources of the same type (Movies): 0.69
– Recommendation of people: 0.70
– Recommendation of resources of any type (based on the short description of the resources): 0.75

As Seelv is a mature application and represents the state of the art, we decided to compare it with our work. To do so, we re-implemented its algorithm and applied it to the same dataset used in our experiment. The accuracy of Seelv and that of our recommendation of resources of the same type (Movies) were the same, 0.69, but the other algorithms of our approach performed better: 0.70 for the recommendation of people and 0.75 for the recommendation of resources of any type (based on the short description of the resources). Consequently, we considered our results (based on accuracy) positive.

5.2. Prediction accuracy by ranking

Another investigation was to check whether there is a relation between the item ranking generated by the algorithm and the users’ real preference among the items. In the feedback questionnaire, participants were asked to reorder the recommended items according to their preferences. This data was used to compute the Spearman coefficient, in order to measure the strength and direction (positive or negative) of the correlation between the two variables provided: the recommendation rank and the user’s preference ordering. Spearman coefficient (rs) values are in the interval [−1, 1], where rs > 0 means a positive relation between rank and preferences, rs < 0 means a negative relation, and rs = 0 means no relation between them. Table 3 shows the number of evaluations for each category.

Table 3
Number of positive, negative and non-existing correlations between algorithm ranking and user preference.

Correlation / Recommendation type    Movies   People   Resources
r > 0 (positive)                     81       75       86
r < 0 (negative)                     16       22       11
r = 0 (none)                         0        0        0

The numbers in Table 3 show that, for the most part, the algorithm ranking and the real user preferences go together (are positively related). Consequently, our alternative hypothesis is accepted.

5.3. Novelty

Another metric that can be extracted from the feedback is the discovery of new items. This is relevant as one of the goals of our approach was to provide novel, serendipitous items to the user. We propose a simple metric to evaluate this: the discovery index, computed by dividing the number of recommended items unknown to the user by the total number of provided recommendations. This is done individually for each recommendation type. For this metric, an item is considered novel when the feedback received for it was “I have not seen but would be interested” or “I have not seen and am not interested”. In the social network recommendation, the corresponding feedback categories were “I do not know but I am interested in his or her work” and “I do not know and am not interested in his or her work”. In the recommendation of resources of any type, the classification options were “I do not know the item, but would be interested” and “I do not know the item and have no interest”.

The recommendation of people was the one with the highest discovery index, bringing novel items to the user in 58% of the recommendations. The other two recommendation types had similar results: 47% of novel items. It is important to mention that this index does not take into account whether the user liked the item.

5.4. Utility

In this case, we had 53% (movies), 58% (people) and 60% (resources) as new and useful items. Based on the results obtained for each property, we consider that our approach is able to provide good quality recommendations.

6. System interface

A simple Web interface was developed to allow end users to make use of our recommendation system. The interface language


is Brazilian Portuguese because this project received support from the Brazilian government.

During the experiment, all the users evaluated our approach using this interface. Fifty films were pre-processed, and their recommendations were calculated and stored in advance in an ad-hoc, offline process. The pre-processing was done because the algorithms are complex and it could be uncomfortable for the user to wait for the processing time.

The first screen of the interface (Fig. 8) corresponds to the input selection feature. Through it, the user selects the film that will be the seed of the recommendation process. We opted to show the name of the film, obtained from the label property, rather than the URI, for easy recognition of each film, making the use of triple data friendlier. In Fig. 8 the film Spider-Man was selected. Still on the first screen, two disabled checkboxes are presented with the names of the datasets that are used in the recommendation. This is done only to inform the user which data sources are being used.

Fig. 8. Selection of the input resource.

After selecting the movie, the system displays the second screen, where the three types of recommendations are shown. In the recommendation of films (Fig. 9), the ten films with the highest degree of similarity are suggested to the user. To facilitate the visualization, a bar graph is displayed with the names of the recommended films. If the user wants to view the properties of the recommended films, the list above the graph has links to the source datasets.

Fig. 9. Recommendation of movies.

In the recommendation of people (Fig. 10), the interface displays a list with the ten persons with the strongest connections to the input resource. A graph is also displayed, representing the connections between the people resources and the film. In the list of recommended persons, a prominent number in bold and in parentheses is the number of collaborations (links) between the input resource and that person. This number can also be found as the weight shown on the graph edges.

Fig. 10. Recommendation of people.

The last type of recommendation (Fig. 11) shows a list of resources without type restriction. Therefore, although the input resource is a movie, characters, people and comic books are suggested. In addition to the list, the interface displays, highlighted, the summary of the film that served as the input to the recommendations.

Fig. 11. Recommendation of resources.

7. Conclusions and future work

This work tackled the difficulties in consuming LOD. Recommendation techniques that present the user with serendipitous information likely to catch his or her interest were proposed and evaluated. Our work also aimed to situate the user in the growing LOD cloud.

The movie domain was chosen to instantiate an example application, given the difficulties of building a general, domain-free application. Two datasets were used: LinkedMovieDatabase and DBPedia. Three recommendation types were implemented: one that recommends items of the same type as the input item, a second one that recommends people given an item, and a third one that recommends items of any type, given an input item. Restricting the application to the movie domain is justified by current technical limitations, such as the low responsiveness of SPARQL endpoints, the poor quality of public datasets, and the lack of mappings among the ontologies behind the datasets.

The proposed approach sheds new light on LOD consumption in relevant aspects. It uses more than one dataset (two were used in our application, but more could be used), it is able to recommend items of different types, and it is able to recommend items that have few properties in one dataset, as it can use other datasets and a mapping among them to search for further properties of an item. Together, these features allow the user to reach the real benefits of LOD. A necessary first step is the data treatment. The treatment stage can be applied to any dataset. For this, the dataset should be available on the Internet, imported and stored in a local base, and have metadata available. When these conditions are met, it is possible to recommend data using any dataset. Consequently, the more descriptors we use, the better the recommendations will be. We can deal with several datasets, but the larger the number of datasets, the greater the complexity. During the Data Interlinking process, our approach analyzes two datasets at a time. Consequently, we can have a performance problem in this stage, since all possible pairs of datasets will be considered and processed. This can be considered a limitation of this work, and performance tests using more than 2 datasets will be made in the future.

The presented evaluation results are per se a contribution, as most works found in the literature say very little about what to expect from recommendation in LOD. Here, we would like to highlight a limitation of this work. A sample of 35 people may not be representative. In our case, we had two options: 1) use a bigger sample, which could include people with different profiles and consequently be a heterogeneous group, or 2) use a small sample with a specific profile. We chose the second option because we believe that a balanced group formed by skilled people could evaluate this approach better than a bigger and heterogeneous group. For a better analysis, we needed people with high expertise in web systems, experience with recommender systems, high motivation regarding the scenario (to complete all the tasks and compare the results carefully) and enough knowledge of LOD (all of them received training in LOD). It is hard to get a big group with all these characteristics. We believe that this restriction does not affect the quality of the work, because most works found in the literature say very little about what to expect from recommendation in LOD and very few evaluate their solutions with a group of users.

This approach can be extended to other domains. At the current stage of the work, we cannot confirm whether this solution will present good results for other domains. As future work, more experiments with datasets from different domains should be conducted.

Di Noia, T., Cantador, I., & Ostuni, V. C. (2014). Linked open data-enabled recommender systems: ESWC 2014 challenge on book recommendation (pp. 129–143). Springer International Publishing. doi:10.1007/978-3-319-12024-9_17.
Duan, J., Prasad, S., Huang, J., Tan, P., Chawla, S., Ho, C. K., et al. (2012). Discovering unknown but interesting items on personal social network. In Advances in knowledge discovery and data mining: 16th Pacific-Asia conference, PAKDD, Kuala Lumpur, Malaysia, May 29 – June 1, 2012, proceedings, part II (pp. 145–156). Springer Berlin Heidelberg. doi:10.1007/978-3-642-30220-6_13.
Frakes, W. B., & Baeza-Yates, R. (Eds.). (1992). Information retrieval: Data structures and algorithms. Upper Saddle River, USA: Prentice-Hall, Inc.
Franz, T., Koch, J., Dividino, R. Q., & Staab, S. (2010). Lena-tr: Browsing linked open data along knowledge-aspects. In AAAI spring symposium: Linked data meets artificial intelligence (pp. 1–10).
Glaser, H., & Millard, I. (2007). Rkb explorer: Application and infrastructure. In Semantic Web Challenge (pp. 111–118).
Glaser, H., Millard, I., & Jaffri, A. (2008). RKBExplorer.com: A knowledge driven infrastructure for linked data providers. In Proceedings of the 5th European semantic web conference on the semantic web: Research and applications (pp. 797–801). Berlin, Heidelberg: Springer-Verlag.
Heath, T. (2008). How will we interact with the web of data? IEEE Internet Computing, 12(5), 88–91.
Heitmann, B., & Hayes, C. (2010). Using linked data to build open, collaborative recommender systems. In AAAI spring symposium 2010: Linked data meets artificial intelligence (pp. 76–81).
Herlocker, J. L. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22, 5–53.
Klyne, G., & Carroll, J. J. (2004). Resource description framework (RDF): Concepts and abstract syntax, World Wide Web consortium, recommendation REC-rdf-concepts-20040210 http://www.w3.org/TR/2004/REC-rdf-concepts-20040210.

This work was conceived as a way to foster the usage of LOD. Our long-term goal is that recommendation can be incorporated in RDF browsers, in order to exhibit for a searched resource not only resource properties but also suggestions of other resources that might interest the user, contextualizing the searched resource.
The reverse path sounds also promising: using LOD to boost rec- concepts-20040210 http://www.w3.org/TR/2004/REC- rdf- concepts- 20040210.
ommendations of existing recommenders. A next step in this di- McNee, S. M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S. K., Rashid, A. M., et al.
rection is to take the user profile in to account to make recom- (2002). On the recommending of citations for research papers. In Proceedings
of the 2002 ACM conference on computer supported cooperative work (CSCW
mendations, personalizing the output.
’02) (pp. 116–125). doi:10.1145/587078.587096.
Mirizzi, R., Noia, T. D., Ragone, A., Ostuni, V. C., & Sciascio, E. D. (2012). Movie rec-
References ommendation with dbpedia. In IIR (pp. 101–112).
Musto, C., Lops, P., Basile, P., de Gemmis, M., & Semeraro, G. (2016). Semantics-aware
Berners-Lee, T., Cailliau, R., Groff, J.-F., & Pollermann, B. (1992). World-wide web: graph-based recommender systems exploiting linked open data. In Proceedings
The information universe. Electronic Networking: Research, Applications and Pol- of the 2016 conference on user modeling adaptation and personalization (UMAP
icy, 1(2), 74–82. ’16) (pp. 229–237). doi:10.1145/2930238.2930249.
Berners-Lee, T., Fielding, R., & Masinter, L. (1998). Uniform resource identifiers (uri): Ostuni, V., Di Noia, T., Di Sciascio, E., & Mirizzi, R. (2013). Top-N recommendations
Generic syntax. from implicit feedback leveraging linked open data. In Proceedings of the 7th
Bizer, C., Cyganiak, R., & Heath, T. (2007) How to publish linked data on the web. ACM conference on Recommender systems (RecSys ’13) (pp. 85–92). doi:10.1145/
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/. 2507157.2507172.
Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data - the story so far. Interna- Passant, A. (2010). Measuring semantic distance on linking data and using it for re-
tional Journal on Semantic Web and Information Systems, 5(3), 1–22. sources recommendations. In AAAI spring symposium: Linked data meets artificial
Burke, R. D. (2002). Hybrid recommender systems: Survey and experiments. User intelligence (pp. 93–98).
Modelling and User-Adapted Interaction, 12(4), 331–370.

Passant, A. (2011). Seevl - mining music connections to bring context and discovery to the music you like. Semantic web challenge 2011 winner, ISWC.
Passant, A., & Decker, S. (2010). Hey! ho! let’s go! explanatory music recommendations with dbrec. Extended semantic web conference. Berlin Heidelberg: Springer.
Piao, G., & Breslin, J. (2016). Measuring semantic distance for linked open data-enabled recommender systems. In Proceedings of the 31st annual ACM symposium on applied computing (SAC ’16) (pp. 315–320). doi:10.1145/2851613.2851839.
Prud’hommeaux, E., Seaborne, A., Raimond, Y., Abdallah, S., Sandler, M., & Giasson, F. (2007). The music ontology. In ISMIR 2007: 8th international conference on music information retrieval (pp. 417–422).
Raimond, Y., Abdallah, S., Sandler, M., & Giasson, F. (2007). The Music Ontology. In Proceedings of the 8th International Conference on Music Information Retrieval (pp. 417–422). Vienna, Austria.
Ricci, F. (2011). Recommender system handbook (p. 842). New York, NY: Springer-Verlag. ISBN-13:978-0387858197.
Rumsey, D. J. (2016). How to interpret a correlation coefficient r. Statistics For Dummies. ISBN: 978-1-119-29352-1.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Vargas, S., & Castells, P. (2011). Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the fifth ACM conference on recommender systems.
Volz, J., Bizer, C., Gaedke, M., & Kobilarov, G. (2009). Silk - A link discovery framework for the web of data. In LDOW (pp. 1–8).
Zhang, L. (2013). The definition of novelty in recommendation system. Journal of Engineering Science and Technology Review, 6, 141–145.
