
Simplifying mashup component selection with a combined

similarity- and social-based technique
Boris Tapia
Universidad Técnica Federico
Santa María
Departamento de Informática
Valparaíso, Chile
btapia@alumnos.inf.utfsm.cl
Romina Torres
Universidad Técnica Federico
Santa María
Departamento de Informática
Valparaíso, Chile
romina@inf.utfsm.cl
Hernán Astudillo
Universidad Técnica Federico
Santa María
Departamento de Informática
Valparaíso, Chile
hernan@inf.utfsm.cl
ABSTRACT
Web mashups are becoming the main approach to build
Web applications. Current approaches to enable compo-
nent selection include description-based techniques and so-
cially generated metadata. The explosive growth of APIs makes it increasingly harder to select appropriate components for each mashup. Unfortunately, description-based techniques rely heavily on the quality of authors' information,
and social-based approaches suffer problems like “cold-start”
and “preferential attachment”. This article proposes (1)
two new measures of socially ranked fitness of candidate
components, (2) an API functional taxonomy using For-
mal Concept Analysis based on descriptions, and (3) a com-
bined approach that improves description-based techniques
with these social ranking measures. We use social rank-
ings based on past (co-)utilization of APIs: WAR (Web API
Rank) measures API utilization over time, and CAR (Co-
utilization API Rank) measures its co-utilization with other
APIs. The measures and the combined approach are illustrated with a case study using the well-known Web API catalog ProgrammableWeb¹. A prototype tool allows iterative discovery of APIs and assists the mashup creation process.
Keywords
mashup, formal concept analysis, recommendation system,
social network
1. INTRODUCTION
Mashups are becoming the de facto approach to build
customer-oriented Web applications, by combining opera-
tions on several Web APIs (Application Programming In-
terfaces) into a single lightweight, rich, customized Web
front-end. They allow composers to construct complete applications by searching for, composing and executing functionality provided
¹ http://www.programmableweb.com
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Mashups 2011, September 2011; Lugano, Switzerland
Copyright 2011 ACM 978-1-4503-0823-6/11/09 ...$10.00.
by external sources. Today, the way people build software has radically changed: from implementing from scratch to searching for already-packaged functionality that, alone or combined, fully or partially satisfies each requirement.
API Web catalogs provide, besides documentation, social
information about the real use of APIs on the registered
mashups. In our previous work [11], we argued the need to combine these two sources of information, since description-based techniques can be leveraged by social information. This combination allows discovering candidates that would otherwise pass unnoticed because of their poor-quality descriptions or low popularity. It is important to mention that we cannot rely on social information alone because, as has been shown in previous work [3], this leads to the cold-start problem for new APIs and makes the discovery process exhibit a preferential-attachment trend.
Typically, mashups are built with more than one API.
These APIs are iteratively selected and previous selections
influence current ones. Thus, when composers are discovering APIs, they must consider which other APIs they have already selected. The developer community can help composers discover the most appropriate APIs based on their past co-utilization. This information is obtained from the registered mashups, and we can exploit it to support the discovery process.
Currently, most techniques have tackled the discovery process using social and semantic networks separately. In this work, we present an approach that uses both networks to enrich and improve the discovery process results.
Our main contributions presented in this paper are
• a novel iterative approach to discover APIs to build
mashups driven by social utilization and co-utilization
over semantic techniques,
• a new model to represent APIs and their collaboration
to create mashups,
• a web tool that supports this approach, and
• the evaluation of the proposed approach through a case
study.
The paper is organized as follows. Section 2 introduces the
formal framework needed for representing both networks.
Section 3 describes our approach. Section 4 exemplifies our
approach with a case study. Section 5 briefly describes related work to highlight the challenges of this research. Section 6 discusses current issues with this approach and possible solutions, and finally, Section 7 gives concluding remarks.

Figure 1: API-Mashup and semantic networks
2. SOCIAL AND SEMANTIC NETWORKS
Mashups are interesting because they express how APIs
can be combined to generate new and innovative systems.
APIs can be represented, and distinguished from one another, by the functionality they provide; these functionalities in turn can be represented by the semantic meaning of the concepts that describe them. In Figure 1, the API-Mashup network shows
the relationship between APIs and mashups, as well as the
relationship between the APIs and their most representative
terms.
Due to the increasing proliferation of APIs, we can assume that for each API there is a set of functionally equivalent APIs that are interchangeable with it. Lately, the mashup field has seen many advances and contributors [10], and we can conclude that we are facing a complete and specialized community of agents (APIs) that provide different functionalities represented by meaningful terms.
As humans, newcomers learn by example, and as program-
mers, mashup composers also learn from other composers by
imitating previous decisions about which APIs to use (see
[6] and [12]). In other words, when composers are starting to
build a mashup, they tend to search for similar mashups and
their structures as the kickoff of their own projects. Therefore, the more mashups have selected an API in the past, the more likely it is to be selected as part of a new mashup. This social influence makes current API discovery behave as a preferential-attachment process, which brings big issues such as the cold-start problem for new competitors.
Both the semantic and the social approach are valid ways to discover APIs, but each has its own issues. Semantic networks rely only on "promises" and ignore the power of social information for discovering APIs, while social networks can hide better candidates because their nature is to reinforce what is already at the top. In the next subsections we formalize each network separately, and in the proposal section we explain how they are combined and empower each other.
2.1 The semantic network
Because each API is described according to the function-
alities provided, it is possible to identify clusters of function-
alities shared by several APIs. Therefore, we model the set
of APIs as multiple subsets or sub-communities that can be
hierarchically arranged according to the functionalities they
share and the relation (general/specific) between them. Our
objective is to obtain a full functionality taxonomy repre-
Table 1: Formal context example
Agent vs Terms T1 T2 T3 T4
A1 x x x x
A2 x x
A3 x x
A4 x
sented as a graph, where each node is a sub-community of
functionalities with several APIs anchored to it. Function-
alities are extracted from the natural language descriptions
of APIs. Inspired by [9], we consider a binary relation R between an API set A and a concept set C. R expresses the fact that an API a provides the functionality represented by c in some mashup. Then an API community (AC) is the largest set of APIs A (extent) sharing a given set of functionalities C (intent).
In our previous work, we proposed Growing Hierarchical
Self-organizing Maps (GHSOM) to model this structure [11].
GHSOM can assign an API to only one node of the hierar-
chy. However, assuming that an API belongs to only one
sub-community does not seem appropriate. We therefore propose to organize functionalities in a lattice-based structure, which allows communities to overlap, so that APIs can be classified simultaneously into different sub-communities. In order to represent ACs hierarchically in a lattice-based taxonomy we need to provide a partial order between ACs. Formally, an AC has an extent E (the set of APIs that conforms the community) and an intent I (the set of functionalities that the community shares). A community AC_i is a sub-community of AC_j iff the intent I_i is a subset of I_j. Then, the lattice is exactly the ordered set of ACs built from A, C and R.
Consider the following example. We have a set of APIs A1:
TwitterVision, A2: Twitter, A3: Rumble and A4: Google
Maps, which expose different functionalities represented by
the terms T1: deadpool, T2: microblogging, T3: social and
T4: mapping. Table 1 shows which functionalities each API provides. We call this table the context; it is represented by the set of relations R between agents and terms, where R_{i,j} is blank if functionality j is not provided by agent i.
Seeing lattice concepts as communities that provide func-
tionalities, the intent of each concept is the set of terms,
and the extent the set of APIs. Then, the APIs taxonomy
is built from the set of couples (A, C) as shown in Figure
2. The hierarchy is drawn according to the partial order <,
i.e. bottom concept < top concept. The lattice's visual representation shows the whole dataset at the top concept. Because we consider APIs of any kind of functionality, the top concept's intent is empty. At the middle level, we find formal concepts such as the set {TwitterVision, Twitter, Rumble}, which provides the "social" functionality, or the set {TwitterVision, Rumble, GoogleMaps}, which provides the "mapping" functionality. Browsing deeper, we find concepts such as the set {TwitterVision, Twitter}, which provides two functionalities at the same time, "microblogging" and "social", or the set {TwitterVision, Rumble}, which provides the "social" and "mapping" functionalities. At the bottom level, and for this particular small dataset, we find one Web API that provides all the functionalities at once: "social", "mapping", "deadpool" and "microblogging"; it is actually the only one that provides the "deadpool" functionality.

Figure 2: API taxonomy example showing the extent of concepts
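As a concrete illustration, the formal concepts of the Table 1 context can be enumerated with a small script. The closure operators below are a naive brute-force sketch, suitable only for toy contexts, and not the algorithm used by dedicated FCA tools.

```python
from itertools import combinations

# Formal context from Table 1: each API mapped to the terms it provides.
context = {
    "TwitterVision": {"deadpool", "microblogging", "social", "mapping"},
    "Twitter":       {"microblogging", "social"},
    "Rumble":        {"social", "mapping"},
    "GoogleMaps":    {"mapping"},
}

def extent(intent_set):
    """APIs providing every term in `intent_set`."""
    return {a for a, terms in context.items() if intent_set <= terms}

def intent(apis):
    """Terms shared by every API in `apis`."""
    terms = set.union(*context.values())
    for a in apis:
        terms &= context[a]
    return terms

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every term subset."""
    all_terms = sorted(set.union(*context.values()))
    concepts = set()
    for r in range(len(all_terms) + 1):
        for combo in combinations(all_terms, r):
            e = extent(set(combo))
            i = intent(e) if e else set(all_terms)
            concepts.add((frozenset(e), frozenset(i)))
    return concepts

concepts = formal_concepts()
```

Running this recovers, among others, the "social" community {TwitterVision, Twitter, Rumble} and the bottom concept {TwitterVision}, which provides all four functionalities.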
When a composer searches for an API, he is actually searching for a set of functionalities. To represent this search, he uses a set of keywords. This query is transformed into a virtual concept whose intent is the set of keywords [1]. Then, we navigate the lattice to find the concept that has exactly the same intent, in order to retrieve its extent. If no concept matches the intent, the virtual concept must be arranged within the lattice. From the potential set of parents of this virtual node (nodes whose intent is a subset of the virtual concept's intent), we select the concept(s) whose intent is maximal, and then we suggest the extent of those concept(s). Similar to [1], we walk the semantic graph from the virtual concept to its ancestors. The distance between a child node and its direct parents is one. Then the nodes with minimal distance to the virtual node are the best candidates to be recommended. In this work, we only consider nodes at distance one, but it is possible to enrich the search process by walking the graph at longer distances.
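A minimal sketch of this lookup, over a hand-built fragment of the Figure 2 lattice; the virtual-concept insertion of [1] is richer than this subset test, which only captures the distance-one case described above.

```python
# Hand-built fragment of the example lattice: (extent, intent) pairs.
lattice = [
    (frozenset({"TwitterVision", "Twitter", "Rumble", "GoogleMaps"}), frozenset()),
    (frozenset({"TwitterVision", "Twitter", "Rumble"}), frozenset({"social"})),
    (frozenset({"TwitterVision", "Rumble", "GoogleMaps"}), frozenset({"mapping"})),
    (frozenset({"TwitterVision", "Rumble"}), frozenset({"social", "mapping"})),
    (frozenset({"TwitterVision", "Twitter"}), frozenset({"social", "microblogging"})),
]

def discover(keywords):
    """Return the extent of the concept whose intent matches the query
    exactly; otherwise suggest the extents of the maximal-intent parents
    (distance one from the virtual query concept)."""
    query = frozenset(keywords)
    for ext, inten in lattice:
        if inten == query:
            return set(ext)
    # Parents of the virtual concept: intents strictly contained in the query.
    parents = [(ext, inten) for ext, inten in lattice if inten < query]
    if not parents:
        return set()
    best = max(len(inten) for _, inten in parents)
    return set().union(*(ext for ext, inten in parents if len(inten) == best))
```

For the query {"social", "mapping"} an exact concept exists and its extent is returned; for {"social", "video"} no concept matches, so the extent of the maximal parent (the "social" community) is suggested.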
2.2 The social network
We can see APIs as agents that interact with each other to collaboratively create a new application. We represent these interactions as a social network formed by the set of APIs A and the set of undirected links E between them. Figure 3 depicts the collaboration network of APIs. The thickness of an edge indicates the frequency of the collaboration, and the size of a node indicates the frequency of its utilization within the registered mashups.
In our previous work [11] we proposed the Web API Rank
(WAR), a social indicator of the relevance of a Web API
based on the number of mashups that utilize it. In this
work, we reuse this indicator and we add a second one: the
Co-utilization API Rank (CAR). We argue that the selection process is performed iteratively and that the current selection depends on previously selected APIs. The CAR then helps the composer discover those APIs that have been co-utilized before with the current selection set. This indicator can also be restricted to a specific context, specifying the global goal (which kind of mashup is to be constructed) besides the local one (which kind of API is being searched for).

Figure 3: API collaboration network
3. PROPOSAL
In this work, we present an approach to combine the se-
mantic and social networks to enrich and improve the Web
API discovery process in order to build a mashup. We re-
inforced this approach with a Web tool that exploits this
information. The innovation of the proposed model centers
on the combined use of these two networks for this problem.
The improved discovery process model proposed in this
paper is based on the analysis of (1) the descriptions of
Web APIs and mashups, and (2) the “composed-by” rela-
tions found on mashups (which Web APIs are combined to
create them). We construct two concept lattices, one to
generate a taxonomy for the Web APIs and the other to
generate a taxonomy for the mashups. These taxonomies
serve to (1) drive the semantic API discovery process and
(2) narrow this search only to those APIs that are being
used for a specific kind of mashup. We also build the social
network, in which nodes are Web APIs and edges represent
their joint usage within a mashup. In the following subsections we explain the major stages of the approach and how they are combined to enhance the discovery process.
3.1 Preprocessing Stage
In the preprocessing stage we built a specialized crawler that consumes the ProgrammableWeb API in order to obtain the descriptions of Web APIs and mashups, the Web APIs that compose each mashup, ratings, tags, and other data. At first, API and mashup tags seemed good candidates for creating the taxonomies, but they are too specific and change between catalogs, while descriptions tend to be immutable.
We explain here the preprocessing stage for the API tax-
onomy, but it is analogous for the mashup taxonomy.
From the crawled descriptions we extract the set of tokens. Tokens require special handling because some of them can be compound words; we identify these by locating uppercase letters or underscore signs within the word. Then, using the TreeTagger tool², we filter out those tokens that are not common nouns. We also filter stop words, typically referring to API or mashup names (e.g. Google), specific technologies (e.g. Python), formats (e.g. XML), protocols (e.g. REST), etc. After token filtering, we apply the Porter stemming algorithm [7] to normalize the terms. Then, for each API, we obtain a vector of terms ta_i = {t_1, ..., t_k}. The terms can be unigrams, bigrams or trigrams. From this set of terms we must choose those relevant enough to represent the different objects. For this task, we used Term Frequency/Inverse Document Frequency (TF/IDF), a common mechanism in Information Retrieval for generating a robust set of representative keywords from a corpus of documents.
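The compound-word handling can be sketched with regular expressions. The helper below is illustrative only; the actual pipeline also applies POS filtering with TreeTagger and Porter stemming, which are omitted here.

```python
import re

def split_compound(token):
    """Split snake_case and CamelCase tokens into lowercase words,
    using the uppercase-letter / underscore cues described above."""
    words = []
    for part in re.split(r"_+", token):
        # A zero-width split before each uppercase letter breaks CamelCase.
        words.extend(w for w in re.split(r"(?=[A-Z])", part) if w)
    return [w.lower() for w in words]
```

For example, "GoogleMaps" splits into "google" and "maps", and "photo_share" into "photo" and "share".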
The TF of a term t_i inside an API A_i is computed as:

    tf(t_i) = freq(t_i, A_i) / |A_i|    (1)

while the IDF is calculated as the logarithm of the ratio between the total number of APIs and the number of APIs that contain the term:

    idf(t_i) = log( |A| / |{A_i : t_i ∈ A_i}| )    (2)

Then the TF/IDF weight of a term is calculated as:

    w(t_i) = tf(t_i) × idf(t_i)    (3)
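Equations (1)-(3) can be computed directly over the term vectors; the toy descriptions below are invented for illustration, not taken from the crawled corpus.

```python
import math

def tfidf(docs):
    """Compute TF/IDF weights per equations (1)-(3): `docs` maps each API
    name to its list of (already stemmed and filtered) terms."""
    n = len(docs)
    # Document frequency: number of APIs containing each term.
    df = {}
    for terms in docs.values():
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    weights = {}
    for api, terms in docs.items():
        weights[api] = {}
        for t in set(terms):
            tf = terms.count(t) / len(terms)   # eq. (1)
            idf = math.log(n / df[t])          # eq. (2)
            weights[api][t] = tf * idf         # eq. (3)
    return weights

# Invented toy term vectors for two APIs:
docs = {
    "Flickr":  ["photo", "photo", "share", "tag"],
    "YouTube": ["video", "share", "tag", "channel"],
}
w = tfidf(docs)
```

Note that a term appearing in every document (here "share" and "tag") gets an IDF of zero, so only discriminating terms receive positive weight.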
Regarding the social network, we extract from the crawled
data the set of APIs that were utilized to form each mashup.
3.2 Building the Taxonomies
As we mentioned before, when we build a taxonomy of
APIs we assume that each node is a sub-community or cat-
egory that provides some functionality. These communi-
ties are characterized by representative terms and popu-
lated with a set of APIs that share those functionalities.
A particular functionality can be provided by different sub-
communities, then the combination of the different func-
tionalities is what makes each community unique. Analo-
gously, we build a mashup taxonomy, in which mashups are
arranged as communities that provide capabilities of differ-
ent kind, also represented as a set of terms.
The aim of building both taxonomies is to support composers, on the one hand, in finding similar mashups in terms of the capabilities needed and, on the other hand, in finding an API that provides the required functionalities.

Because the lattice size can grow exponentially with respect to the number of contexts [5], and people predominantly use two or three terms when searching in web search engines³, we build lattices whose concepts have no more than five representative terms and exhibit a stability above 90%. Stability indicates the probability of preserving a concept's intent while removing some elements from its extent.

² http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger

Figure 4: Long-tail distribution
To build these taxonomies, we crawled the descriptions of APIs and mashups from ProgrammableWeb. The snapshot represents the state of this catalog as of May 2011. We obtained 3318 Web API and 5848 mashup descriptors. After the preprocessing stage, we obtained a set of 262 terms for Web APIs and a set of 192 terms for mashups. With these sets of terms we built a context matrix for each one, where the size of the matrix is the number of terms by the number of objects (262 × 3318 and 192 × 5848, respectively). Using the contexts, we built concept lattices with a maximum support of 0.5%, meaning that the concepts have no more than 5 terms. The concept lattices were built using the Coron System⁴ with the Charm algorithm and Naive to find the order. The time to generate each lattice was less than one second. The number of concepts in the lattices was 754 + 2 (inner nodes plus top and bottom nodes) for Web APIs and 261 + 2 concepts for mashups.
3.3 Building the Social Network
From the catalog data, we crawled the information about
the usage of APIs within mashups and built a social net-
work. The topology of this network can give us insights
about previous decisions made by mashup composers. In [6]
the authors discovered that the distribution of APIs within
mashups follows a power law, implying that a small number
of APIs form the majority of mashups. This tendency re-
mains in the collected data, where only 23 APIs (less than
3%) covers 80% of the complete set of mashups. In Figure
4 we can see the characteristic long-tail of the distribution.
Extracting the information about which APIs conform
each mashup, we built a social network where nodes are
APIs and the edges link two APIs that were used together
in a mashup. For instance, the “See you Hotel” mashup was
3
http://www.keyworddiscovery.com/keyword-stats.html
4
http://coron.loria.fr/
created using four APIs: Flickr, Google Maps, Twitter and
YouTube. In the social network, this implies the creation
of four nodes (each one representing an API), and six links
interconnecting them. If there are other mashups using the
same APIs, then the weight of the links are getting stronger.
In Figure 3, we can appreciate a portion of the social net-
work for the downloaded dataset.
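Building the network from mashup membership data can be sketched as follows; the data mixes the "See you Hotel" example with an invented second mashup.

```python
from itertools import combinations
from collections import Counter

# Hypothetical mashup -> APIs data ("TravelSnaps" is invented):
mashups = {
    "See you Hotel": ["Flickr", "Google Maps", "Twitter", "YouTube"],
    "TravelSnaps":   ["Flickr", "Google Maps"],
}

node_weight = Counter()   # API utilization frequency (node size in Figure 3)
edge_weight = Counter()   # co-utilization frequency (edge thickness)
for apis in mashups.values():
    node_weight.update(apis)
    # Each pair of APIs used together in a mashup strengthens one edge.
    for a, b in combinations(sorted(apis), 2):
        edge_weight[(a, b)] += 1
```

The four APIs of "See you Hotel" yield the six links mentioned above, and the Flickr-Google Maps edge, reused by the second mashup, gets weight 2.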
Then, for each API A_i we calculate the global WAR as the number of mashups in which A_i is used. We rescale these values into the range [0, 1] by dividing all the WARs by the maximum WAR among the APIs.

Given the information of the social network, it is possible to calculate the Co-utilization API Rank (CAR) for a given subset S of APIs with respect to an API A_i:

    CAR_{S,A_i} = n_{S,A_i} / n_S   if n_S ≠ 0;   0 otherwise,    (4)

where n_{S,A_i} is the number of mashups in which S and A_i are used together, and n_S is the number of mashups in which S appears.
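A direct reading of these definitions, over an invented three-mashup dataset:

```python
# Invented mashup membership data for illustration.
mashups = [
    {"Flickr", "Google Maps", "Twitter", "YouTube"},
    {"Flickr", "Google Maps"},
    {"YouTube", "Google Maps"},
]

def war(api):
    """Raw WAR: number of mashups using `api`."""
    return sum(api in m for m in mashups)

def war_scaled(api, candidates):
    """WAR rescaled to [0, 1] by the maximum raw WAR among `candidates`."""
    m = max(war(c) for c in candidates)
    return war(api) / m if m else 0.0

def car(selected, api):
    """Equation (4): co-utilization of `api` with the selected set."""
    n_s = sum(selected <= m for m in mashups)
    if n_s == 0:
        return 0.0
    n_sa = sum(selected <= m and api in m for m in mashups)
    return n_sa / n_s
```

Here Google Maps appears in all three mashups (raw WAR 3), and given Flickr as the current selection, YouTube has a CAR of 0.5 (co-used in one of Flickr's two mashups).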
3.4 Iterative Web API Discovery
In this section we describe how we exploit the social infor-
mation to improve the Web API discovery process. First, we
assume that mashup composers have good practices regard-
ing composite applications. Then, they have divided their
problem into a set of subproblems or functionalities that can
be satisfied by different Web APIs. Because discovering and selecting APIs are iterative processes, at each step the composer can constrain the current search with the decisions already made. It is important to mention that even though this approach is intended to support the mashup building process, it can also be used to discover specific APIs and/or mashups. We can distinguish the composer's intention by the information supplied:

• CASE 1: If the composer only provides mashup keywords, we interpret it as an attempt to find mashups that provide the specified capabilities.

• CASE 2: If the composer only provides API keywords, we interpret it as an attempt to find APIs that provide the specified functionalities.

• CASE 3: If the composer provides mashup and API keywords, we interpret it as an attempt to find APIs that provide the specified functionalities and that have been used in a specific type of mashup. Besides, in later stages, the composer may have a selected subset of APIs that has to be considered as a constraint on the new discovery process.
This process is enriched at each step with the social infor-
mation about which APIs have been previously used by the
community.
In Algorithm 1, we present how the discovery process is driven according to the inputs of the composer. The semantic rank is always calculated in one way, but the social rank is slightly different depending on (1) the already selected APIs and (2) whether the composer defines a mashup context (a set of keywords K_M) for which the API is needed. If a context is specified, both WAR and CAR are calculated not over the entire set of mashups, but only over the extent of the concept C_M (obtained from the set of keywords K_M as explained in Section 2.1). These are the "local" versions.
Algorithm 1 Iterative Web API discovery

Require: Let M be the set of all mashups.
Require: Let K_M = {t_1, ..., t_n} be the set of keywords that define the type of mashup the composer wants to build.
Require: Let K_{A_i} = {t_1, ..., t_m} be the set of keywords that define the type of API the composer searches for at step i.
Require: Let I be the number of APIs that will comprise the mashup.
Require: Let S be the initially empty set of selected APIs.

1: for i = 1 to I do
2:   Remove stop words from K_{A_i}
3:   Stem K_{A_i}
4:   Using K_{A_i}, obtain the API category C_A whose intent is closest to K_{A_i}, as explained in Section 2.1
5:   Get the APIs ∈ C_A = {a_1, ..., a_K}
6:   for k = 1 to K do
7:     Calculate the semantic rank R_k given the API-term frequency matrix
8:     if K_M = ∅ then
9:       Let n_k be the number of mashups m ∈ M in which API_k is used
10:      Let n_max = max_{1≤k≤K} n_k
11:      Calculate the global WAR of API_k as WAR^G_k = n_k / n_max
12:      if S ≠ ∅ then
13:        Let n_{S,k} be the number of mashups m ∈ M in which S and API_k are used together
14:        Let n_S be the number of mashups m ∈ M in which S appears
15:        Calculate the global CAR of API_k as in (4): CAR^G_{S,k} = n_{S,k} / n_S if n_S ≠ 0, and 0 otherwise
16:        Calculate the social rank of API_k as SR_k = (WAR^G_k + CAR^G_{S,k}) / 2
17:      else
18:        Calculate the social rank of API_k as SR_k = WAR^G_k
19:      end if
20:    else
21:      Using K_M, obtain the mashup concept C_M whose intent is closest to K_M, as explained in Section 2.1
22:      Let n_k be the number of mashups m ∈ C_M in which API_k is used
23:      Let n_max = max_{1≤k≤K} n_k
24:      Calculate the local WAR of API_k as WAR^L_k = n_k / n_max
25:      if S ≠ ∅ then
26:        Let n_{S,k} be the number of mashups m ∈ C_M in which S and API_k are used together
27:        Let n_S be the number of mashups m ∈ C_M in which S appears
28:        Calculate the local CAR of API_k as in (4): CAR^L_{S,k} = n_{S,k} / n_S if n_S ≠ 0, and 0 otherwise
29:        Calculate the social rank of API_k as SR_k = (WAR^L_k + CAR^L_{S,k}) / 2
30:      else
31:        Calculate the social rank of API_k as SR_k = WAR^L_k
32:      end if
33:    end if
34:    Calculate the final rank FR_k of API_k as FR_k = α · SR_k + (1 − α) · R_k
35:  end for
36:  The user selects one API, adding it to S (probably the one with the highest final rank FR_k)
37:  Suggest the set of APIs that have been co-utilized with S. One of these APIs can also be selected at this step; i must then be incremented by the number of APIs selected this way.
38: end for
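The rank combination at line 34 can be sketched compactly for the global case; Glosk's semantic rank of 0.97 and WAR of 0 come from the case study in Section 4, while the second API's semantic score is invented for contrast.

```python
def final_rank(semantic, war_g, car_g, selected_nonempty, alpha=0.3):
    """Line 34 of Algorithm 1: FR = alpha * SR + (1 - alpha) * R, where SR
    averages WAR and CAR when some APIs were already selected (lines 12-19),
    and falls back to WAR alone otherwise."""
    sr = (war_g + car_g) / 2 if selected_nonempty else war_g
    return alpha * sr + (1 - alpha) * semantic

# Glosk: high semantic rank (0.97), never used in a mashup (WAR = 0).
glosk = final_rank(semantic=0.97, war_g=0.0, car_g=0.0, selected_nonempty=False)
# A popular but less query-specific API (invented semantic score 0.62, WAR 0.8).
popular = final_rank(semantic=0.62, war_g=0.8, car_g=0.0, selected_nonempty=False)
```

With α = 0.3 as in the case study, Glosk's final rank is 0.7 × 0.97 = 0.679, matching the value reported in Section 4, and it can outrank a socially popular API despite having no usage history.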
Figure 5: MashupReco architecture
3.5 Implementation
In order to demonstrate our results empirically, we built MashupReco, a prototype web tool that allows composers to perform an iterative API discovery process. Its architecture is depicted in Figure 5. The crawler component is designed to gather data from multiple catalogs; currently, it only supports the ProgrammableWeb catalog. The data is stored in a MySQL database. Using the "Social Engine" and the "Taxonomy Builder" we perform the social network analysis and generate the taxonomies. The Taxonomy Builder is built on top of the Coron and TreeTagger systems. The most important module is the "Mashup Discovery Engine", which implements the iterative Web API discovery algorithm. Because the functionalities of MashupReco are exposed as web services, they can be consumed by different applications, which can build different presentations for them⁵. In Figure 5 we show a basic interface to support composers in the discovery process. The parameter α allows the composer to calibrate how much weight to assign to the social influence: values of α closer to 1 give more importance to the social influence, while values closer to 0 make it less important.
4. CASE STUDY
In this section, we describe MashupReco with an experiment. Here, the composer, a real estate broker, needs to build a web site that mashes up different sources of information about houses for sale given a specific location and its perimeter. He is interested in displaying on a map the different housing options, their photos and videos (if they exist), and photos and/or videos of nearby places of interest such as schools, restaurants, and fitness centers, to name a few. Assuming that our composer follows good practices, he will be able to identify which kinds of APIs he needs. Actually, he has already identified that he needs a map to display and mash up the different sources of information. He also needs APIs capable of searching for videos and photos at a specific location. He probably needs an API to convert an address into a latitude/longitude pair to obtain the photos and videos of points of interest as well as the housing options. He wants to help his potential customers get an impression of the neighborhood where a house is located, so he also needs an API that can extract information about what people are saying about the place (probably comments from a social network).

⁵ http://dev.toeska.cl/mashup-reco
Now, the composer needs to find APIs to build the mashup.
Using MashupReco he first specifies the mashup context: a mashup about "map" and "real estate". Then,
• He searches for an API to find geo-located photos using the keywords "photo" and "location". The results are immediate:

– Ranked by the similarity technique: Glosk, Instagram Real-time and Steply are highly ranked (0.97, 0.80 and 0.76, respectively). Using only the social influence, he obtains Microsoft Virtual Earth, Flickr, and Yahoo Maps (with global WARs of 1.0, 0.8, and 0.62, respectively). Using the combined ranking with an α of 0.3, Glosk and Flickr are the highest-ranked APIs (0.679 and 0.677, respectively). As we can notice, their ranks are almost the same, so using one or the other seems to be a good option. Note, however, that the global WAR of Glosk is 0, meaning it has never been used in a mashup, against Flickr's global WAR of 0.8.

• Based on the previous results he decides to use Glosk. Glosk does not have any Co-APIs, so MashupReco cannot suggest APIs according to this criterion.
• Then, he searches for video APIs using the keyword "video".
– According to the combined rank, the APIs with
highest ranking are YouTube (0.78), Yahoo Video
Search (0.70) and Patrick’s Aviation (0.66).
• The user then selects YouTube over Yahoo Video Search (WARs of 0.84 and 0.01, respectively).
– Based on the previous selection, MashupReco com-
putes the list of Co-APIs along with their CARs,
containing Google Maps 0.86, Flickr 0.33, Twit-
ter 0.12, Weather Channel 0.10, Wikipedia 0.10,
Foursquare 0.07, Yahoo Geocoding 0.03, to name
a few.
• From the Co-APIs list, the composer selects Google Maps for the map visualization, Twitter as the source of what people are saying about the neighborhood, and Yahoo Geocoding as the API to obtain the geo-location given an address. Each time the composer selects an API, the Co-APIs list is recalculated.
Using MashupReco, we can balance description-matching results and community usage of APIs. The majority of APIs do not have social information to exploit, because only a fraction of them have been utilized in a mashup. This lack of social data can be a problem, because there is no way to rate such an API based on its usage. On the other hand, for APIs that have been extensively used in mashups, there is a risk of rating them too high and giving them too much exposure, leaving the rest at the bottom of the list. That is the reason for influencing the discovery process with social data rather than basing it entirely on such data; this balance is controlled by the α factor. For example, for the query "photo location", a highly used API such as Flickr appears below Glosk, an API that has not been used in any mashup but is more specific to the query.
The Co-APIs list shows APIs that have been used in collaboration with the selected ones in mashups of the given context. In the case of selecting YouTube, APIs with different functionalities are suggested, among them Google Maps, Geonames and Twitter. Based on the context, these APIs could be useful for the mashup under construction, judging by previous compositions made by other users.
5. RELATED WORK
Given the increasing trend of major firms providing APIs for public use, the mashup community is rapidly expanding. There are studies that characterize the mashup ecosystem as an API-Mashup network [10] and aim to exploit this information.
In [8], the authors proposed the serviut score to rank APIs
based on their utilization and popularity. To calculate the
serviut score they considered the number of mashups that use the given API, but also other aspects that we believe are too ambiguous, such as classifying mashups in the same category as the API. Indeed, according to ProgrammableWeb, mashups are not classified into categories, because by definition a mashup is a mix of different Web APIs, so it is quite difficult to classify them into functional categories. According to our experiments, the taxonomies of APIs and mashups are quite different.
In [3], the authors proposed a social technique to mine annotated tags of mashups and APIs in order to recommend mashup candidates while managing the cold-start problem for new competitors. But tags are not reusable between different catalogs; by using tags we obtain specific taxonomies that are not generic enough to be used across sites. Web API authors do not necessarily use the same tags to describe their APIs; they typically adapt them to the tags used on each catalog.
In [2], the authors proposed MashupAdvisor, which also
assists mashup creators in building mashups. Similar to our
approach, MashupAdvisor suggests APIs that could become
part of the mashup under construction, using a probabilistic
approach based on popularity in the mashup repository.
However, because MashupAdvisor assists the whole building
process rather than only component selection, it relies on
specific inputs and outputs, and typically only Web service
APIs expose such data. Mostly because of their complexity
and the lack of standards, general APIs do not publish
interface information for each operation. Hence this approach
performs well over Web services but not over general Web
APIs. Although the results are encouraging, the authors
actually simulated ProgrammableWeb data to conduct their
experiments.
In [13], the authors proposed ServiceRank to differentiate
among functionally equivalent services based on their quality
and social aspects. The problem is that it requires data that
providers may not be willing to share, such as response time
and availability measurements. Moreover, because providers
publish their own measurements, the process may not be
completely reliable.
In [4], the authors proposed MatchUp, a tool that helps
mashup creators locate components to mash up, based on the
current component selection and a complete database
describing which components have been used in the different
mashups (at the input/output level). The algorithm performs
well but is feasible only within a single organization because,
in general, this information is neither shared nor public.
6. FUTURE WORK
Every day, at least two new APIs are created, and the
market keeps changing with the needs of customers. We
therefore expect the communities already identified to change
their structure as new APIs (or mashups) join or leave them.
This evolution is inevitable: over time some communities will
merge, while others will split into more specialized
sub-communities. We are currently working on identifying
evolution patterns using this community abstraction, as well
as on modeling the evolution itself.
The social ranks (WAR and CAR) are also affected by this
dynamism and must reflect variations in API usage, e.g.
APIs with intense use during a short period followed by a
decline. We are likewise researching techniques to
incrementally update the taxonomy each time the communities
change enough to trigger a taxonomy adaptation.
7. CONCLUSIONS
In this work, we have presented an approach that combines
semantic and social networks to enrich and improve the Web
API discovery process when building a mashup. We have
shown empirically that the natural-language descriptions of
the objects can be used effectively to build taxonomies of
functionalities, presented as communities. We have also shown
how to build a collaborative social network of APIs, using
the analogy of agents that collaborate to create applications,
and how to exploit and leverage the resulting social rankings.
One of our main contributions is to support this
methodology with a Web tool that empirically demonstrates
the iterative process. The approach shows how we can
effectively mitigate the cold-start problem and the
preferential-attachment trend of social approaches to
recommending APIs or mashups, and how we can discover
better description-based candidates by leveraging social
information, striking a trade-off between both worlds.
8. ACKNOWLEDGMENTS
This work was partially funded by FONDEF (grant D08i1155),
UTFSM DGIP (grants DGIP 241167 and PIIC) and CCTVal
(FB/22HA/10).
9. REFERENCES
[1] C. Carpineto and G. Romano. Exploiting the potential
of concept lattices for information retrieval with CREDO.
J. UCS, pages 985–1013, 2004.
[2] H. Elmeleegy, A. Ivan, R. Akkiraju, and R. Goodwin.
Mashup advisor: A recommendation tool for mashup
development. In Web Services, 2008. ICWS ’08. IEEE
International Conference on, pages 337 –344, sept.
2008.
[3] K. Goarany, G. Kulczycki, and M. B. Blake. Mining
social tags to predict mashup patterns. In Proceedings
of the 2nd international workshop on Search and
mining user-generated contents, SMUC ’10, pages
71–78, New York, NY, USA, 2010. ACM.
[4] O. Greenshpan, T. Milo, and N. Polyzotis.
Autocompletion for mashups. Proc. VLDB Endow.,
2:538–549, August 2009.
[5] C. Lindig. Fast concept analysis. In Working with
Conceptual Structures Contributions to ICCS 2000,
pages 152–161. Shaker Verlag, 2000.
[6] M. Weiss and G. R. Gangadharan. Modeling the mashup
ecosystem: Structure and growth. R&D Management,
40(1):40–49, 2010.
[7] M. F. Porter. An algorithm for suffix stripping.
Program, 14(3):130–137, 1980.
[8] A. Ranabahu, M. Nagarajan, A. P. Sheth, and
K. Verma. A faceted classification based approach to
search and rank web apis. In Proceedings of the 2008
IEEE International Conference on Web Services,
pages 177–184, Washington, DC, USA, 2008. IEEE
Computer Society.
[9] C. Roth and P. Bourgine. Lattice-based dynamic and
overlapping taxonomies: The case of epistemic
communities. Scientometrics, 69(2):429–447, 2006.
[10] S. Yu and C. J. Woodard. Innovation in the
programmable web: Characterizing the mashup ecosystem.
In ICSOC 2008 Workshops, LNCS 5472, pages 136–147.
Springer-Verlag, 2009.
[11] R. Torres, B. Tapia, and H. Astudillo. Improving Web
API discovery by leveraging social information. To appear
in Proceedings of the 9th IEEE International Conference
on Web Services (ICWS), 2011.
[12] M. Weiss and S. Sari. Evolution of the mashup
ecosystem by copying. In Proceedings of the 3rd and
4th International Workshop on Web APIs and
Services Mashups, Mashups ’09/’10, pages 11:1–11:7,
New York, NY, USA, 2010. ACM.
[13] Q. Wu, A. Iyengar, R. Subramanian, I. Rouvellou,
I. Silva-Lepe, and T. Mikalsen. Combining quality of
service and social information for ranking services. In
Proceedings of the 7th International Joint Conference
on Service-Oriented Computing, ICSOC-ServiceWave
’09, pages 561–575, Berlin, Heidelberg, 2009.
Springer-Verlag.