04 - Simplifying Mashup Component Selection With A Combined Similarity - and Social-Based Technique (Tapia, Torres, Astudillo MASHUPS 2011)

Simplifying mashup component selection with a combined similarity- and social-based technique

Boris Tapia, Romina Torres, Hernán Astudillo
Universidad Técnica Federico Santa María, Departamento de Informática, Valparaíso, Chile
btapia@alumnos.inf.utfsm.cl, romina@inf.utfsm.cl, hernan@inf.utfsm.cl

ABSTRACT
Web mashups are becoming the main approach to building Web applications. Current approaches to component selection include description-based techniques and socially generated metadata. The explosive growth of APIs makes it increasingly hard to select appropriate components for each mashup. Unfortunately, description-based techniques rely heavily on the quality of the authors’ information, and social-based approaches suffer from problems such as “cold start” and “preferential attachment”. This article proposes (1) two new measures of socially ranked fitness of candidate components, (2) an API functional taxonomy built with Formal Concept Analysis from descriptions, and (3) a combined approach that improves description-based techniques with these social ranking measures. We use social rankings based on past (co-)utilization of APIs: WAR (Web API Rank) measures API utilization over time, and CAR (Co-utilization API Rank) measures its co-utilization with other APIs. The measures and the combined approach are illustrated with a case study using the well-known Web API catalog ProgrammableWeb 1. A prototype tool allows iterative discovery of APIs and assists the mashup creation process.

1 http://www.programmableweb.com

Keywords
mashup, formal concept analysis, recommendation system, social network

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Mashups 2011, September 2011; Lugano, Switzerland
Copyright 2011 ACM 978-1-4503-0823-6/11/09 ...$10.00.

1. INTRODUCTION
Mashups are becoming the de facto approach to building customer-oriented Web applications, by combining operations on several Web APIs (Application Programming Interfaces) into a single lightweight, rich, customized Web front-end. They allow complete applications to be constructed by searching, composing and executing functionality provided by external sources. Today, the way people build software has radically changed, from implementing from scratch to searching for already packaged functionality that, alone or combined, fully or partially satisfies each requirement.

API Web catalogs provide, besides documentation, social information about the real use of APIs in the registered mashups. In our previous work [11], we argued for the need to combine these two sources of information, so that description-based techniques are leveraged by social information. This combination makes it possible to discover candidates that would otherwise have passed unnoticed because of their poor-quality descriptions or their low popularity. It is important to mention that we cannot use only the social information because, as has been shown in previous work [3], this leads to the cold start problem for new APIs and makes the discovery process exhibit a preferential attachment trend.

Typically, mashups are built with more than one API. These APIs are selected iteratively, and previous selections influence current ones. Thus, when composers are discovering APIs, they must consider which other APIs they have already selected. The developer community can help composers discover the most appropriate APIs based on their past co-utilization. This information is obtained from the registered mashups, and we can exploit it to support the discovery process.

Currently, most techniques tackle the discovery process using social and semantic networks separately. In this work, we present an approach that uses both networks to enrich and improve the results of the discovery process.

The main contributions of this paper are:

• a novel iterative approach to discover APIs to build mashups, driven by social utilization and co-utilization over semantic techniques,
• a new model to represent APIs and their collaboration to create mashups,
• a web tool that supports this approach, and
• the evaluation of the proposed approach through a case study.

The paper is organized as follows. Section 2 introduces the formal framework needed for representing both networks. Section 3 describes our approach. Section 4 exemplifies our approach with a case study. Section 5 briefly describes related work to highlight the challenges of this research. Section 6 discusses current issues with this approach and possible solutions, and finally, Section 7 gives concluding remarks.

2. SOCIAL AND SEMANTIC NETWORKS
Mashups are interesting because they express how APIs can be combined to generate new and innovative systems. APIs can be represented, and distinguished from one another, by the functionality they provide, and these functionalities can be represented by the semantic meaning of the concepts that describe them. In Figure 1, the API-Mashup network shows the relationship between APIs and mashups, as well as the relationship between the APIs and their most representative terms.

Figure 1: API-Mashup and semantic networks

Due to the increasing proliferation of APIs, we can assume that for each API there is a set of functionally equivalent APIs that are interchangeable with it. Lately, the mashup field has had many advances and contributors [10], and we can conclude that we are facing a complete and specialized community of agents (APIs) that provide different functionalities represented by meaningful terms.

As humans, newcomers learn by example, and as programmers, mashup composers also learn from other composers by imitating previous decisions about which APIs to use (see [6] and [12]). In other words, when composers start to build a mashup, they tend to search for similar mashups and their structures as the kickoff of their own projects. Therefore, the more mashups have selected an API in the past, the more likely it is to be selected as part of a new mashup. This social influence makes current API discovery behave as a preferential attachment process, which brings serious issues such as the cold start problem for new competitors.

Both semantic and social approaches are valid ways to discover APIs, each with its own issues. Semantic networks rely only on “promises” and do not consider the power of social information when discovering APIs, while social networks can hide better candidates because their nature is to encourage what is already at the top. In the next subsections we formalize each network separately, and in the proposal section we explain how they are combined and empower each other.

2.1 The semantic network
Because each API is described according to the functionalities it provides, it is possible to identify clusters of functionalities shared by several APIs. Therefore, we model the set of APIs as multiple subsets or sub-communities that can be hierarchically arranged according to the functionalities they share and the relation (general/specific) between them. Our objective is to obtain a full functionality taxonomy represented as a graph, where each node is a sub-community of functionalities with several APIs anchored to it. Functionalities are extracted from the natural language descriptions of APIs. Inspired by [9], we consider a binary relation R between an API set A and a concept set C. R expresses the fact that an API a provides the functionality represented by c in some mashup. Then an API community (AC) is the largest set of APIs A (extent) sharing a given set of functionalities C (intent).

In our previous work, we proposed Growing Hierarchical Self-organizing Maps (GHSOM) to model this structure [11]. GHSOM can assign an API to only one node of the hierarchy. However, assuming that an API belongs to only one sub-community does not seem appropriate. We therefore propose to organize functionalities in a lattice-based structure, which allows communities to overlap, so that APIs can be classified simultaneously in different sub-communities.

In order to represent ACs hierarchically in a lattice-based taxonomy we need to provide a partial order between ACs. Formally, an AC has an extent E (the set of APIs that form the community) and an intent I (the set of functionalities that the community shares). A community AC_i is a sub-community of AC_j iff its extent E_i is a subset of E_j or, equivalently, its intent I_i is a superset of I_j (a more specific community shares more functionalities among fewer APIs). The lattice is then exactly the ordered set of ACs built from A, C and R.

Consider the following example. We have a set of APIs A1: TwitterVision, A2: Twitter, A3: Rumble and A4: Google Maps, which expose different functionalities represented by the terms T1: deadpool, T2: microblogging, T3: social and T4: mapping. Table 1 shows which functionalities each API provides. We call this table the context; it is represented by the set of relations R between agents and terms, where R_{i,j} is blank if functionality j is not provided by agent i.

Table 1: Formal context example
Agent \ Term          T1 (deadpool)   T2 (microblogging)   T3 (social)   T4 (mapping)
A1 (TwitterVision)    x               x                    x             x
A2 (Twitter)                          x                    x
A3 (Rumble)                                                x             x
A4 (Google Maps)                                                         x

Seeing lattice concepts as communities that provide functionalities, the intent of each concept is its set of terms, and the extent is its set of APIs. The API taxonomy is then built from the set of couples (A, C), as shown in Figure 2. The hierarchy is drawn according to the partial order <, i.e. the bottom concept < the top concept. In this visual representation of the lattice, the extent of the top concept is the whole dataset. Because we are considering APIs of any kind of functionality, the intent of the top concept is empty. At the middle level, we find formal concepts such as the set {TwitterVision, Twitter, Rumble}, which provides “social” functionality, or the set {TwitterVision, Rumble, GoogleMaps}, which provides “mapping” functionality. If we browse deeper, we find concepts such as the set {TwitterVision, Twitter}, which provides two functionalities at the same time, “microblogging” and “social”, or the set {TwitterVision, Rumble}, which provides “social” and “mapping” functionalities. At the bottom level, and for this particular small dataset, we find one Web API that provides all the functionalities at the same time (“social”, “mapping”, “deadpool” and “microblogging”) and is in fact the only one that provides “deadpool” functionality.

Figure 2: API taxonomy example showing the extent of concepts

When a composer is searching for an API, he is actually searching for a set of functionalities. To represent this search, he uses a set of keywords. This query is transformed into a virtual concept whose intent is the set of keywords [1]. Then, we navigate the lattice to find the concept that has exactly the same intent and retrieve its extent. If no concept matches the intent, the virtual concept must be arranged within the lattice. From the potential set of parents of this virtual node (nodes whose intent is a subset of the virtual concept’s intent) we select the concept(s) whose intent is maximal, and then we suggest the extent of those concept(s). Similarly to [1], we walk the semantic graph from the virtual concept to its ancestors. The distance between a child node and each of its direct parents is one. The nodes with minimal distance to the virtual node are then the best candidates to be recommended. In this work, we only consider nodes at distance one, but it is possible to enrich the search process by walking the graph at longer distances.

2.2 The social network
We can see APIs as agents that interact with each other to collaboratively create a new application. We represent these interactions as a social network formed by the set of APIs A and the set of undirected links E between them. Figure 3 depicts the collaboration network of APIs. The thickness of the edges indicates the frequency of these collaborations, and the size of the nodes indicates the frequency of their utilization within the registered mashups.

Figure 3: API collaboration network

In our previous work [11] we proposed the Web API Rank (WAR), a social indicator of the relevance of a Web API based on the number of mashups that utilize it. In this work, we reuse this indicator and add a second one: the Co-utilization API Rank (CAR). We argue that the selection process is performed iteratively and that the current selection depends on previously selected APIs. The CAR thus helps the composer discover those APIs that have previously been co-utilized with the current selection set. This indicator can also be restricted to a specific context by specifying the global goal (which kind of mashup is to be constructed) in addition to the local one (which kind of API is being searched for).

3. PROPOSAL
In this work, we present an approach that combines the semantic and social networks to enrich and improve the Web API discovery process in order to build a mashup. We reinforced this approach with a Web tool that exploits this information. The innovation of the proposed model centers on the combined use of these two networks for this problem.

The improved discovery process model proposed in this paper is based on the analysis of (1) the descriptions of Web APIs and mashups, and (2) the “composed-by” relations found in mashups (which Web APIs are combined to create them). We construct two concept lattices, one to generate a taxonomy for the Web APIs and the other to generate a taxonomy for the mashups. These taxonomies serve to (1) drive the semantic API discovery process and (2) narrow this search to only those APIs that are being used for a specific kind of mashup. We also build the social network, in which nodes are Web APIs and edges represent their joint usage within a mashup. In the following subsections we explain the major stages of the approach and how they are combined to enhance the discovery process.
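To make the concept lattices mentioned above more concrete, the following Python sketch enumerates the formal concepts of the small context of Table 1 and answers a keyword query in the manner described in Section 2.1. It is only an illustration under stated assumptions, not the implementation used in this work: the function names (`extent`, `intent`, `concepts`, `suggest`) are hypothetical, the brute-force closure over all term subsets is practical only for tiny contexts, and a "maximal" parent intent is interpreted here as the one sharing the largest number of terms with the query.

```python
from itertools import combinations

# Formal context of Table 1: each API mapped to the terms it provides.
CONTEXT = {
    "TwitterVision": {"deadpool", "microblogging", "social", "mapping"},
    "Twitter": {"microblogging", "social"},
    "Rumble": {"social", "mapping"},
    "Google Maps": {"mapping"},
}


def extent(terms, context):
    """APIs that provide every term in `terms`."""
    return frozenset(api for api, ts in context.items() if set(terms) <= ts)


def intent(apis, context):
    """Terms shared by every API in `apis` (all terms when `apis` is empty)."""
    shared = set().union(*context.values())
    for api in apis:
        shared &= context[api]
    return frozenset(shared)


def concepts(context):
    """All formal concepts (extent, intent), found by closing every term subset."""
    terms = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(terms) + 1):
        for combo in combinations(terms, size):
            ext = extent(combo, context)
            found.add((ext, intent(ext, context)))
    return found


def suggest(query, context):
    """Extent of the concept matching the query intent exactly; otherwise the
    extents of the parent concepts whose intent is the largest strict subset
    of the query (assumed reading of "maximum intent")."""
    query = frozenset(query)
    lattice = concepts(context)
    for ext, itn in lattice:
        if itn == query:
            return [set(ext)]
    parents = [(ext, itn) for ext, itn in lattice if itn < query]
    if not parents:
        return []
    best = max(len(itn) for _, itn in parents)
    return [set(ext) for ext, itn in parents if len(itn) == best]
```

For the context of Table 1, `concepts(CONTEXT)` yields the six communities discussed in Section 2.1, and `suggest({"microblogging", "social"}, CONTEXT)` returns the community formed by TwitterVision and Twitter. A query containing a term absent from the context, such as {"social", "video"}, falls back to the parent community that provides "social".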
3.1 Preprocessing Stage
In the preprocessing stage we built a specialized crawler that consumes the ProgrammableWeb API in order to obtain the descriptions of Web APIs and mashups, the Web APIs that make up each mashup, ratings, tags, and other data. At first, API and mashup tags seemed good candidates for creating the taxonomies, but they are too specific and change between different catalogs, while descriptions tend to be immutable.

We explain here the preprocessing stage for the API taxonomy; it is analogous for the mashup taxonomy.

From the crawled descriptions we extract the set of tokens. Tokens require special handling because some of them can be compound words. We identify these by locating uppercase letters or underscore signs within the word. Then, using the TreeTagger tool 2, we filter out the tokens that are not common nouns. We also filter out stop words, typically referring to API or mashup names (e.g. Google), specific technologies (e.g. Python), formats (e.g. XML), protocols (e.g. REST), etc. After token filtering, we apply the Porter stemming algorithm [7] to normalize the terms. Then, for each API a_i, we obtain a vector of terms t_{a_i} = {t_1, ..., t_k}. The terms can be unigrams, bigrams or trigrams. From this set of terms we must choose those relevant enough to represent the different objects. For this task, we used Term Frequency/Inverse Document Frequency (TF/IDF). TF/IDF is a common mechanism in Information Retrieval for generating a robust set of representative keywords from a corpus of documents.

2 http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger

The TF of a term t_i inside an API A_i is computed as:

\[ tf(t_i) = \frac{freq(t_i, A_i)}{|A_i|} \qquad (1) \]

while the IDF is calculated as the logarithm of the ratio between the total number of APIs and the number of APIs that contain the term:

\[ idf(t_i) = \log \frac{|A|}{|\{A_i : t_i \in A_i\}|} \qquad (2) \]

Then the TF/IDF weight of a term is calculated as:

\[ w(t_i) = tf(t_i) \times idf(t_i) \qquad (3) \]

Regarding the social network, we extract from the crawled data the set of APIs that were utilized to form each mashup.

3.2 Building the Taxonomies
As mentioned before, when we build a taxonomy of APIs we assume that each node is a sub-community or category that provides some functionality. These communities are characterized by representative terms and populated with a set of APIs that share those functionalities. A particular functionality can be provided by different sub-communities, so it is the combination of the different functionalities that makes each community unique. Analogously, we build a mashup taxonomy, in which mashups are arranged as communities that provide capabilities of different kinds, also represented as a set of terms.

The aim of building both taxonomies is to support composers, on the one hand, in finding similar mashups in terms of the capabilities needed and, on the other hand, in finding an API that provides the required functionalities.

Because the lattice size can grow exponentially with respect to the size of the context [5], and because people predominantly use two or three terms when they search in web search engines 3, we build lattices whose concepts have no more than five representative terms and exhibit a stability over 90%. Stability indicates the probability of preserving a concept intent while removing some elements from its extent.

3 http://www.keyworddiscovery.com/keyword-stats.html

To build these taxonomies, we crawled the descriptions of APIs and mashups from ProgrammableWeb. The snapshot represents the state of this catalog as of May 2011. We obtained 3318 Web API and 5848 mashup descriptors. After the preprocessing stage, we obtained a set of 262 terms for Web APIs and a set of 192 terms for mashups. With these sets of terms we built a context matrix for each one, where the size of the matrix is the number of terms by the number of objects (262 × 3318 and 192 × 5848, respectively). Using the contexts, we built concept lattices with maximum support of 0.5%, meaning that the concepts will have no more than 5 terms. The concept lattices were built using the Coron System 4 with the Charm algorithm, and Naive to find the order. Each lattice was generated in less than one second. The number of concepts in the lattices was 754 + 2 (inner nodes plus top and bottom nodes) for Web APIs and 261 + 2 for mashups.

4 http://coron.loria.fr/

3.3 Building the Social Network
From the catalog data, we crawled the information about the usage of APIs within mashups and built a social network. The topology of this network can give us insights into previous decisions made by mashup composers. In [6] the authors discovered that the distribution of APIs within mashups follows a power law, implying that a small number of APIs form the majority of mashups. This tendency remains in the collected data, where only 23 APIs (less than 3%) cover 80% of the complete set of mashups. In Figure 4 we can see the characteristic long tail of the distribution.

Figure 4: Long-tail distribution

By extracting the information about which APIs make up each mashup, we built a social network where nodes are APIs and edges link two APIs that were used together in a mashup. For instance, the “See you Hotel” mashup was
created using four APIs: Flickr, Google Maps, Twitter and YouTube. In the social network, this implies the creation of four nodes (each one representing an API) and six links interconnecting them. If other mashups use the same APIs, the weight of the corresponding links increases. Figure 3 shows a portion of the social network for the downloaded dataset.

Then, for each API A_i we calculate the global WAR as the number of mashups in which A_i is used. We rescale these values into the range [0, 1] by dividing all the WARs by the maximum WAR of the APIs.

Given the information of the social network, it is possible to calculate the Co-utilization API Rank (CAR) for a given subset S of APIs with respect to an API A_i:

\[ CAR_{S,A_i} = \begin{cases} \dfrac{n_{S,A_i}}{n_S}, & n_S \neq 0; \\ 0, & n_S = 0, \end{cases} \qquad (4) \]

where n_{S,A_i} is the number of mashups in which S and A_i are used together, and n_S is the number of mashups in which S appears.

3.4 Iterative Web API Discovery
In this section we describe how we exploit the social information to improve the Web API discovery process. First, we assume that mashup composers follow good practices regarding composite applications, that is, they have divided their problem into a set of subproblems or functionalities that can be satisfied by different Web APIs. Because discovering and selecting APIs are iterative processes, at each step the composer can constrain the current search with the decisions already made. It is important to mention that, even though this approach is intended to support the mashup building process, it can also be used to discover specific APIs and/or mashups. We can distinguish the composer's intention by the information supplied:

• CASE 1: If the composer only provides mashup keywords, we interpret this as an attempt to find mashups that provide the specified capabilities.
• CASE 2: If the composer only provides API keywords, we interpret this as an attempt to find APIs that provide the specified functionalities.
• CASE 3: If the composer provides mashup and API keywords, we interpret this as an attempt to find APIs that provide the specified functionalities and that have been used in a specific type of mashup. Besides, in later stages, the composer may have a selected subset of APIs that has to be considered as a constraint on the new discovery process.

This process is enriched at each step with the social information about which APIs have been previously used by the community.

In Algorithm 1, we present how the discovery process is driven according to the inputs of the composer. The semantic rank is always calculated in the same way, but the social rank differs slightly depending on (1) the already selected APIs and (2) whether the composer defines a mashup context (set of keywords K_M) for which the API is needed. If a context is specified, both WAR and CAR are calculated not over the entire set of mashups, but only over the extent of concept C_M (obtained from the set of keywords K_M as explained in Section 2.1). These are the “local” versions.

Algorithm 1 Iterative Web API discovery
Require: Let M be the set of all mashups.
Require: Let K_M = {t_1, ..., t_n} be the set of keywords that define the type of mashup the composer wants to build.
Require: Let K_Ai = {t_1, ..., t_m} be the set of keywords that define the type of API the composer searches for at step i.
Require: Let I be the number of APIs that will comprise the mashup.
Require: Let S be the initially empty set of selected APIs.
 1: for i = 1 to I do
 2:   Remove stop words from K_Ai
 3:   Stem K_Ai
 4:   Using K_Ai, obtain the API category C_A whose intent is closest to K_Ai, as explained in Section 2.1
 5:   Get the APIs ∈ C_A = {a_1, ..., a_K}
 6:   for k = 1 to K do
 7:     Calculate the semantic rank R_k given the API-term frequency matrix
 8:     if K_M = ∅ then
 9:       Let n_k be the number of mashups m ∈ M in which API_k is used
10:       Let n_max = max_{1≤k≤K} n_k
11:       Calculate the global WAR of API_k as WAR^G_k = n_k / n_max
12:       if S ≠ ∅ then
13:         Let n_{S,k} be the number of mashups m ∈ M in which S and API_k are used together
14:         Let n_S be the number of mashups m ∈ M in which S appears
15:         Calculate the global CAR of API_k as in (4): CAR^G_{S,k} = n_{S,k} / n_S if n_S ≠ 0; 0 if n_S = 0
16:         Calculate the social rank of API_k as SR_k = (WAR^G_k + CAR^G_{S,k}) / 2
17:       else
18:         Calculate the social rank of API_k as SR_k = WAR^G_k
19:       end if
20:     else
21:       Using K_M, obtain the mashup concept C_M whose intent is closest to K_M, as explained in Section 2.1
22:       Let n_k be the number of mashups m ∈ C_M in which API_k is used
23:       Let n_max = max_{1≤k≤K} n_k
24:       Calculate the local WAR of API_k as WAR^L_k = n_k / n_max
25:       if S ≠ ∅ then
26:         Let n_{S,k} be the number of mashups m ∈ C_M in which S and API_k are used together
27:         Let n_S be the number of mashups m ∈ C_M in which S appears
28:         Calculate the local CAR of API_k as in (4): CAR^L_{S,k} = n_{S,k} / n_S if n_S ≠ 0; 0 if n_S = 0
29:         Calculate the social rank of API_k as SR_k = (WAR^L_k + CAR^L_{S,k}) / 2
30:       else
31:         Calculate the social rank of API_k as SR_k = WAR^L_k
32:       end if
33:     end if
34:     Calculate the final rank FR_k of API_k as FR_k = α · SR_k + (1 − α) · R_k
35:   end for
36:   The user selects one API, adding it to S, probably the one with the highest final rank FR_k
37:   Suggest the set of APIs that have been co-utilized with S. One or more of these APIs can also be selected at this step, in which case i must be incremented by the number of APIs selected.
38: end for
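The social side of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MashupReco implementation: the function names (`collaboration_edges`, `war`, `car`, `final_ranks`) are hypothetical, each mashup is assumed to be given as a set of API names, "S appears in a mashup" is read as "the mashup uses every API in S", and the semantic ranks R_k are assumed to be supplied by the taxonomy lookup. Passing only the mashups in the extent of C_M gives the "local" versions; passing all of M gives the "global" ones.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(mashups):
    """Weighted, undirected co-utilization links: one per API pair per mashup."""
    edges = Counter()
    for apis in mashups:
        edges.update(combinations(sorted(apis), 2))
    return edges


def war(candidates, mashups):
    """WAR_k = n_k / n_max over the candidate APIs (steps 9-11 and 22-24)."""
    counts = {api: sum(1 for m in mashups if api in m) for api in candidates}
    n_max = max(counts.values(), default=0)
    return {api: (n / n_max if n_max else 0.0) for api, n in counts.items()}


def car(selected, api, mashups):
    """CAR_{S,A} = n_{S,A} / n_S, or 0 when S never appears (Equation 4)."""
    selected = set(selected)
    n_s = sum(1 for m in mashups if selected <= m)
    if n_s == 0:
        return 0.0
    n_sa = sum(1 for m in mashups if selected <= m and api in m)
    return n_sa / n_s


def final_ranks(semantic, selected, mashups, alpha):
    """FR_k = alpha * SR_k + (1 - alpha) * R_k for every candidate API.

    `semantic` maps each candidate API to its semantic rank R_k.
    """
    wars = war(semantic.keys(), mashups)
    ranks = {}
    for api, r_k in semantic.items():
        if selected:
            sr_k = (wars[api] + car(selected, api, mashups)) / 2
        else:
            sr_k = wars[api]
        ranks[api] = alpha * sr_k + (1 - alpha) * r_k
    return ranks
```

With a single mashup that uses four APIs, `collaboration_edges` produces the six links described in Section 3.3. For a candidate with semantic rank 0.97 that has never been used in a mashup, `final_ranks` with α = 0.3 and an empty selection gives 0.3 · 0 + 0.7 · 0.97 = 0.679.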
Figure 5: MashupReco architecture

3.5 Implementation
In order to show our results empirically, we built the MashupReco prototype web tool, which allows composers to perform an iterative API discovery process. Its architecture is depicted in Figure 5. The crawler component is designed to gather data from multiple catalogs. Currently, it only supports the ProgrammableWeb catalog. The data is stored in a MySQL database. Using the “Social Engine” and the “Taxonomy Builder” we perform the social network analysis and generate the taxonomies. The Taxonomy Builder is built on top of the Coron System and TreeTagger. The most important module is the “Mashup Discovery Engine”, which implements the iterative Web API discovery algorithm. Because the functionalities of MashupReco are exposed as web services, they can be consumed by different applications, which can build different presentations for it 5. In Figure 5 we show a basic interface to support composers in the discovery process. The parameter α allows the composer to calibrate how much weight to assign to the social influence. With values of α closer to 1 the composer gives more importance to the social influence, while values of α closer to 0 mean that the social influence is less important.

5 http://dev.toeska.cl/mashup-reco

4. CASE STUDY
In this section, we illustrate MashupReco with an experiment. Here, the composer, a real estate broker, needs to build a web site that mashes up different sources of information regarding houses on sale given a specific location and its perimeter. He is interested in displaying over a map the different housing options, their photos and videos (if they exist), and photos and/or videos of nearby places of interest such as schools, restaurants and fitness centers, to name a few. Assuming that our composer follows good practices, he will be able to identify which kinds of APIs he needs. In fact, he has already identified that he needs a map to display and mash up the different sources of information. He also needs APIs capable of searching for videos and photos at a specific location. He probably needs an API to convert an address into a latitude/longitude pair in order to obtain the photos and videos of interesting points as well as of the housing options. He wants to help his potential customers get an impression of the neighborhood where the house is located, so he also needs an API that can extract information about what people are saying about this place (probably comments from a social network).

Now, the composer needs to find APIs to build the mashup. Using MashupReco he first specifies the mashup context, that is, a mashup about “map” and “real estate”. Then,

• He searches for an API to find geo-located photos using the keywords “photo” and “location”. The results are immediate:
  – Ranked by the similarity technique, Glosk, Instagram Real-time and Steply are highly ranked (0.97, 0.80 and 0.76, respectively). Using only the social influence, he obtains Microsoft Virtual Earth, Flickr and Yahoo Maps (with global WARs of 1.0, 0.8 and 0.62, respectively). Using the combined ranking with an α of 0.3, Glosk and Flickr are the highest ranked APIs (0.679 and 0.677, respectively); for Glosk, for instance, the final rank is 0.3 · 0 + 0.7 · 0.97 = 0.679. Their ranks are almost the same, so using either one seems to be a good option. However, the global WAR of Glosk is 0, which means that it has never been used in a mashup, against a global WAR of 0.8 for Flickr.
• Based on the previous results he decides to use Glosk. Glosk does not have any Co-APIs, so MashupReco cannot suggest APIs according to this criterion.
• Then, he searches for video APIs using the keyword “video”.
  – According to the combined rank, the APIs with the highest ranking are YouTube (0.78), Yahoo Video Search (0.70) and Patrick’s Aviation (0.66).
• Then the user selects YouTube over Yahoo Video Search (WARs of 0.84 and 0.01, respectively).
  – Based on the previous selection, MashupReco computes the list of Co-APIs along with their CARs, containing Google Maps 0.86, Flickr 0.33, Twitter 0.12, Weather Channel 0.10, Wikipedia 0.10, Foursquare 0.07 and Yahoo Geocoding 0.03, to name a few.
• From the Co-APIs list, the composer selects Google Maps for the map visualization, Twitter as the source of what people are saying about the neighborhood, and Yahoo Geocoding as the API to obtain the geo-location of a given address. Each time the composer selects an API, the Co-APIs list is recalculated.

Using MashupReco, we can balance results between matching descriptions and community usage of APIs. The majority of APIs have no social information to exploit, because only a fraction of them have been utilized in a mashup. This lack of social data can lead to a problem, because there is no way to rate an API based on its usage. On the other hand, APIs that have been extensively used in mashups may be rated too high and given too much exposure, leaving the rest at the bottom of the list. That is the reason behind the idea of influencing the discovery process with social data, rather than basing it solely on social data. This
is controlled by the α factor. For example, for the query “photo location”, a highly used API such as Flickr appears below Glosk, an API that has not been used in any mashup but is more specific to the query.

The Co-APIs list shows the APIs that have been used in collaboration with the selected ones in mashups of the context. In the case of selecting YouTube, APIs of different functionalities are suggested: Google Maps, Geonames and Twitter are some of them. Given the context and the previous compositions made by other users, these APIs could be useful for the mashup under construction.

5. RELATED WORK
Given the increasing trend of major firms providing APIs for public use, the mashup community is rapidly expanding. There are studies that characterize the mashup ecosystem as an API-Mashup network [10] and that are intended to exploit this information.

In [8], the authors proposed the serviut score to rank APIs based on their utilization and popularity. To calculate the serviut score they considered the number of mashups that use the given API, but also other aspects that we believe are too ambiguous to be considered, such as classifying mashups in the same category as the API. Even according to ProgrammableWeb, mashups are not classified in categories because, by definition, a mashup is a mix of different Web APIs, and it is therefore quite difficult to classify them in functional categories. According to our experiments, the taxonomies of APIs and of mashups are quite different.

In [3], the authors proposed a social technique to mine annotated tags of mashups and APIs in order to recommend mashup candidates while managing the cold start problem for new competitors. But tags are not reusable between different catalogs. Thus, by using tags we obtain specific taxonomies that are not generic enough to be used across sites. Web API authors do not necessarily use the same tags to describe their APIs; they typically adapt them according to the tags that
[...] different mashups (at the level of input/output). The algorithm performs well but is only feasible within an organization because, in general, this information is not shared or public.

6. FUTURE WORK
Every day, at least two new APIs are created. The market is also changing according to the needs of customers. Therefore, it is expected that the communities already identified will change their structure as new APIs (or mashups) join the communities or leave them. This evolution is imminent, so over time we expect that some of these communities will merge or split into more specialized sub-communities. We are currently working on determining evolution patterns using this community abstraction, as well as on modeling this evolution. Also, the social ranks (WAR and CAR) are affected by this dynamism and have to reflect variations in the usage of APIs, e.g. APIs with intense use over a short period of time that then experience a decrease. On the other hand, we are researching techniques that will allow us to incrementally update the taxonomy each time the communities change enough to trigger a taxonomy adaptation.

7. CONCLUSIONS
In this work, we have presented an approach that combines the semantic and social networks to enrich and improve the Web API discovery process in order to build a mashup. We have shown empirically that natural language descriptions of the objects can be used effectively to build taxonomies of functionalities, presented as communities. We have also shown how to build a collaborative social network of APIs by using the analogy of agents that collaborate to create applications, and how to exploit and leverage the semantic ratings.

One of our main contributions was to reinforce this methodology with a Web tool that allows us to show this iterative process empirically. This approach shows how we can
are used on each catalog. effectively mitigate the cold start problem and the prefer-
In [2], authors proposed MashupAdvisor, that also assist ential attachment trend of social approaches to recommend
mashup creators to build mashups. Similar to our approach, APIs or mashups, and also how we effectively discover bet-
MashupAdvisor suggests APIs that could be part of the ter description-based candidates by leveraging social infor-
mashup under construction using a probabilistic approach mation making a trade off between both worlds.
based on the popularity in the mashup repository. But be-
cause MashupAdvisor assists the mashup building process
instead of only the selection, this approach is based on spe-
8. ACKNOWLEDGMENTS
cific inputs and outputs. But typically only Web service This work was partially funded by FONDEF (grant D08i1155),
APIs have this data. Mostly because of their complexity UTFSM DGIP (grant DGIP 241167 and PIIC) and CCTVal
and lack of standards, general APIs do not have interface in- (FB/22HA/10).
formation of each operation. Then, this approach performs
well over Web services but not over general Web APIs. Even 9. REFERENCES
when the results are encouraging they actually simulate the [1] C. Carpineto and G. Romano. Exploiting the potential
data of ProgrammableWeb to conduct the experiments. of concept lattices for information retrieval with credo.
In [13], authors proposed ServiceRank to differentiate ser- J. UCS, pages 985–1013, 2004.
vices from a set of functional-equivalent services based on [2] H. Elmeleegy, A. Ivan, R. Akkiraju, and R. Goodwin.
their quality and social aspects. The problem is that it needs Mashup advisor: A recommendation tool for mashup
to access data that maybe providers are not willing to give, development. In Web Services, 2008. ICWS ’08. IEEE
such as the response time and availability measurements. International Conference on, pages 337 –344, sept.
Also, because providers publish their own measurements, 2008.
this process could be not completely reliable. [3] K. Goarany, G. Kulczycki, and M. B. Blake. Mining
In [4] authors proposed MatchUp, a tool that supports social tags to predict mashup patterns. In Proceedings
mashup creators to locate components to mash up based on of the 2nd international workshop on Search and
the current component selection and a complete database mining user-generated contents, SMUC ’10, pages
that describes which components have been used in the dif- 71–78, New York, NY, USA, 2010. ACM.
[4] O. Greenshpan, T. Milo, and N. Polyzotis.
Autocompletion for mashups. Proc. VLDB Endow.,
2:538–549, August 2009.
[5] C. Lindig. Fast concept analysis. In Working with
Conceptual Structures Contributions to ICCS 2000,
pages 152–161. Shaker Verlag, 2000.
[6] M. Weiss and G. R. Gangadharan. Modeling the mashup
ecosystem: Structure and growth. In R & D
Management, pages 40–49, 2010.
[7] M. F. Porter. An algorithm for suffix stripping.
Program, 14(3):130–137, 1980.
[8] A. Ranabahu, M. Nagarajan, A. P. Sheth, and
K. Verma. A faceted classification based approach to
search and rank web apis. In Proceedings of the 2008
IEEE International Conference on Web Services,
pages 177–184, Washington, DC, USA, 2008. IEEE
Computer Society.
[9] C. Roth and P. Bourgine. Lattice-based dynamic and
overlapping taxonomies: The case of epistemic
communities. Scientometrics, 69(2):429–447, 2006.
[10] S. Yu and C. J. Woodard. Innovation in the programmable web:
Characterizing the mashup ecosystem. In ICSOC
2008, LNCS 5472, pages 136–147. Springer-Verlag,
2009.
[11] R. Torres, B. Tapia, and H. Astudillo. Improving web
api discovery by leveraging social information. to
appear in the proceedings of the 9th IEEE
International Conference on Web Services, 2011.
[12] M. Weiss and S. Sari. Evolution of the mashup
ecosystem by copying. In Proceedings of the 3rd and
4th International Workshop on Web APIs and
Services Mashups, Mashups ’09/’10, pages 11:1–11:7,
New York, NY, USA, 2010. ACM.
[13] Q. Wu, A. Iyengar, R. Subramanian, I. Rouvellou,
I. Silva-Lepe, and T. Mikalsen. Combining quality of
service and social information for ranking services. In
Proceedings of the 7th International Joint Conference
on Service-Oriented Computing, ICSOC-ServiceWave
’09, pages 561–575, Berlin, Heidelberg, 2009.
Springer-Verlag.
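
As a rough illustration of the alpha factor discussed in Section 4, the following Python sketch blends a description-based similarity score with a usage-based social score. It assumes a simple convex combination in which a larger alpha gives more weight to similarity; the exact formula used by the tool is not restated in this part of the paper. The class name, the function names and all numeric scores are hypothetical, chosen only to reproduce the qualitative “photo location” behaviour in which Glosk ranks above Flickr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate API with two scores, both assumed to lie in [0, 1]."""
    name: str
    similarity: float   # description-based fit to the query (hypothetical value)
    social_rank: float  # usage-based score from mashup data (hypothetical value)


def blended_score(candidate: Candidate, alpha: float) -> float:
    """Convex combination of both scores (an assumed form, not the paper's formula).

    alpha = 1.0 ranks purely by similarity, alpha = 0.0 purely by social data.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * candidate.similarity + (1.0 - alpha) * candidate.social_rank


def rank(candidates: list[Candidate], alpha: float) -> list[Candidate]:
    """Return the candidates sorted by blended score, best first."""
    return sorted(candidates, key=lambda c: blended_score(c, alpha), reverse=True)


# Made-up scores for the query "photo location": Flickr is popular but less
# specific, Glosk has never been used in a mashup but matches the query well.
candidates = [
    Candidate("Flickr", similarity=0.55, social_rank=0.90),
    Candidate("Glosk", similarity=0.95, social_rank=0.00),
]

similarity_led = [c.name for c in rank(candidates, alpha=0.8)]
popularity_led = [c.name for c in rank(candidates, alpha=0.3)]
```

With these made-up scores, alpha = 0.8 places Glosk first, while alpha = 0.3 lets Flickr's popularity dominate. This is the sense in which social data influences the ranking without being its only basis.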
