Professional Documents
Culture Documents
Liang 2016
Liang 2016
Liang 2016
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 1
Abstract—With the explosive growth of services, including Web services, cloud services, APIs and mashups, discovering the
appropriate services for consumers is becoming an imperative issue. The traditional service discovery approaches mainly face two
challenges: 1) the single source of description documents limits the effectiveness of discovery due to the insufficiency of semantic
information; 2) more factors should be considered with the generally increasing functional and nonfunctional requirements of
consumers. In this paper, we propose a novel framework, called SMS, for effectively discovering the appropriate services by
incorporating social media information. Specifically, we present different methods to measure four social factors (semantic similarity,
popularity, activity, decay factor) collected from Twitter. Latent Semantic Indexing (LSI) model is applied to mine semantic information of
services from meta-data of Twitter Lists that contains them. In addition, we assume the target query-service matching function as a
linear combination of multiple social factors and design a weight learning algorithm to learn an optimal combination of the measured
social factors. Comprehensive experiments based on a real-world dataset crawled from Twitter demonstrate the effectiveness of the
proposed framework SMS, through some compared approaches.
Index Terms—Service discovery, social media, weight learning, social factors combination.
1 I NTRODUCTION
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 2
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 3
103 106
105
Number of Author
Number of API
102 104
103
101 102
101
100 0 100 0
10 101 102 103 104 105 10 101 102 103 104
Number of Lists Number of Lists
(a) (b)
12 12
10 10
8 8
Number of Lists
Number of Lists
6 6
4 4
2 2
0 0
25 0 5 10 15 20 22 0 2 4 6 8 10 12 14 16
Number of Followers Number of Tweets
(c) (d)
lists are in a wide range, which indicates the diversity of list 3 M ETHODOLOGY
semantics.
3.1 Problem Formulation
In this section, we formally formulate the problem of social
media based service search.
In addition, we show some statistics about the social Definition 1. Query-Service Matching Modeling: Given a
information collected from Twitter. Fig.2 (c) shows the cor- collection of candidate services S and a set of queries Q, the
relation between the number of lists by which an API is objective of query-service matching modeling problem is learning
included and the number of followers that an API has. Gen- a function f : Q × S → R+ , such that f (qi , sj ) measures the
erally, an API considered to be attractive to many consumers matching degree between query qi and service sj .
tends to be included in many lists since creating a list can
In our problem, a service is considered to be matched
subscribe status of its members, just like following an API.
with a given query if it satisfies the requirements of con-
Fig.2 (c) shows the strong correlation between the number of
sumers from multiple aspects. Specifically, we take func-
followers and the number of lists. In Fig.2 (d), we show the
tional similarity between the service and given query as
correction between the number of lists an API is included
a prior consideration. Based on this, we assume that con-
and the number of tweets that an API posts. Different from
sumers prefer services that are more popular and more
the strong relativity showed in Fig.2(c), there only exists
active, since popularity and activity respectively indicate the
weak correlation between list number and tweet number.
practical applicability and availability of services.
It means that a popular API might not have much dynamic
A crucial element in Definition 1 is a modeled service
information, indicating that the API might not be practical
sj ∈ S , which is defined as follows.
or even has been outdated. From the above, the number of
an API’s followers and the number of tweets posted by an Definition 2. Modeled Service: A service sj ∈ S is struc-
API could be considered as indicators of API popularity and tured as a k -dimensional tuple sj = [mj1 , mj2 , ..., mjk ], where
activity, respectively. mjl (1 ≤ l ≤ k) is a social feature of service sj .
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 4
In this paper, we intend to exploit multiple social fea- Thus, lists offer semantic annotations of the APIs included
tures to facilitate service discovery. Specifically, we use by them. And the semantic feature is generated by arbitrary
four social features of services extracted from Twitter: list users, which means it reflects the collective knowledge
content, follower number, tweets number, and list number. of crowds. The modeling of the first social factor mainly
These features in general reflect social-level characteristics considers extracting the semantics contained in the crowd-
of services, including semantic similarity (functional), pop- sourced lists.
ularity, activity, and decay factor (nonfunctional).
3.3.2 Data Pre-processing
3.2 Architecture of SMS Due to the arbitrariness of list meta-data (names and de-
Fig.3 illustrates the architecture of the social media based scription), a pre-processing of lists is needed before applying
service discovery approach SMS. The social media informa- LSI model. The detailed strategy of extracting features from
tion for service retrieval is crawled from Twitter after being list meta-data consists of the following steps:
matched with services crawled from ProgrammableWeb. 1. Tokenization: As list names are limited to 25 charac-
Generally, the whole process of SMS could be divided into ters, users often combine words using CamelCase (e.g.,
four parts: PlayStation). It is essential to separate these into individ-
ual words by tokenization algorithm.
• The first part is measuring the first social factor,
2. Stop Words Removal: Words that are not meaningful
service semantics mined from Twitter lists. This part
for presenting APIs, such as a, the, about, etc, should be
consists of data pre-processing, semantic similarity
removed. In addition to the common stop words, a set
calculation by utilizing LSI model based on list meta-
of domain-specific stop words also should be filtered out
data, including list names and descriptions.
(e.g., Twitter, Google, List).
• The second part is building functions to measure
3. Nouns & Adjectives Identification: It has been analyzed
the rest three nonfunctional social factors including
in previous work [15] that nouns and adjectives in list
popularity, activity and decay factor.
names and descriptions are particularly useful for seman-
• All the social factors modeled in the first two parts
tics extraction. Thus we identify nouns and adjectives
intend to be used for a proposed weight learning
using a tagger tool.
algorithm in the third part. The goal of the weight
4. Stemming: Extracting stem words is another important
learning algorithm is to find an optimal combination
process. The stem words can be obtained by removing
strategy for social media information.
the commoner morphological and inflectional endings
• In the final part, for a coming query, the new values
from words. For example, travel, traveler, traveling will be
of modeled four social factors could be integrated
replaced with the same stem travel.
based on the learned optimal weight assignment, to
5. Low Frequency Words Filtration: As the definition of
generate a list of ranked results.
list names and descriptions is arbitrary, some words
with low frequency are regarded as the impurity in the
3.3 Lists based Semantic Similarity presentation of APIs and filtered out.
In this section, we model the functional social factor of
semantic similarity based on list information of services. We 3.3.3 Latent Semantic Indexing
first introduce the content and form of Twitter Lists. Then the Generally, information is retrieved by literally matching
preprocessing of semantic similarity modeling is described. words in documents with those of the given query. How-
Finally we show the LSI based approach for text similarity ever, lexical matching may be inaccurate since some words
computation. share the same concept (synonymy) and most words have
multiple meanings (polysemy). Moreover, for list meta-
3.3.1 Leverage Twitter Lists data, the drawbacks of literal word matching become more
Lists is introduced for users to manage their followings and obvious due to personal style and individual differences in
the tweets they post. Users can group a set of accounts word usage. Thus a better method should permit users to
whom they think have the similar topics by creating a list. search information based on conceptual topics or semantics
A list includes a list name (limited to 25 characters) and a of documents.
list description that approximately shows the content of the Latent Semantic Indexing (LSI) [16] is a model that maps
list. queries and documents into a space with latent semantic di-
Table 1 shows illustrative examples of lists containing mensions. In a latent semantic space, high cosine similarity
APIs, selected from our dataset (to be detailed in Section of a query and a document of which words are conceptually
4). As the examples present, Flickr API is included in lists similar can be obtained even if they do not share any words.
with the name “Photography & design”, “Photography”, LSI method consists of the following four main steps, of
“Photography websites” and corresponding descriptions. which the first two steps are also applied in vector space
LinkedIn and lastfm are in the similar situations. It can models and the step of dimension reduction is the critical
be observed that the list names and descriptions provide part.
valuable semantic cues to the topics of the APIs in the lists. A large collection of text is usually represented as a
For instance, according the list meta-data, we can associate term-document (t × d) matrix X , with each position xij
Flickr API with photography or photo topic, LinkedIn API corresponding to the frequency with which a term (a row
with social or profession topic, lastfm with music topic. i) occurs in a document (a column j ). Note that the order
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 5
Final Weight
SMS Popularity
Assignment
query
Social
Services Factors Activity Weight Learning
Algorithm
Services
matching Decay with
Lists number Ranked
crawl
Results
List description Latent Semantic Semantic
Information Indexing Similarity
name
TABLE 1
Example of APIs, the name and description of lists containing them
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 6
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 7
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 8
TABLE 2 where ri indicates the relevance of the ith ranked result, and
The 52 Sample Queries for Evaluation
1
wDCG (i) =
log2 (i + 1)
Category Sample queries
Music audio, band, lyrics, singer, sound, music 4.2.3 Average Rank@k
Travel hotel, tourism, flight, holiday, voyage, travel In our evaluation, Average Rank at cutoff k (AR@k ) is
Financial stock, investment, price, finance, economics used for measuring popularity of search results. AR@k is
eCommerce shopping, eCommerce calculated as:
Mapping GIS, geography, GPS, mapping, traffic k
1X
Sports NFL, fitness, sport, running, athlete AR@k(r) = ranki , (10)
k i=1
Photos photo, image, picture, camera
Government policy, congress, department, government, law where ranki denotes the ranking score of the ith ranked
Game player, game result, and the value of ranki is obtained from the provider
Education education, training, student, school rank of each API. A lower AR@k indicates a higher rank.
Enterprise sales, hr, leader, enterprise
Social social network, media, social 4.3 Experimental Setup
4.3.1 Training Triplets Generation
For the evaluation of the proposed weight learning algo-
All the experiments in this paper are implemented with rithm, we need to create a collection of triplets for training.
Python 2.7, conducted on a MacBook Pro with an 2.2 GHz First we randomly select a part of the predefined 52 queries
Intel Core i7 CPU and 16 GB 1600 MHz DDR3 RAM, running mentioned in evaluation methodology as training queries.
OS X Yosemite. For each selected query qi , we define a matchsetqi including
APIs whose categories are the same as that of qi . Similarly, a
4.2 Evaluation Methodology unmatchsetqi is defined for APIs which belong to categories
different with qi . Next, we randomly sample an API service
For evaluation, a method of binary judgment should be −
s+
qi from matchsetqi . Then an API service sqi is randomly
applied to identify whether the returned API is relevant
chosen from set unmatchsetqi . Do the same thing to all
to the given query. The queries used for evaluation could
the training queries. Through such process, a collection of
be chosen from a given set of 52 sample queries that are
training triplets Z would be generated for combination
extracted from topics of the 12 categories shown in Table 2.
weights learning in our experiment.
The idea of binary judgment in the evaluation is: a returned
result for a given query can be judged as relevant once 4.3.2 Compared Methods
its category is identical with the category of the query.
We compare the following methods in our experiments:
Besides binary judgment, we adopt provider rank which
represents the ranking of API provider’s site to estimate the • PW: Descriptions of APIs in ProgrammableWeb are
influence of popularity mined from social media, following used for semantic matching by LSI model. We set this
the intuition that an API generated by a provider whose site method as baseline.
is ranked higher would be regarded as the relatively more • List Only: We leverage Twitter list information of
popular one. We use the following three metrics to evaluate APIs to compute the semantic similarity using LSI
the proposed SMS. model for ranking APIs.
• +Pop: We integrate semantic similarity mined from
4.2.1 Precision@k list information of APIs with popularity factor re-
Precision measures the proportion of returned documents ferred from APIs’ follower number.
that are relevant. Precision at cutoff k (P@k ) is calculated as: • +Act: We integrate semantic similarity mined from
list information of APIs with activity factor reflected
k
1X by APIs’ tweet number.
P@k(r) = ri , (8)
k i=1 • +Decay: We integrate semantic similarity mined
from list information of APIs with decay factor gen-
where ri indicates the relevance of the ith ranked result erated from APIs’ list number.
scored as either 0 (not relevant) or 1 (relevant). • Uniform: The semantic similarity and three nonfunc-
tional social factors are uniformly combined with the
4.2.2 nDCG@k same weight.
Normalized discounted cumulative gain (nDCG) is an ex- • SMS: We use the proposed weight learning algo-
plicitly rank-weighted metric. For a ranked list retrieved for rithm for integrating the modeled four social factors.
the user, the further down the list a result occurs, the more Specifically, we run SMS with β (0) = 10−4 , λ = 10−4
its value to the user is discounted. If binary relevance is and T = 20000.
determined, with R relevant results for a query, the formula
Note that the way of integration in +Pop, +Act and
of nDCG@k is:
Pk +Decay is searching and ranking APIs for query q based
i=1 ri · wDCG (i) on SF k (si ) · SF 1 (q, si )(2 ≤ k ≤ 4), where SF k (si ) is the
nDCG@k(r) = Pmin{k,R} , (9)
wDCG (i) k -th nonfunctional social factors of API si .
i=1
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 9
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 10
PW PW PW
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
P@5 P@10 P@15
(a) (b) (c)
PW PW PW
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
nDCG@5 nDCG@10 nDCG@15
(d) (e) (f)
PW PW PW
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07
AR@5 AR@10 AR@15
(g) (h) (i)
Fig. 5. Comparison of API Search Methods in terms of P@k, nDCG@k, and AR@k (the bar in red indicates method with the best performance)
hardly any popularity information. Both of WL+P (+D) and which is inherently hard to find relevant APIs increases.
+Pop (+Decay) outperform List Only. However, +Pop and
+Decay performs better than WL+P and WL+D, which is
quite different with P@10 and nDCG@10 based results. It 5 C ASE S TUDY
probably caused by the weakening of popularity factor as This section illustrates retrieval cases for two queries: athlete
the weight training algorithm may assign larger weight for and school, in order to better understand the distinction
semantic information. of the baseline method PW and the proposed SMS that
leverages multiple social media information. For the two
From Fig.6 (a)-(c), according to the blank spaces between
queries, Table 4 reveals the comparison between the top 5
broken lines of List Only and WL+P (+A, +D), it can be eas-
results respectively generated by PW and SMS, along with
ily found that the popularity factor makes more significant
the description, category and provider rank.
influence to search result in terms of P@10 compared to the
As showed in Table 4, the top 5 APIs returned for
other social factors. The same conclusions could be drawn
query athlete by PW method come from various categories,
based on nDCG@10 and AR@10 by the rest figures.
of which only the first API called Athlinks matches with
There exist some apparent drops of performance on the topic of athlete. Since PW method extracts description
P@10 and nDCG@10 with the increase of percentage of train- information offered by API providers, the semantics can be
ing queries , especially when the training queries changes discovered is limited. In contrast, 80% of the top 5 APIs
from 60% to 70%. The possible reason is that the number of returned by SMS are consistent with the category of athlete
queries for testing are reduced and the proportion of queries and the extracts from description, such as “running events”,
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 11
P@10
P@10
0.60 0.60 0.60
nDCG@10
nDCG@10
0.75 0.75 0.75
AR@10
AR@10
0.04 0.04 0.04
“learn from route” and “fitness”, show the reasonableness of work in this paper. In this section, we summarize them into
results. It is lead by the collective knowledge of crowds several categories based on the methods they employ.
inferred from Twitter lists and its optimal integration with
nonfunctional social factors. For query school, the search Semantic-based approaches: This kind of methods gener-
result is similar with the case of query athlete. All of the top ally consider the semantic relevance of services. Wu et al. [1]
5 APIs of SMS are related to the given query while only one [17] proposed a clustering approach utilizing both semantics
of top 5 APIs returned by PW method belongs to Education mined from WSDL documents and tags to improve service
category. search. Lee et al. [2] showed how to syntactically define and
Furthermore, the provider rank in SMS is overall higher semantically describe the characteristics of APIs and how to
than that in PW method, which indicates that the incorpo- use these descriptions to easily search and composite APIs.
ration of social factors improves the popularity of returned Chan et al. [18] proposed a practical approach to help users
APIs. find usages of APIs given only simple text phrases, starting
with a greedy subgraph search algorithm which returned a
connected subgraph containing nodes with high semantic
6 R ELATED W ORK similarity to the query phrases. In [19], Dojchinovski et al.
presented a semantics based model of a Web API directory
Research in the fields of service discovery has been widely graph that captured relationships among APIs, mashups,
published in recent years, specially including service rec- developers, and categories, based on which, a novel API
ommendation, ranking and searching, which is the main selection method were proposed to allow users to get more
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 12
TABLE 4
Top 5 results by PW and SMS for two queries, along with API description, category and provider rank
PW SMS
API Name Extracts from Description Category Rank API Name Extracts from Description Category Rank
Query: athlete
running events, tennis
results of endurance races,
Athlinks Sports 76106 Active.com tournaments high school Sports 4690
athlete profiles
sports ranking data
Online Courier Australian postage, courier training and nutrition
Shipping 7928056 TrainingPeaks Sports 28401
Quotes and shipping rate software, athletes and coaches
validation of sample Plan stride, learn from route,
TrueSample Social 3427166 MapMyRun Sports 12113
population, survey data share your journey
online health network,
search and text mining,
CoPub Science 1111535 HealthTap medical advice, answers to Medical 6372
MedLine database
health questions
integrate survey functionality, exercise equipment, fitness,
Qualtrics Q&A 1447 Life Fitness Sports 114977
questions to ask, response gyms
Query: school
school data, academic
live music and concert
JamBase Music 23514 Education.com performance, student Education 6916
information
demographics
A State of radio show, plays trance and find nearby U.S. schools, see
Music 4421509 GreatSchools Education 10608
Trance progressive rock music information about school
reference database, artists, classroom and education
SecondHandSongs Music 282017 Schoology Education 5410
songs management
school data, academic
Inside Higher online news, job listing
Education.com performance, student Education 6916 Education 17161
Ed provider for higher education
demographics
event organizers, attendees, social network for educators
SpeakerRate Events 1125256 Edmodo Education 2957
speakers and students
control over their preferences and recommended APIs while enhanced QoS-based model in which a discovery agent
the information about their links with other objects and facilitated QoS-based service discovery using the reputation
preferences can be exploited. scores in a service matching, ranking and selection algo-
CF-based approaches: Approaches in this category em- rithm. Kritikos et al. [26] analyzed all necessary require-
ploy collaborative filtering technique which has been widely ments for an accurate, effective, and user-assisting QoS-
adopted in recommender systems, for API or mashup min- based service matchmaking and selection process.
ing. Zhong et al. [20] proposed a time-aware API recom- Social Network-based approaches: Torres et al. [7] con-
mendation approach for mashup creation with the combina- structed collaboration network of APIs and proposed a API
tion of service evolution, collaborative filtering and content discovery approach based on social information. In [8],
matching. A user-group item-based collaborative filtering Cao et al. presented an approach to recommend mashup
method was proposed for personalized Open API recom- services, which considered users’ interests mined from their
mendation in clouds by [21]. Zheng et al. [22] provided a mashup usage history and the social network based on
QoS value prediction approach for service by combining the relationships of mashup services, Web APIs and tags. An it-
traditional user-based and item-based collaborative filtering erative approach to find APIs for building mashups was pre-
methods. Users can discover suitable services according sented in [27], with the combination of semantic and social
QoS information from similar users without service invo- information obtained from the built collaborative social net-
cations. [23] et al. proposed a hybrid collaborative filter- work of APIs. Xu [28] et al. proposed a social-aware service
ing for service recommendation and an inverse consumer recommendation approach that utilized multi-dimensional
frequency based user collaborative filtering for consumer social relationships among potential users, topics, mashups,
recommendation. Sun et al. [24] designed a new similarity and services.
measure for similarity computation between services and Others: We also list some work about service discovery
proposed a normal recovery collaborative filtering approach not involved by those types above. Different with these
to recommend services for consumers. service (API and mashup) recommendation approach that
QoS-based approaches: QoS for Web services refers to implemented in the same library, Zheng et al. [29] tried to
various nonfunctional features such as availability, response recommend related APIs of different libraries through min-
time, throughout, and reliability. The goal of QoS-based ing search results of Web search engines. Bianchini et al. [30]
approach is discovering services that meet the nonfunctional provided a multi-perspective based API search framework
requirements of users. The service selection model proposed for enterprise mashup design, in which a new perspective
by Yu et al. [5] defined multiple QoS criteria and took global of the web designers’ experience is used together with
constraints into consideration. Zhang et al. [6] presented other Web API search techniques, relying on classification
WSExpress considering QoS characteristics of Web services features, and technical features, such as the API protocols
for service discovery. In [25], Xu et al. proposed reputation- and data formats. In [31], Gomadam et al. proposed a
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 13
method for searching and ranking Web APIs that adopts ACKNOWLEDGMENTS
document classification and faceted search, and the serviut
This research was partially supported by the Natural Sci-
score is used to rank APIs based on their utilization and
ence Foundation of China under grant of No. 61379119,
popularity.
National Science and Technology Supporting Program of
Most of these service discovery approaches exploit in- China under grant of No. 2015BAH18F02, the Fundamental
formation that services originally contain or offered by ser- Research Funds for the Central Universities under grant of
vice providers: functional similarity (semantic information), No. ZH2016007.
nonfunctional feature (QoS), or social network generated
by the collaboration or relationship among services, their
compositions and consumers. With the generally increasing R EFERENCES
functional and nonfunctional requirements of consumers,
[1] J. Wu, L. Chen, Z. Zheng, M. R. Lyu, and Z. Wu, “Clustering web
the performance of service discovery might be limited by services to facilitate service discovery,” Knowledge and information
the lack of richness and diversity in existing available infor- systems, vol. 38, no. 1, pp. 207–229, 2014.
mation. Thus, we try to introduce social media information [2] Y.-J. Lee and J.-S. Kim, “Automatic web api composition for
to facilitate service discovery process. In [14], we proposed semantic data mashups,” in Proceedings of the 4th IEEE Interna-
tional Conference on Computational Intelligence and Communication
a crowdsourcing based approach that leverages Twitter lists Networks (CICN). IEEE, 2012, pp. 953–957.
to infer semantic information for service search. Based on [3] S.-Y. Lin, C.-H. Lai, C.-H. Wu, and C.-C. Lo, “A trustworthy qos-
the previous work, this paper incorporates more factors of based collaborative filtering approach for web service discovery,”
social media and presents a weight learning algorithm to Journal of Systems and Software, vol. 93, pp. 217–228, 2014.
[4] N. N. Chan, W. Gaaloul, and S. Tata, “A recommender system
get an optimal combination of all the social factors. based on historical usage data for web service discovery,” Service
Oriented Computing and Applications, vol. 6, no. 1, pp. 51–63, 2012.
[5] T. Yu, Y. Zhang, and K.-J. Lin, “Efficient algorithms for web ser-
7 C ONCLUSION vices selection with end-to-end qos constraints,” ACM Transactions
on the Web (TWEB), vol. 1, no. 1, p. 6, 2007.
With the explosive growth of services, including Web ser- [6] Y. Zhang, Z. Zheng, and M. R. Lyu, “Wsexpress: A qos-aware
vices, cloud services, APIs and mashups, discovering the ap- search engine for web services,” in Proceedings of the 17th IEEE
propriate services for consumers is becoming an imperative International Conference on Web Services (ICWS). IEEE, 2010, pp.
91–98.
issue. It is feasible to incorporate social media information to [7] R. Torres, B. Tapia, and H. Astudillo, “Improving web api discov-
facilitate the quality of service discovery, since the presence ery by leveraging social information,” in Proceedings of the 18th
of services in social network is rapidly increasing and direct IEEE International Conference on Web Services (ICWS). IEEE, 2011,
feedback information of crowds from social network is more pp. 744–745.
[8] B. Cao, J. Liu, M. Tang, Z. Zheng, and G. Wang, “Mashup service
valuable for the research of service discovery. recommendation based on user interest and social network,” in
In this paper, we propose a novel framework for service Proceedings of the 20th IEEE International Conference on Web Services
discovery called SMS, which exploits social media informa- (ICWS). IEEE, 2013, pp. 99–106.
[9] L. Chen, Q. Yu, S. Y. Philip, and J. Wu, “Ws-hfs: A heterogeneous
tion collected from Twitter. We propose different methods feature selection framework for web services mining,” in 2015
to measure four social factors. For the functional social IEEE International Conference on Web Services (ICWS). IEEE, 2015,
factor, LSI model is applied to mine semantic information of pp. 193–200.
services from meta-data of lists that contains them. For the [10] G. Liu, A. Liu, Y. Wang, and L. Li, “An efficient multiple trust
paths finding algorithm for trustworthy service provider selection
rest three nonfunctional social factors, we use normalization in real-time online social network environments,” in 2014 IEEE
method and decay function over list number to measure International Conference on Web Services (ICWS). IEEE, 2014, pp.
them. We present a weight learning algorithm to learn an 121–128.
[11] S. Ghosh, N. Sharma, F. Benevenuto, N. Ganguly, and K. Gum-
optimal combination of the measured social factors. The
madi, “Cognos: crowdsourcing search for topic experts in mi-
target query-service matching function is assumed as a croblogs,” in Proceedings of the 35th international ACM SIGIR con-
linear combination of the multiple social factors. Then we ference on Research and development in information retrieval. ACM,
learn the optimal weight assignment based on the generated 2012, pp. 575–590.
[12] P. Lamere, “Social tagging and music information retrieval,” Jour-
collection of training triplets. Finally we use the learned nal of New Music Research, vol. 37, no. 2, pp. 101–114, 2008.
weights to retrieve services for a coming query. [13] N. Kallen, “Twitter blog: Soon to launch: Lists,” 2009.
In the experiment part, we report the evaluation results [14] T. Liang, L. Chen, H. Ying, Z. Zheng, and J. Wu, “Crowdsourcing
on a real-world dataset crawled from Twitter. For evalua- based api search via leveraging twitter lists information,” in Pro-
ceedings of IEEE International Conference on Data Mining Workshop
tion, 52 sample queries of 12 categories are selected. We (ICDMW). IEEE, 2015, pp. 1540–1547.
design a service category based binary judgment to measure [15] R. Pochampally and V. Varma, “User context as a source of topic
the accuracy of our proposed SMS. In addition, we adopt retrieval in twitter,” in Workshop on Enriching Information Retrieval
(with ACM SIGIR), 2011, pp. 1–3.
an external data source about rank of service providers [16] B. Rosario, “Latent semantic indexing: An overview,” Techn. rep.
to evaluate the influence of popularity mined from social INFOSYS, vol. 240, 2000.
media. The evaluation results respectively demonstrate the [17] L. Chen, L. Hu, Z. Zheng, J. Wu, J. Yin, Y. Li, and S. Deng,
effectiveness of social factors and the proposed weight “Wtcluster: Utilizing tags for web services clustering,” in Service-
Oriented Computing. Springer, 2011, pp. 204–218.
learning algorithm from different aspects. [18] W.-K. Chan, H. Cheng, and D. Lo, “Searching connected api sub-
In the future, we plan to (i) explore more factors of ser- graph via text phrases,” in Proceedings of the ACM SIGSOFT 20th
vices from social media to further improve the performance International Symposium on the Foundations of Software Engineering.
ACM, 2012, p. 10.
of service discovery; (ii) apply our framework to address
[19] M. Dojchinovski, J. Kuchar, T. Vitvar, and M. Zaremba, “Person-
other challenging issues in service discovery, such as similar alised graph-based selection of web apis,” in The Semantic Web–
service retrieval, service recommendation for composition. ISWC 2012. Springer, 2012, pp. 34–48.
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSC.2016.2631521, IEEE
Transactions on Services Computing
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX XXXX 14
[20] Y. Zhong, Y. Fan, K. Huang, W. Tan, and J. Zhang, “Time- Jian Wu received his B.S. and Ph.D. Degrees
aware service recommendation for mashup creation in an evolving in computer science from Zhejiang University,
service ecosystem,” in Proceedings of the 21st IEEE International Hangzhou, China, in 1998 and 2004, respec-
Conference on Web Services (ICWS). IEEE, 2014, pp. 25–32. tively. He is currently a full professor at the Col-
[21] H. Sun, Z. Zheng, J. Chen, W. Pan, C. Liu, and W. Ma, “Personal- lege of Computer Science, Zhejiang University,
ized open api recommendation in clouds via item-based collab- and visiting professor at University of Illinois at
orative filtering,” in Proceedings of the Fourth IEEE International Urbana-Champaign. His research interests in-
Conference on Utility and Cloud Computing (UCC). IEEE, 2011, pp. clude service computing and data mining. He is
237–244. the recipient of the second grade prize of the Na-
[22] Z. Zheng, H. Ma, M. R. Lyu, and I. King, “Qos-aware web service tional Science Progress Award. He is currently
recommendation by collaborative filtering,” IEEE Transactions on leading some research projects supported by
Services Computing, vol. 4, no. 2, pp. 140–152, 2011. National Natural Science Foundation of China and National High-tech
[23] J. Cao, Z. Wu, Y. Wang, and Y. Zhuang, “Hybrid collaborative R&D Program of China (863 Program).
filtering algorithm for bidirectional web service recommendation,”
Knowledge and information systems, vol. 36, no. 3, pp. 607–627, 2013.
[24] H. Sun, Z. Zheng, J. Chen, and M. R. Lyu, “Personalized web ser-
vice recommendation via normal recovery collaborative filtering,”
IEEE Transactions on Services Computing, vol. 6, no. 4, pp. 573–579,
2013.
[25] Z. Xu, P. Martin, W. Powley, and F. Zulkernine, “Reputation-
enhanced qos-based web services discovery,” in Proceedings of the
14th IEEE International Conference on Web Services (ICWS). IEEE,
2007, pp. 249–256.
[26] K. Kritikos and D. Plexousakis, “Requirements for qos-based web
service description and discovery,” IEEE Transactions on Services
Computing, vol. 2, no. 4, pp. 320–337, 2009. Guandong Xu received the Ph.D. degree in
[27] B. Tapia, R. Torres, and H. Astudillo, “Simplifying mashup com- Computer Science from Victoria University, Aus-
ponent selection with a combined similarity-and social-based tech- tralia. He is currently a Senior Lecturer of School
nique,” in Proceedings of the 5th International Workshop on Web APIs of Software and the Advanced Analytics Institute
and Service Mashups. ACM, 2011, p. 8. at University of Technology Sydney. He has au-
[28] W. Xu, J. Cao, L. Hu, J. Wang, and M. Li, “A social-aware service thored three monographs with the Springer and
recommendation approach for mashup creation,” in Proceedings of the CRC Press, and 100+ journal and confer-
the 20th IEEE International Conference on Web Services (ICWS). IEEE, ence papers. His current research interests in-
2013, pp. 107–114. clude data science and data analytics, Web data
[29] W. Zheng, Q. Zhang, and M. Lyu, “Cross-library api recommen- mining, behaviour analytics, recommender sys-
dation using web search engines,” in Proceedings of the 19th ACM tems, predictive analytics, social network anal-
SIGSOFT symposium and the 13th European conference on Foundations ysis. Dr. Xu has served in the Editorial Board or as Guest Editor for
of software engineering. ACM, 2011, pp. 480–483. several international journals. He is the Assistant Editor-in-Chief of the
[30] D. Bianchini, V. De Antonellis, and M. Melchiori, “A multi- World Wide Web Journal.
perspective framework for web api search in enterprise mashup
design,” in Advanced Information Systems Engineering. Springer,
2013, pp. 353–368.
[31] K. Gomadam, A. Ranabahu, M. Nagarajan, A. P. Sheth, and
K. Verma, “A faceted classification based approach to search
and rank web apis,” in Proceedings of the 15th IEEE International
Conference on Web Services (ICWS). IEEE, 2008, pp. 177–184.
Tingting Liang received the B.S degree in the Zhaohui Wu received the Ph.D. degree in
school of science, Jiangnan University, Wuxi, computer science from Zhejiang University,
China, in 2013. She is currently working toward Hangzhou, China, in 1993. From 1991 to 1993,
the Ph.D degree in the College of Computer he was a joint Ph.D. student in the area of knowl-
Science, Zhejiang University. Her publications edge representation and expert system with the
have appeared in some well-known conference German Research Center for Artificial Intelli-
proceedings, e.g, ICSOC, ICWS, ICDM, etc. Her gence, Kaiserslautern, Germany. He is currently
research interests include service computing, a Professor at the College of Computer Science,
data mining and recommender system. the Director of the Institute of Computer Sys-
tem and Architecture, Zhejiang University, and
the leader of Modern Service Industry expert
group in the Ministry of Science and Technology, China. His research
interests include intelligent transportation systems, distributed artificial
intelligence, semantic grid, and ubiquitous embedded systems. He is
Liang Chen received the Ph.D. and B.S. degree the author of four books and more than 100 referred papers. Dr. Wu is a
in computer science from Zhejiang University, Standing Council Member of China Computer Federation (CCF), Beijing.
Hangzhou, China in 2009 and 2015, respec- Since June 2005, he has been the Vice Chairman of the CCF Pervasive
tively. He is currently a Postdoc Research Fel- Computing Committee. He is also on the editorial boards of several
low in Advanced Analytical Institute, University journals and has served as a Program Chair for various international
of Sydney Technology, Australia. Dr. Chen has conferences.
served as the workshop co-chair in many confer-
ences, such as CIKM, BigData, PAKDD, etc. His
publications have appeared in many well-known
conference proceedings and international jour-
nals, e.g, WWW, ICSE, ICDM, CIKM, ICSOC,
ICWS, TSC, TSMC, WWWJ, etc. His research interests include service
computing, recommender system, and data mining.
1939-1374 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.