
2009 International Conference on Computational Intelligence and Security

Web Service Community Discovery Based on Spectrum Clustering

Xizhe Zhang, Ying Yin, Mingwei Zhang, Bin Zhang


College of Information Science and Engineering
Northeastern University
Shenyang, China
zhangxizhe@ise.neu.edu.cn

Abstract—More and more web services have emerged on the Internet with the development of service computing over the past decade. The web services in an SOA system naturally form service communities during execution. Mining and analyzing web service communities helps in designing SOA systems and in supporting web service applications effectively. This article addresses the problem of discovering web service communities formed by closely interacting web services. We propose a novel approach that constructs a web service execution network from logs and clusters it using spectral clustering. Generally, the web services belonging to the same cluster are strongly related to the same task; we call such a cluster a web service community. The approach has been implemented in an experimental system for dynamic web service composition and mining.

Keywords-web services; community mining; spectrum clustering

I. INTRODUCTION

With the development of web services technology, Service Oriented Computing (SOC) has become the mainstream of distributed computing. Researchers have done a great deal of work on service composition, service selection, service substitution and other areas [1-3]. Compared with other distributed software development environments, SOC is markedly more open and dynamic. This shows in the dynamic variability of the network, the execution environment and the state of web services; in the differing business requirements of users (composition logic, service quality, etc.); and in the dynamic variability of the service space (emergence of new services, evolution of existing services, etc.). In general, the lifecycle of a composite service is not momentary (it may last a few hours, days or longer), so web services selected by a method that considers only the selection period and the user requirement cannot meet the requirements of a dynamic service computing environment.

This paper focuses on the mining problem of web service community discovery based on web service logs, that is, how to mine composition patterns from the service log in a dynamic service composition environment. The service communities acquired from the service usage log can help users understand service behavior effectively, and help guarantee the usability and compatibility of service interactions.

Mining web services can help the user to better understand service behavior and use services properly. Dong [4] describes an algorithm that supports similarity search for web services by clustering the parameter names of web service operations. Zheng [5] proposes a service mining framework for exploring interesting compositions of existing web services. A few studies [6] mine web service logs, but they directly apply process mining and sequence mining algorithms without adapting them to the characteristics of services.

The domains related to web service log mining are web usage mining [7] and process mining [8]. Web usage mining analyzes user behavior by mining visit logs; its purpose is to provide customized services or to improve the site structure. Process mining finds heuristic rules by mining the sequence relations of the events in a log, and then summarizes a process model from these rules. However, the data handled by process mining are static and can hardly meet the requirement of dynamic service variability.

Mining web service communities faces the following problems:
1. How can the web service usage log be collected effectively? The autonomy and distribution of web services make log collection difficult. An effective service log collecting mechanism is necessary for the development of service mining.
2. The collection of web services is dynamic and service behavior varies over time. It is necessary to determine the size of the time horizon so that the mining result is workable.
3. How should the mining algorithm be designed according to the dynamics of the service space, so that it suits the dynamic evolution of web services and guarantees that the mining result is proper and effective?

A basic idea for web service composition mining is that the services related to a certain business topic must have some direct or indirect connection between them. In the service usage log, this connection shows up as a tendency of services to form a “community”, in which the web services interact with each other with high frequency. Based on this idea, we can cluster the service collection into several

978-0-7695-3931-7/09 $26.00 © 2009 IEEE 187


DOI 10.1109/CIS.2009.162

Authorized licensed use limited to: Dalhousie University. Downloaded on March 04,2022 at 18:31:31 UTC from IEEE Xplore. Restrictions apply.
communities by their interaction relationships. These web service communities represent business topics.

This paper proposes a novel web service community discovery algorithm based on spectral clustering. By analyzing the web service usage log, we construct the web service interaction network and discover closely related service communities with a spectral clustering algorithm. These naturally forming web service communities represent the corresponding business topics, which can provide guidance for users on understanding service behavior, service selection, service substitution, etc.

The remainder of this paper is organized as follows. In Sec. 2, we present the structure and collection mechanism of the web service usage log. In Sec. 3, we describe the algorithm for web service community mining. In Sec. 4, we discuss the validity of the proposed approach. Finally, we provide concluding remarks in Sec. 5.

II. WEB SERVICE LOG

Web logs are recorded by the web server on which the web services reside and lack web service execution information, so they cannot meet the requirements of service mining. We propose a web service log collecting architecture that records service execution information through the web service composition execution engine.

Fig. 1 shows a web service composition framework that supports web service logging. The framework logs web service execution information through the service composition execution engine. The execution engine is responsible for executing the business process, exchanging messages with the web services, and recording the log at the same time.

The log records the interaction information between the execution engine and the web services. The SOAP message is the basic physical unit of interaction information, and an information transfer between the engine and a service is the minimum logical unit.

Figure 1. A composite service log architecture

The service usage log can be defined as the message record during a certain period of service execution. The service execution log is denoted as $L = \{l_1, l_2, \dots, l_n, \dots\}$, where each $l_i$ is called a log item. A log item can be described as l = {PID, ws, time, size, category}, where:
PID is the ID of the business model to which the log item belongs; it is generated by the composite service execution engine;
ws is the web service involved in the log item;
time is the record time of the log item;
size is the message size;
category indicates the role of the execution engine in the service interaction; its value is one of {Sender, Receiver}, meaning that the execution engine is the message sender or the message receiver.

A sample of the web service log is shown in Table I.
TABLE I. SAMPLE OF WEB SERVICE LOG
#  PID               ws            time                       size  category
1  TravelProcess     FlightTicket  02/Jan/2007:13:08:21-0700  1410  Sender
2  TravelProcess     FlightTicket  02/Jan/2007:13:09:35-0700  2050  Receiver
3  TravelProcess     WS-Pay        02/Jan/2007:13:11:02-0700  8140  Sender
4  Agent4BuyBook001  WS-Order      02/Jan/2007:13:08:21-0700  1020  Sender
5  TravelProcess     WS-Pay        02/Jan/2007:13:08:21-0700  3250  Receiver
6  Agent4BuyBook001  WS-Order      02/Jan/2007:13:08:21-0700  2010  Receiver
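As a minimal sketch of how a row of Table I could be turned into a log item l = {PID, ws, time, size, category}, the Python snippet below parses one whitespace-separated line. The names `LogItem` and `parse_log_line` are hypothetical, and the timestamp layout is assumed to be exactly the one shown in the table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogItem:
    """One log item l = {PID, ws, time, size, category}."""
    pid: str        # ID of the business model the item belongs to
    ws: str         # web service involved in the interaction
    time: datetime  # record time of the item
    size: int       # message size
    category: str   # role of the execution engine in the interaction


def parse_log_line(line: str) -> LogItem:
    """Parse a row laid out as in Table I: '# PID ws time size category'."""
    _row, pid, ws, stamp, size, category = line.split()
    # Assumed layout: day/month-abbreviation/year:hour:minute:second+UTC offset
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S%z")
    return LogItem(pid, ws, when, int(size), category)


item = parse_log_line(
    "1 TravelProcess FlightTicket 02/Jan/2007:13:08:21-0700 1410 Sender"
)
print(item.pid, item.ws, item.time.isoformat(), item.size, item.category)
```

Keeping the time as a timezone-aware `datetime` makes it straightforward to compare items and to select the items that fall inside a given time horizon.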

III. MINING WEB SERVICE COMMUNITY

A. Pre-processing web services log

Web services are highly dynamic in a service computing environment. Services can be canceled or modified, and new services are continuously created. The Internet and the service execution environment also keep changing, so web services and their behavior change accordingly. How to capture the dynamic variability of web services effectively is the key point of web service mining.

The dynamics of web services are an important feature for service mining. Because the log changes continuously, a proper time horizon must be determined before mining to ensure that the result is up to date and valid. A proper time horizon is the key to mining quality.

A time horizon h is the history of a log of length h. Once the time horizon h and the cluster number k are defined, service community discovery is to find k service communities in the time horizon h of the log. The detailed process is as follows. First, determine the proper time horizon $[t_c - h, t_c]$ according to the evolution speed of the service log, where $t_c$ is the present time. Second, construct the corresponding service interaction network for that time horizon, calculate the similarity between the nodes, cluster the network, and finally obtain the service communities.
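The second step of this process, building the interaction network for a chosen time horizon, might be sketched as follows. The paper does not state how edges are derived from log items, so this sketch assumes that two distinct services invoked consecutively within the same business process (same PID) form one directed edge, stamped with the time of the later invocation. The function name `build_edges` and the use of plain numeric timestamps are also assumptions made for illustration.

```python
from collections import defaultdict


def build_edges(items, t_now, h):
    """Return directed edges (src, dst, time) from the log items that fall
    in the horizon [t_now - h, t_now].

    items: iterable of (pid, ws, time) tuples with numeric times.
    """
    calls_by_pid = defaultdict(list)
    for pid, ws, t in items:
        if t_now - h <= t <= t_now:          # keep only items inside the horizon
            calls_by_pid[pid].append((t, ws))

    edges = []
    for calls in calls_by_pid.values():
        calls.sort()                          # order each process by time
        for (_, src), (t, dst) in zip(calls, calls[1:]):
            if src != dst:                    # assumed: skip repeated calls to one service
                edges.append((src, dst, t))
    return edges


log = [("P", "A", 1), ("P", "A", 2), ("P", "B", 3), ("Q", "C", 2), ("P", "C", 10)]
print(build_edges(log, t_now=5, h=5))
```

Because repeated interactions are kept as separate edges, the result is a multigraph edge list rather than a simple graph.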

When services change at high speed, the time horizon should be shortened accordingly; when they change slowly, it should be lengthened. The size of the time horizon can therefore be determined by the service evolution speed.

For the service space S, let the total log time horizon be $[t_{start}, t_{end}]$ and let the web service set be $(op_1, op_2, \dots, op_n)$. We first cut the time axis into several intervals $T_1, T_2, \dots, T_m$ according to a natural time unit such as hour, day or week. Let $e(T_i) = (p_1, \dots, p_n)$ be the service indicator vector of the time interval $T_i$, where

\[ p_j = \begin{cases} 1 & \text{if } op_j \text{ is executed in } T_i \\ 0 & \text{otherwise} \end{cases} \tag{1} \]

The difference $\mathrm{Diff}(T_i, T_{i+1})$ between two adjacent time intervals is defined as follows:

\[ \mathrm{Diff}(T_i, T_{i+1}) = 1 - \frac{e(T_i) \cdot e(T_{i+1})}{\lVert e(T_i) \rVert \times \lVert e(T_{i+1}) \rVert} \tag{2} \]

The difference $\mathrm{Diff}(T_i, T_{i+1})$ represents the degree of change between two adjacent time intervals. We can use this measurement to segment the time axis into several horizons within which the services change relatively slowly.

The segmentation method is similar to hierarchical clustering. First we segment the time axis into several intervals and compute the difference of each pair of adjacent intervals. We then select the two adjacent intervals with the smallest difference, combine them into one interval, and compute the difference between the new interval and its adjacent intervals. This process is repeated until the whole horizon has been combined.

With this method we obtain the segmentation result for the web service log. From the segmentation result we can select the mining time horizon that satisfies the difference threshold. A time horizon selected in this way reflects the degree of variability of the web services in the service environment. Selecting a log sequence that evolves relatively slowly guarantees the validity of the community discovery result.

B. Web Services network clustering

We can represent the interaction behavior of web services as a directed multigraph, called the web service interaction network and denoted G(S, E). It describes the interaction relationships between the web services of the service environment in a certain time horizon. The node set S is the web service set, the edge set E represents the web service interaction relationships, and the direction of an edge represents the time sequence of the interaction.

The web service interaction network is a directed multigraph because web services are commonly executed repeatedly. We can therefore discover web service communities from the density of the interaction relationships. A web service community is a group in which the web services interact tightly. From the business view, such a group is often related to the same business goal, so we can obtain the web service composition model by finding web service communities.

The web service network contains rich information about service behavior. How to measure the distance between nodes is an important factor in service community discovery. An obvious way is to define the edge weight according to the number of interactions among nodes, but the outcome is not as good as expected because of the dynamic character of the log data, for two reasons:
1. The importance of log data weakens with time. More weight should be given to new data so that the clustering result better reflects the recent service behavior pattern.
2. In the service environment, different business processes have different lifecycles. A business process that has existed for a long time has usually been executed many times, while a new business process has been executed fewer times. The internal interaction density is therefore not even across business processes.

So we should take time into consideration when designing the weight. For an operation interaction $op_i \to op_j$ within the mining time horizon $[t_{start}, t_{end}]$, the weight may be defined as follows:

\[ w_i(op_i \to op_j) = \frac{op_j.t - t_{start}}{t_{end} - t_{start}} \tag{3} \]

The weight between two operations can be computed by combining the weights of all the edges between them. Suppose there are n interactions $e_1, e_2, \dots, e_n$ between operations $op_1$ and $op_2$; the weight of the pair $op_1, op_2$ is:

\[ w(op_1, op_2) = \frac{\sum_{i=1}^{n} w_i(e_i)}{\max(e_i.time) - \min(e_i.time)} \tag{4} \]

Equation (4) computes the weight from the interaction frequency of the operations, which makes it possible to find operation groups that have been executed fewer times.

For example, let operations $op_1, op_2$ interact 200 times in a month and $op_3, op_4$ interact 30 times in three days, with the weight of every single edge set to 1. The weights computed from the number of executions are:

w(op1, op2) = 200; w(op3, op4) = 30;

and the weights computed from the frequency (per day, taking a month as 30 days) are:

w(op1, op2) ≈ 6.67; w(op3, op4) = 10.

This guarantees that a new operation group that has been executed fewer times but with high frequency will be found.

We use the spectral clustering algorithm [9] to find operation groups. Compared with traditional algorithms such as k-means or single linkage, spectral clustering very often gives better results; it is also simple to implement and reasonably fast (for sparse data sets of up to several thousand points).

Spectral clustering algorithms have many variations. We use the Ratio Cut method to cluster the operation interaction graph. The parameters are the service usage log L and the number of clusters k. The similarity matrix W is given by Eq. (5).
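As a minimal sketch of the quantities defined in this section, the Python functions below compute the interval difference of Eq. (2), a bottom-up merge of adjacent intervals, and the weights of Eqs. (3) and (4). All function names are hypothetical. Two choices are assumptions not stated in the text: a merged interval takes the element-wise OR of the two indicator vectors (a service counts as executed if it ran in either part), and merging stops once the smallest adjacent difference exceeds a threshold.

```python
import math


def diff(e1, e2):
    """Eq. (2): one minus the cosine between two service indicator vectors."""
    dot = sum(a * b for a, b in zip(e1, e2))
    norm = math.sqrt(sum(a * a for a in e1)) * math.sqrt(sum(b * b for b in e2))
    # Assumption: an interval with no executed service is maximally different.
    return 1.0 - dot / norm if norm else 1.0


def segment(vectors, threshold):
    """Merge adjacent intervals while the smallest adjacent Diff <= threshold."""
    segs = [list(v) for v in vectors]
    while len(segs) > 1:
        i = min(range(len(segs) - 1), key=lambda j: diff(segs[j], segs[j + 1]))
        if diff(segs[i], segs[i + 1]) > threshold:
            break
        # Assumed merge rule: element-wise OR of the two indicator vectors.
        segs[i:i + 2] = [[a | b for a, b in zip(segs[i], segs[i + 1])]]
    return segs


def edge_weight(t, t_start, t_end):
    """Eq. (3): later interactions inside [t_start, t_end] weigh more."""
    return (t - t_start) / (t_end - t_start)


def pair_weight(times, weights):
    """Eq. (4): summed edge weights divided by the time span of the interactions.

    Requires at least two distinct timestamps, otherwise the span is zero.
    """
    return sum(weights) / (max(times) - min(times))


# 200 unit-weight interactions spread over 30 days, as in the example for op1, op2
times = [0.0] + [15.0] * 198 + [30.0]
print(round(pair_weight(times, [1.0] * 200), 2))
print(segment([[1, 1, 0], [1, 1, 0], [0, 0, 1]], threshold=0.5))
```

In practice the weights passed to `pair_weight` would come from `edge_weight`, so that both recency and frequency influence the entries of the similarity matrix.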

\[ W = \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{bmatrix} \tag{5} \]

where $W_{ij} = w(op_i, op_j)$ is the interaction weight between operations $op_i$ and $op_j$, which can be computed by Equation (4). Fig. 2 shows an overview of the clustering algorithm.

Input: Service usage log L, number of clusters k
Output: Clustering result G’

Algorithm Spectral Clustering (L, k)
1. Construct the operation interaction graph G based on the service usage log L;
2. Compute the similarity matrix W of the operation interaction graph G;
3. Compute the Laplacian matrix L = D − W, where $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$ and $d_i = \sum_{j=1}^{n} w_{ij}$;
4. Compute the eigenvectors $v_1, v_2, \dots, v_k$ corresponding to the k smallest eigenvalues. Let $V = [v_1, v_2, \dots, v_k]$, so $V \in R^{n \times k}$. Writing V by rows as $V = [y_1, y_2, \dots, y_n]^T$, we get a set of k-dimensional points $V' = \{y_1, y_2, \dots, y_n\}$;
5. Cluster the data points $y_1, y_2, \dots, y_n$ into k clusters with the K-Means algorithm.

Figure 2. The spectral clustering algorithm for service network

IV. EXPERIMENT

Based on the above research, we designed a web service mining system to test the validity of the algorithm. The system creates web service usage logs by simulating a service execution environment. On the basis of these logs, we can evaluate the usability of the web service community discovery algorithm. The composition model constructed from a service community can be evaluated by service reliability and response time.

First, we create a network topology graph with an Internet topology generator [10]. The network delay between nodes is calculated by the shortest-route algorithm. Then the service structure is built; each service contains a service name, service type, parameters and other information. Next, we select actual business processes to build web service composition models. Finally, the simulated execution is carried out in this environment and the service usage log is generated.

We generate 1500 web services, which belong to 200 service types and are randomly distributed over 800 network nodes; 50 service composition models are created; the total time horizon of the service log is 30 weeks. As the business processes and web services in a real service space are highly variable, we assign random existence periods to the service composition models and web services, so that new models and services continuously enter while old models and services exit. The degree of variability of the web services can be decided from existing statistical data.

We use the following measurements for a service composition model:
1. Response time. This is an important measure for evaluating the performance of a composite service, and it can be calculated from the service execution records in the log. Summing the response times of the services in a service composition model gives the response time measurement for the model.
2. Robustness. This measures the execution success rate of service requests. As service function and usability vary with time and environment, service failures may occur during execution. The robustness of a service composition model can be defined as follows:

\[ R = E_{fail} / (E_{fail} + E_{success}) \tag{8} \]

where $E_{fail}$ is the number of failed service requests in the service composition model and $E_{success}$ is the number of successful service requests.

Figure 3. Comparison of robustness between the mining result and real composite services

Figure 4. Comparison of response time between the mining result and real composite services

Fig. 3 and Fig. 4 compare the robustness and response time of the mining result with those of real composite services. We can see that the response time and robustness of the service compositions mined from the log are better than those of the real service compositions. This is because, after a certain time of execution, a real service composition process may lag behind the dynamic variability of the services, which decreases its quality.

V. CONCLUSION

The service log contains plenty of information about service usage patterns, and the quality and performance of service

applications may be improved by mining the service log. This paper proposes a web service community discovery algorithm based on spectral clustering. By analyzing the web service execution log, we construct the web service interaction network and find closely related service groups with the same usage pattern using a spectral clustering algorithm. These naturally forming service groups represent the corresponding business processes, which can provide guidance for users on understanding service behavior, service selection, service substitution, etc.
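The procedure of Fig. 2 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: W is taken to be symmetric and non-negative, the eigenvectors are obtained with a hand-written cyclic Jacobi iteration (any symmetric eigen-solver would do), and K-Means uses a deterministic farthest-point initialisation that the paper does not specify. All function names are hypothetical.

```python
import math


def jacobi_eigh(A, sweeps=100, tol=1e-18):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the eigenvectors in the columns of V."""
    n = len(A)
    A = [list(map(float, row)) for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the off-diagonal entry A[p][q]
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):              # rotate columns p, q of A and V
                    for r in range(n):
                        mp, mq = M[r][p], M[r][q]
                        M[r][p], M[r][q] = c * mp - s * mq, s * mp + c * mq
                for r in range(n):            # rotate rows p, q of A
                    ap, aq = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V


def kmeans(points, k, iters=100):
    """Plain K-Means with farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def spectral_clustering(W, k):
    """Steps 3 to 5 of Fig. 2 for a symmetric similarity matrix W."""
    n = len(W)
    d = [sum(row) for row in W]                                   # degrees d_i
    lap = [[(d[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigh(lap)
    cols = sorted(range(n), key=lambda i: vals[i])[:k]            # k smallest eigenvalues
    Y = [[V[r][c] for c in cols] for r in range(n)]               # rows y_1..y_n
    return kmeans(Y, k)


# Two tightly interacting groups of three services joined by one weak link
W = [[0, 5, 5, 0, 0, 0],
     [5, 0, 5, 0, 0, 0],
     [5, 5, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 5, 5],
     [0, 0, 0, 5, 0, 5],
     [0, 0, 0, 5, 5, 0]]
print(spectral_clustering(W, 2))
```

For networks of realistic size, a library eigen-solver for sparse symmetric matrices would be the natural replacement for the Jacobi routine; the rest of the sketch would stay the same.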

ACKNOWLEDGMENT
This work was supported by a grant from National
Natural Science Foundation of China (No. 60903009),
Postdoctoral Science Foundation of China (No.
20080430185), National High Technology Research and
Development Program of China (863 Program) (No.
2009AA012122).
REFERENCES
[1] G. Zheng, A. Bouguettaya. A Web Service Mining Framework. IEEE International Conference on Web Services (ICWS 2007), 1096-1103.
[2] Mohsen J. Asbagh, Hassan Abolhassani. Web service usage mining: mining for executable sequences. Proceedings of the 7th WSEAS International Conference on Applied Computer Science, Venice, Italy, 266-271.
[3] Robert Gombotz, Schahram Dustdar. On Web Services Workflow Mining. BPM 2005 Workshops, LNCS 3812, 216-228, 2006.
[4] Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang. Similarity Search for Web Services. Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004, 372-383.
[5] G. Zheng, A. Bouguettaya. A Web Service Mining Framework. IEEE International Conference on Web Services (ICWS 2007), 1096-1103.
[6] Mohsen J. Asbagh, Hassan Abolhassani. Web service usage mining: mining for executable sequences. Proceedings of the 7th WSEAS International Conference on Applied Computer Science, Venice, Italy, 266-271.
[7] J. Srivastava, R. Cooley, M. Deshpande, P.-N. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. ACM SIGKDD Explorations, 1(2), January 2000.
[8] W. van der Aalst, T. Weijters, L. Maruster. Workflow mining: discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9), 2004, 1128-1142.
[9] Ulrike von Luxburg. A Tutorial on Spectral Clustering. Statistics and Computing, 17(4), 2007, 395-416.
[10] A. Medina, A. Lakhina, I. Matta, J. W. Byers. BRITE: An approach to universal topology generation. In: Proc. of the 9th Int’l Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2001). Cincinnati, 2001. 346-356.
