You are on page 1of 13

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 1

Service Mining: Using Process Mining to


Discover, Check, and Improve Service
Behavior
Wil van der Aalst, Senior Member, IEEE

Abstract—Web services are an emerging technology to implement and integrate business processes within and across
enterprises. Service-orientation can be used to decompose complex systems into loosely coupled software components that
may run remotely. However, the distributed nature of services complicates the design and analysis of service-oriented systems
that support end-to-end business processes. Fortunately, services leave trails in so-called event logs and recent breakthroughs
in process mining research make it possible to discover, analyze, and improve business processes based on such logs.
Recently, the Task Force on Process Mining released the Process Mining Manifesto. This manifesto is supported by 53
organizations and 77 process mining experts contributed to it. The active participation from end-users, tool vendors, consultants,
analysts, and researchers illustrate the growing significance of process mining as a bridge between data mining and business
process modeling.
In this paper, we focus on the opportunities and challenges for service mining, i.e., applying process mining techniques to
services. We discuss the guiding principles and challenges listed in the Process Mining Manifesto and also highlight challenges
specific for service-orientated systems.

Index Terms—Process Mining, Business Process Management, Service Discovery, Conformance Checking

1 I NTRODUCTION logs readily available in today’s information systems


Web services have emerged as an established [10], [11], [12].
paradigm for architecting and implementing busi- Starting point for process mining is an event log.
ness collaborations within and across organizational Each event in such a log refers to an activity (i.e., a
boundaries [1], [2]. In this paradigm, the functionality well-defined step in some process) and is related to
provided by business applications is encapsulated a particular case (i.e., a process instance). The events
within web services, i.e., software components de- belonging to a case are ordered and describe one
scribed at a semantic level, which can be invoked “run” of the process. Event logs may store additional
by application programs or by other services through information about events. In fact, whenever possi-
a stack of Internet standards including HTTP, XML, ble, process mining techniques use supplementary
SOAP, WSDL and UDDI [1], [2], [3], [4], [5]. Once information such as the resource (i.e., person, device,
deployed, web services provided by various organi- or software component) executing or initiating the
zations can be inter-connected in order to implement activity, the timestamp of the event, and other data
business collaborations, leading to composite web attributes (e.g., the size of an order).
services. Typically, three types of process mining can be dis-
In the context of web services, typically all kinds tinguished: (a) process discovery, (b) conformance check-
of events are being recorded. It is possible to record ing, and (c) model enhancement [10]. Discovery tech-
events related to activities inside services or interactions niques learn a model from an event log without using
between services (e.g., messages) [6], [7], [8], [9]. The any additional information. This results in a so-called
autonomous nature of services and the fact that they initial process model. This model can also be made by
are loosely coupled makes it important to monitor and hand. In both situations, conformance checking tech-
analyze their behavior. In this paper, we will refer to niques can be used to compare the observed behavior
this as service mining. (event logs) with the modeled behavior (initial process
Process mining is an enabling technology for service model). This results in diagnostics showing deviations
mining. Process mining aims to discover, monitor and between model and log. After conformance checking,
improve real processes by extracting knowledge from event model and log are aligned and information from the
log may be used to enhance the model. The model
• Wil van der Aalst is with the Department of Mathematics and
may be repaired or extended with other perspectives
Computer Science of Eindhoven University of Technology in The such as the time or resource perspective. For example,
Netherlands and Queensland University of Technology in Brisbane, timestamps in the event log may be used to add
Australia. E-mail: see vdaalst.com
timing information (waiting times and service times)

Digital Object Indentifier 10.1109/TSC.2012.25 1939-1374/12/$31.00 © 2012 IEEE


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 2

to the model. Such an extended model can be used end business processes. The two terms overlap some-
for decision support. what and the distinction has been heavily discussed in
Recently, the IEEE Task Force on Process Mining the last decade. Orchestration and choreography can
released the Process Mining Manifesto [13]. This man- be seen as different “perspectives”. Choreography is
ifesto describes guiding principles and challenges for concerned with the overall set of interactions between
process mining. In this paper, we will summarize several services that do not need to be in some
these and relate them to service mining. In particular, hierarchical relationship to one another. Orchestration
we will highlight two challenges specific for service is concerned with the interactions of a single service
mining: (a) “How to Correlate Instances?” and (b) with its environment (classical composition).
“How to Analyze Services Out of Context?”. A service X may have both a sell side and buy side.
The first challenge looks at the problem that process The sell side describes the interface that the service
instances (i.e., cases) in one service need to be related offers to its service consumers, i.e., other services that
to process instances in other services. For example, use X. The buy side describes the interface that is
one customer order may relate to several deliveries used to interact with other service providers, i.e., the
in another service. It does not make much sense to services used by X. A service having both a sell side
look at one service in isolation. Therefore, correlation and a buy side may act as a service producer and a
between services is important. service consumer.
The second challenge specific for service mining Interactions between services may be synchronous
is the problem that one can only observe services or asynchronous. In the context of web services, most
running in a particular environment. The behavior interactions are asynchronous and realized through
may change dramatically when a service is embedded message passing. We consider three types of commu-
in another environment. The performance or correct- nication primitives: synchronize (synchronous commu-
ness of a service in one context may say very little nication), send (asynchronous communication), and
about the performance or correctness of the very same receive (asynchronous communication) [15].
service in another context. In many service-oriented systems it is possible to
The remainder of this paper is organized as follows. record communicative actions. For example, it may
Section 2 briefly introduces our view on service orien- be possible to tap-off the messages that are being
tation. Then, we introduce the topic of process mining exchanged. In some systems it is also possible to
using an example involving two services (Section 3). observe the activities inside a service. For example,
Section 4 summarizes the guiding principles and chal- middleware products and BPM/WFM systems often
lenges described in the Process Mining Manifesto by provide detailed audit trails covering all activities
the IEEE Task Force on Process Mining. In Section 5 supported [9].
we elaborate on the two challenges specific for service It is relatively easy to collect all events related to
mining. Section 6 concludes the paper. an orchestration as one service serves as the “spider
in the web”. For choreographies this is often more
difficult as control is distributed. Even when each
2 S ERVICES service records events in a detailed manner, it may
Although the term “web services” is often associated still be impossible to merge all event data into a single
to a stack of Internet standards, we take a more overall event log.
broader perspective in this paper. Service-orientation In the remainder, we first focus on process mining
does not depend on a particular technology as is elo- and the manifesto by the IEEE Task Force on Process
quently described in the introduction of the “Service Mining. Subsequently, we zoom in on challenges spe-
Sciences Research Manifesto” [14]. The key idea of cific for service mining.
service-orientation is to subcontract work to specialized
services in a loosely coupled fashion. 3 U SING P ROCESS M INING FOR THE A NAL -
In a Service Oriented Architecture (SOA) services are YSIS OF S ERVICES
interacting, e.g., by exchanging messages. By com- Process mining techniques are able to extract knowl-
bining basic services more complex services can be edge from event logs commonly available in today’s
created [1], [2]. Orchestration is concerned with the information systems [10]. These techniques provide
composition of services seen from the viewpoint of new means to discover, monitor, and improve processes
single service (the “spider in the web”). Choreography in a variety of application domains. In this section
is concerned with the composition of services seen we introduce the three basic types of process min-
from a global viewpoint focusing on the common and ing: process discovery (Section 3.1), conformance checking
complementary observable behavior. Choreography is (Section 3.2), and model enhancement (Section 3.3).
particularly relevant in a setting where there is no
single coordinator. 3.1 Process Discovery
The terms orchestration and choreography describe As indicated earlier, a so-called event log serves as a
two aspects of integrating services to create end-to- starting point for analysis. An event log can be seen
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 3

as a collection of events. An example of an event may σ1 and σ2 onto A1 , we find traces present in L1 .
be: Projecting these traces onto A2 yields traces in L2 .
caseid : customer order 7564 In our example logs, events refer to activity names.
activity : place order An event log could also hold events related to mes-
timestamp : 28-11-2011@16:54 sages being sent or received. In some logs one may
resource : John even find events related to the start and completion of
amount : e2500 activities. This way one can measure the duration of
ordereditems : [iPod, iPhone 4, iPad 2] activities. Fig. 1 only shows asynchronous communi-
cation as both services are connected through message
Depending on the event log, there may be thousands places m1, . . . , m6. It is also possible to model syn-
or even millions of such events. Although all event chronous interaction by transition fusion. For exam-
attributes can be used for process mining, we focus on ple, if po and ro interact synchronously, one can model
the first two attributes that are mandatory for process this by merging both transitions into one. However,
mining. Any event should refer to a case (i.e., a process all of these variations do not change the basic idea of
instance) and an activity. Moreover, events related to a process discovery for services: automatically learning
particular case should be ordered. the behavior of services based on event data.
If we limit ourselves to the minimal information One could argue that there is no need to discover
needed for process mining, then an event log services since they have been implemented before.
can be described as a multiset of traces where each Instead of process discovery based on event logs,
trace corresponds to a sequence of activities. Given a one could also refactor models based on source code
set of activities A1 = {po, rg, sp, rr , cg, pc, cl }, an or documentation. However, given the autonomous
example trace is !po, rg, cg, sp, pc, cl ". This trace nature of services, one party may not have models
represents the sequence of activities executed for a of the services of another party, services may change
particular instance. An example of an event log is their behavior over time, or only use a fraction of the
L1 = [!po, rg, cg, sp, pc, cl "20 , !po, rg, sp, cg, pc, cl "18 , possible modeled behavior. Therefore, it may be more
!po, sp, rg, cg, pc, cl "17 , !po, sp, rg, pc, cg, cl "13 , reliable and efficient to discover service behavior from
!po, sp, pc, rg, cg, cl "12 , !po, rg, sp, pc, cg, cl "8 , real event data. Moreover, the discovered model is
!po, rr , cl "7 ]. This event log contains 7 different only the starting point for other types of analysis as
traces describing 95 cases. For example, there are 7 is explained next. The alignment of real event data
cases that follow trace !po, rr , cl ". and models is the key ingredient of process mining.
The goal of process discovery is to discover process
models from logs such as L1 . For example, the α 3.2 Conformance Checking
algorithm [16] will automatically construct the Petri Combining services to realize desired end-to-end pro-
net depicted in the left subprocess of Fig. 1. This Petri cesses is a far from trivial and error prone process.
net models that all cases start with activity po (“place It is easy to inadvertently introduce deadlocks and
order”) and end with activity cl (“close”). In-between other behavioral anomalies. For example, the com-
these two activities either the set of four activities rg, bined model in Fig. 1 contains a subtle error. This error
sp, cg and pc, or just activity rr (“receive rejection”) has been added to illustrate the need for verification
is executed. Activity rg is always followed by cg and techniques. Consider the trace: !po, ro, sp, sr ". This
activity sp is always followed by pc. However, rg may trace results in the state where the places p2 , p5 , m3 ,
be before or after sp, etc. The left subprocess shown m4 , and p14 each contain one token, i.e., the process
in Fig. 1 allows for seven different traces. These are on the right rejected the order whereas the process
exactly the traces that can be found in event log L1 . on the left already paid for it. The resulting state is a
Now consider the right subprocess shown in so-called deadlock.
Fig. 1. This process has six activities A2 = The two individual processes have no such prob-
{ro, sg, rp, cp, gc, sr }. An example of an event log over lems when analyzed in isolation. The left subprocess
A2 is L2 = [!ro, sg, rp, cp, gc"88 , !ro, sr "7 ]. This log also has 11 possible states and it is always possible to end
describes 95 cases and can be seen as the counterpart with a token in p8 . The right subprocess has 6 states
of L1 . The subprocess on the right-hand side of Fig. 1 and it is always possible to end with a token in p14 .
can be discovered from L2 using process mining The combined process has 26 reachable states. There
techniques such as the α algorithm [16]. are two states where no transitions are enabled: the
One can view Fig. 1 as a model of two interact- desired final state with tokens in p8 and p14 and the
ing services. Whereas L1 and L2 describe the event deadlock just described.
logs of individual services, one can also look at an The lion’s share of attention in service analysis has
event log describing the combined overall behavior. been devoted to design-time verification. However,
An example of a trace possible according to Fig. 1 correctness issues are not limited to design-time. At
is σ1 = !po, ro, sg, rg, cg, sp, rp, cp, pc, gc, cl ". Another run-time the actual behavior may deviate from the mod-
example trace is σ2 = !po, ro, sr , rr , cl ". If we project eled behavior. Conformance checking techniques aim
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 4

customer supplier
service p1 service
p9

place receive
po ro
order order
m1

p10
p2 p3 send
m2 sg
goods
receive send
rg sp
goods payment
m3 p11
receive receive send
p4 rr rp sr
rejection payment rejection
p5
m4
confirm payment p12
cg pc
goods confirmed
confirm
cp
payment
m5

p6 p7
p13
goods
close cl gc
confirmed
m6
p14
p8

Fig. 1. A process composed of two services modeled in terms of Petri nets.

to find discrepancies between the modeled behavior customer


service p1
and the real behavior recorded in event logs. For
example, suppose that an event log contains the trace:
place po
!po, rg, ro, sg, rp, cp, cg, sp, pc, gc, cl ". This trace is not order
receive
possible according to the combined model in Fig. 1. goods
m1

First of all, the goods are received by the customer ser- rg

vice without ever being sent by the supplier service. p2 m2

Second, the payment is received before being sent.


Figure 2 shows an alternative model for the cus- receive
rr m3
rejection send
tomer service. If the original customer service in Fig. 1 payment
sp
is replaced by this service, the deadlock is removed.
p3 m4
Hence, from a (design-time) correctness point of view payment
confirmed
the service in Fig. 2 is better than the original one. pc
However, if event log L1 (the log describing 95 cases) p4
confirm
m5
describes the observed behavior, then conformance is goods
cg
poor because many traces in the log are impossible p5
m6
according to the model. Only the eight cases fol-
lowing trace !po, rg, sp, pc, cg, cl " and the seven cases
following trace !po, rr , cl " fit completely. The other p6

95 − (8 + 7) = 80 cases deviate from the model and, close cl


therefore, do not fit.
Conformance can be viewed from two angles: (a) p7
the model does not capture the real behavior (“the
model is wrong”) and (b) reality deviates from the
desired model (“the event log is wrong”). The first Fig. 2. A sequential customer service.
viewpoint is taken when the model is supposed to be
descriptive, i.e., capture or predict reality. The second
viewpoint is taken when the model is normative, i.e., pliance checking, auditing, certification, and run-time
used to influence or control reality. monitoring. Moreover, it can be used to judge the
Conformance checking is very important for com- quality of discovered or hand-made models. Typically,
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 5

four quality dimensions for comparing model and log behavior. Clearly, the “flower model” lacks precision.
are considered: (a) fitness, (b) simplicity, (c) precision, A model that is not precise is “underfitting”. Under-
and (d) generalization [10]. fitting is the problem that the model over-generalizes
A model with good fitness allows for most of the the example behavior in the log (i.e., the model allows
behavior seen in the event log. A model has perfect for behaviors very different from what was seen in the
fitness if all traces in the log can be replayed by the log).
model from beginning to end. There are various ways A model should also generalize and not restrict
of quantifying fitness [10], [17], [18]. Often fitness is behavior to just the examples seen in the log. A model
described by a number between 0 (very poor fitness) that does not generalize sufficiently is “overfitting”.
and 1 (perfect fitness). Overfitting is the problem that a very specific model
Obviously, the simplest model that can explain the is generated whereas it is obvious that the log only
behavior seen in the log is the best model. This holds example behavior (i.e., the model explains the
principle is known as Occam’s Razor. particular sample log, but a next sample log of the
same process may produce a completely different
process model).
customer
service
p1 All four quality dimensions for comparing model
and log can be quantified in various ways. See [10],
place po [17], [18] for more details. Also see [7], [8] for a
order receive
goods m1 more declarative constraint-based modeling language
rg suitable for conformance checking in the context of
receive m2 services.
rejection
rr
send
3.3 Model Enhancement
m3
payment It is also possible to extend or improve an existing pro-
sp
p2
cess model using the event log. A non-fitting process
payment
confirmed
m4
model can be corrected using the diagnostics provided
pc by the alignment of model and log. Moreover, event
confirm m5 logs may contain information about resources, times-
goods
tamps, and case data. For example, an event referring
cg
to activity “place order” and case “customer order
m6
close cl
7564” may also have attributes describing the person
that entered the order (e.g., “John”), the time of the
p3
event (e.g., “28-11-2011@16:54”), the ordered amount
(e.g., “e2500”), and the multiset of items ordered.
After aligning model and log it is possible to replay
Fig. 3. A so-called “flower model” allowing for too much the event log on the model. While replaying one
behavior. can analyze these additional attributes and add other
perspectives to the model.
Fitness and simplicity alone are not sufficient to For example, as Fig. 4 shows, it is possible to
judge the quality of a discovered process model. For analyze waiting times in-between activities. Simply
example, it is very easy to construct an extremely measure the time differences between causally related
simple Petri net (“flower model”) that is able to replay events and compute basic statistics such as averages,
all traces in an event log (but also any other event variances, and confidence intervals. This way it is
log referring to the same set of activities). See for possible to identify the main bottlenecks [10], [19].
example the customer service shown in Fig. 3. After po Information about resources can be used to discover
all other transitions are enabled and remain enabled roles, i.e., groups of people frequently executing re-
until the end. For example, traces such as !po, cl " lated activities [20]. Here, standard clustering tech-
and !po, rg, rg, rg, rg, cl " are possible according to this niques can be used. It is also possible to construct
“flower model” but very unlikely given the observed social networks based on the flow of work and ana-
behavior recorded in event log L1 . lyze resource performance (e.g., the relation between
Similarly, it is often undesirable to have a model workload and service times).
that only allows for the exact behavior seen in the Standard classification techniques can be used to
event log. Remember that the log contains only ex- analyze the decision points in a process model. For
ample behavior and that many traces that are possible example, after receiving the order (place p10 ) there are
may not have been seen yet. (Note that in our simple two possibilities: the goods are sent (sg) or a rejection
example all traces are frequent, so the problem does is sent (sr ). Using the data known about the case
not apply here.) prior to the decision, we can construct a decision tree
A model is precise if it does not allow for “too much” explaining the observed behavior [10], [21].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 6

The event log can be used to discover roles


in the organization (e.g., groups of people
with similar work patterns). These roles can
Role A: Role E: Role M: be used to relate individuals and activities. Role B: Role C:

Pete Sue Sara Tom Sam

Mike Sean May Jeff

Ellen Discovery techniques can be used to


find a control-flow model (in this case Jill
Decision rules (e.g., a decision tree based
in terms of a Petri net) that describes
on data known at the time a particular
the observed behavior best.
choice was made) can be learned from
customer supplier the event log and used to annotate
service p1 service decisions.
p9
M C
place receive
po ro
order order
m1 m1

B p10
p2 p3 send
m2 m2 sg
M goods
receive send
rg sp
goods payment
A B
m3 m3 C p11
receive receive send
p4 rr rp sr
rejection payment rejection
p5
A m4 m4
confirm payment B p12
cg pc
goods confirmed
E confirm
cp
payment
m5 m5

p6 p7 B p13
Performance information (e.g., the
goods average time between two subsequent
close cl gc
confirmed activities) can be extracted from the
m6 m6 event log and visualized on top of the
p14
model.
p8

Fig. 4. By replaying the event log on the process model for a service it is possible to analyze bottlenecks,
decision points, and the distribution of work over roles and resources.

Figure 4 illustrates that process mining is not lim- frequency of events. When events bear timestamps
ited to control-flow discovery. In fact, process min- it is possible to discover bottlenecks, measure service
ing may cover a variety of perspectives. The control- levels, monitor the utilization of resources, and predict
flow perspective only focuses on the control-flow, i.e., the remaining processing time of running cases.
the ordering of activities. The goal of mining this Process mining is not restricted to offline analysis
perspective is to find a good characterization of all and can also be used for predictions and recommenda-
possible paths. The result is typically expressed in tions at runtime. For example, the completion time of
terms of a Petri net or some other process notation a partially handled customer order can be predicted
(e.g., EPCs, BPMN, or UML activity diagrams). The using a discovered process model with timing infor-
organizational perspective focuses on information about mation [19].
resources hidden in the log, i.e., which actors (e.g.,
people, systems, roles, or departments) are involved 4 P ROCESS M INING M ANIFESTO
and how are they related. The goal is to either struc-
The IEEE Task Force on Process Mining recently re-
ture the organization by classifying people in terms
leased a manifesto describing guiding principles and
of roles and organizational units or to show the social
challenges [13]. The manifesto aims to increase the
network. The system architecture perspective focuses
visibility of process mining as a new tool to improve
on the IT infrastructure, i.e., the decomposition into
the (re)design, control, and support of operational
services and other software components. Techniques
business processes. It is intended to guide software
for mining the organizational perspective [20] can
developers, scientists, consultants, and end-users. Be-
also be applied to this more technical perspective.
fore summarizing the manifesto, we briefly introduce
For example, social network analysis can be used to
the task force.
visualize the communication among services. The case
perspective focuses on properties of cases. Obviously, a
case can be characterized by its path in the process or 4.1 Task Force on Process Mining
by the actors working on it. However, cases can also The growing interest in log-based process analysis
be characterized by the values of the corresponding motivated the establishment of the IEEE Task Force
data elements. For example, if a case represents a on Process Mining. The goal of this task force is to
replenishment order, it may be interesting to know promote the research, development, education, and
the supplier or the number of products ordered. The understanding of process mining. The task force was
time perspective is concerned with the timing and established in 2009 in the context of the Data Mining
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 7

Technical Committee of the Computational Intelli- 4.3 Challenges


gence Society of the IEEE. Members of the task force Process mining is an important tool for modern orga-
include representatives of more than a dozen commer- nizations that need to manage non-trivial operational
cial software vendors (e.g., Pallas Athena, Software processes. On the one hand, there is an incredible
AG, Futura Process Intelligence, HP, IBM, Fujitsu, growth of event data. On the other hand, processes
Infosys, and Fluxicon), ten consultancy firms (e.g., and information need to be aligned perfectly in order
Gartner and Deloitte) and over twenty universities. to meet requirements related to compliance, efficiency,
Concrete objectives of the task force are: to make and customer service. Despite the applicability of
end-users, developers, consultants, managers, and process mining there are still important challenges
researchers aware of the state-of-the-art in process that need to be addressed; these illustrate that process
mining, to promote the use of process mining tech- mining is an emerging discipline. Table 2 lists the
niques and tools, to stimulate new process mining eleven challenges described in the manifesto [13].
applications, to play a role in standardization ef- As an example consider Challenge C4: “Dealing
forts for logging event data, to organize tutorials, with Concept Drift”. The term concept drift refers to the
special sessions, workshops, panels, and to publish situation in which the process is changing while being
articles, books, videos, and special issues of jour- analyzed [22]. For instance, in the beginning of the
nals. For example, in 2010 the task force standard- event log two activities may be concurrent whereas
ized XES (www.xes-standard.org), a standard log- later in the log these activities become sequential. Pro-
ging format that is extensible and supported by the cesses may change due to periodic/seasonal changes
OpenXES library (www.openxes.org) and by tools such (e.g., “in December there is more demand” or “on
as ProM, XESame, Nitro, etc. See http://www.win. Friday afternoon there are fewer employees avail-
tue.nl/ieeetfpm/ for recent activities of the task force. able”) or due to changing conditions (e.g., “the market
is getting more competitive”). Such changes impact
processes and it is vital to detect and analyze them
[22].
4.2 Guiding Principles The manifesto is supported by 53 organizations and
77 process mining experts from all over the globe
As with any new technology, there are obvious mis- contributed to it. The manifesto is available in Chi-
takes that can be made when applying process mining nese, Dutch, English, French, German, Greek, Italian,
in real-life settings. Therefore, the six guiding princi- Japanese, Korean, Portuguese, Spanish, and Turkish
ples listed in Table 1 aim to prevent users/analysts (cf. www.win.tue.nl/ieeetfpm).
from making such mistakes. As an example, consider
guiding principle GP4: “Events Should Be Related to
Model Elements”. It is a misconception that process 5 S PECIFIC C HALLENGES FOR S ERVICE
mining is limited to control-flow discovery, other M INING
perspectives such as the organizational perspective, The challenges mentioned in the manifesto apply to
the time perspective, and the data perspective are wide range of application domains (including services
equally important. However, the control-flow per- computing). However, for service mining, i.e., apply-
spective (i.e., the ordering of activities) serves as the ing process mining to services, we also discuss two
layer connecting the different perspectives. Therefore, more specific challenges.
it is important to relate events in the log to activi-
ties in the model. Conformance checking and model
enhancement heavily rely on this relationship (cf. 5.1 How to Correlate Instances?
sections 3.2 and 3.3). After relating events to model Figure 1 describes two interacting services. Although
elements, it is possible to “replay” the event log there may be many orders, the Petri net describes
on the model [10]. Replay may be used to reveal only one instance of the customer service and one
discrepancies between an event log and a model, e.g., instance of the supplier service. These two instances
some events in the log are not possible according to relate to the same order. Because there may be many
the model. Techniques for conformance checking can orders, instances of the customer service need to be
be used to quantify and diagnose such discrepancies. related to instances of the supplier service and vice
Timestamps in the event log can be used to analyze versa. Relating instances of different services is com-
the temporal behavior during replay. Time differences monly referred to as correlation. For example, when
between causally related activities can be used to the supplier service receives a payment (activity rp),
add average/expected waiting times to the model. this payment needs to be correlated to the order for
These examples illustrate the importance of guiding which the goods have been sent.
principle GP4; the relation between events in the log In general there is not a one-to-one correspondence
and elements in the model serves as a starting point between instances (i.e., cases) of different services.
for different types of analysis. This is illustrated in Fig. 5 which sketches instances
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 8

TABLE 1
Six Guiding Principles Listed in the Manifesto

Event Data Should Be Treated as First-Class Citizens


GP1 Events should be trustworthy, i.e., it should be safe to assume that the recorded events actually happened and
that the attributes of events are correct. Event logs should be complete, i.e., given a particular scope, no events
may be missing. Any recorded event should have well-defined semantics. Moreover, the event data should be
safe in the sense that privacy and security concerns are addressed when recording the event log.
Log Extraction Should Be Driven by Questions
GP2 Without concrete questions it is very difficult to extract meaningful event data. Consider, for example, the
thousands of tables in the database of an ERP system like SAP. Without questions one does not know where
to start.
Concurrency, Choice and Other Basic Control-Flow Constructs Should be Supported
GP3 Basic workflow patterns supported by all mainstream languages (e.g., BPMN, EPCs, Petri nets, BPEL, and
UML activity diagrams) are sequence, parallel routing (AND-splits/joins), choice (XOR-splits/joins), and loops.
Obviously, these patterns should be supported by process mining techniques.
Events Should Be Related to Model Elements
GP4 Conformance checking and enhancement heavily rely on the relationship between elements in the model and
events in the log. This relationship may be used to “replay” the event log on the model. Replay can be used to
reveal discrepancies between event log and model (e.g., some events in the log are not possible according to
the model) and can be used to enrich the model with additional information extracted from the event log (e.g.,
bottlenecks are identified by using the timestamps in the event log).
Models Should Be Treated as Purposeful Abstractions of Reality
GP5 A model derived from event data provides a view on reality. Such a view should serve as a purposeful abstraction
of the behavior captured in the event log. Given an event log, there may be multiple views that are useful.
Process Mining Should Be a Continuous Process
GP6 Given the dynamical nature of processes, it is not advisable to see process mining as a one-time activity. The
goal should not be to create a fixed model, but to breathe life into process models such that users and analysts
are encouraged to look at them on a daily basis.

TABLE 2
Some of the Most Important Process Mining Challenges Identified in the Manifesto

Finding, Merging, and Cleaning Event Data


C1 When extracting event data suitable for process mining several challenges need to be addressed: data may be
distributed over a variety of sources, event data may be incomplete, an event log may contain outliers, logs may
contain events at different level of granularity, etc.
Dealing with Complex Event Logs Having Diverse Characteristics
C2 Event logs may have very different characteristics. Some event logs may be extremely large making them
difficult to handle whereas other event logs are so small that not enough data is available to make reliable
conclusions.
Creating Representative Benchmarks
C3 Good benchmarks consisting of example data sets and representative quality criteria are needed to compare
and improve the various tools and algorithms.
Dealing with Concept Drift
C4 The process may be changing while being analyzed. Understanding such concept drifts is of prime importance
for the management of processes.
Improving the Representational Bias Used for Process Discovery
C5 A more careful and refined selection of the representational bias is needed to ensure high-quality process mining
results.
Balancing Between Quality Criteria such as Fitness, Simplicity, Precision, and Generalization
C6 There are four competing quality dimensions: (a) fitness, (b) simplicity, (c) precision, and (d) generalization.
The challenge is to find models that score good in all four dimensions.
Cross-Organizational Mining
C7 There are various use cases where event logs of multiple organizations are available for analysis. Some
organizations work together to handle process instances (e.g., supply chain partners) or organizations are
executing essentially the same process while sharing experiences, knowledge, or a common infrastructure.
However, traditional process mining techniques typically consider one event log in one organization.
Providing Operational Support
C8 Process mining is not restricted to off-line analysis and can also be used for online operational support. Three
operational support activities can be identified: detect, predict, and recommend.
Combining Process Mining With Other Types of Analysis
C9 The challenge is to combine automated process mining techniques with other analysis approaches (optimization
techniques, data mining, simulation, visual analytics, etc.) to extract more insights from event data.
Improving Usability for Non-Experts
C10 The challenge is to hide the sophisticated process mining algorithms behind user-friendly interfaces that
automatically set parameters and suggest suitable types of analysis.
Improving Understandability for Non-Experts
C11 The user may have problems understanding the output or is tempted to infer incorrect conclusions. To avoid
such problems, the results should be presented using a suitable representation and the trustworthiness of the
results should always be clearly indicated.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 9

customer order service order line service As discussed in [10], instance correlation and pro-
co co ol ol ol cess views are closely related. Process models should
45 61 66 68
44 be seen as views on reality. The scope of such a view
co ol ol
46 ol ol
depends on the correlations deemed to be relevant.
63 67
62 72 For example, one may want to view customer or-
ders in isolation (abstracting away order lines and
deliveries) or view these orders from the viewpoint
delivery service of deliveries. The empirical nature of process mining
del helps us to understand the “fabric of real business
del
88 processes” better whereas conventional process mod-
83 del eling languages are tailored towards straightjacketing
del
82 89 instances into monolithic process models.

Fig. 5. Three services having several instances each.


5.2 How to Analyze Services Out of Context?
For the second challenge we return to the problem
of three related services. The customer order service that the two interacting services shown in Fig. 1 can
has three instances: co44 , co45 , and co46 . Each of deadlock. Although each of the services in isolation
these instances corresponds to a customer order. The has no problem, the composed service deadlocks
customer order service uses an order line service when trace !po, ro, sp, sr " is executed. The reason is
which handles individual order lines. For example, that the customer may decide to pay whereas the
payments are handled at the order level, but order supplier may decide to reject the order. In fact, there
picking is done at the level of individual items. Cus- is no supplier service that can work properly with
tomer order co44 consists of three order lines (ol61 , the customer service modeled in Fig. 1. The choice
ol62 , and ol63 ), i.e., three items have been ordered. between rg and rr is controlled by the environment
The actual delivery of such items is subcontracted via places m2 and m3 . However, independent of this
to a delivery service. The delivery service tries to external decision, the customer service may decide
combine multiple items into a single delivery. How- to pay (sp). Because asynchronous message passing
ever, this is not always possible. For example, order is used, these two choices cannot be coordinated
co44 is related to two delivery instances: del88 and properly. This illustrates the subtle interplay between
del89 . It may also be the case that one delivery is a service and it environment.
related to order lines of different customer orders (see Let us now consider the two alternative services
for example del88 ). Hence, there is a many-to-many depicted in Fig. 6. The process resulting from these
relationship between customer order instances and combined services does not have any deadlocks or
delivery instances. other anomalies. Inspection of the state space consist-
Situations similar to the one described in Fig. 5 oc- ing of 16 states shows that it is always possible to
cur frequently in real-life processes. However, existing reach the desired final state with tokens in p8 and
process notations and process mining algorithms have p16 .
difficulties dealing with many-to-many relationships A possible event log generated by
between service instances. In fact, three related prob- the two services in Fig. 6 is L! =
lems can be identified. First of all, there is a represen- [!po, ro, sg, rg, sp, rp, cp, pc, cg, gc, fi , cl "40 ,
tation problem. Traditional process notations such as !po, ro, sg, rg, sp, rp, cp, pc, cg, gc, cl , fi "28 ,
BPMN, EPCs, and UML activity diagrams model the !po, ro, sg, rg, sp, rp, cp, pc, cg, cl , gc, fi "20 ,
life-cycle of one instance in isolation. Notable excep- !po, ro, sr , rr , cl "7 ]. This log describes 95 cases
tions are proclets [23] and other artifact-centric process distributed over four possible traces. Event log L! can
models [24]. Second, there is the problem of relating be projected onto the customer service (left-hand side
messages to instances. There are various approaches of Fig. 6) yielding log L!1 = [!po, rg, sp, pc, cg, cl "88 ,
that all suggest analyzing the content of each message. !po, rr , cl "7 ]. Similarly, L! can be projected onto
Subsequently, messages are related based on reoccur- the supplier service resulting in event log
ring pieces of text (e.g., an identifier or a customer L!2 = [!ro, sg, rp, cp, gc, fi "88 , !ro, sr "7 ].
name) [6], [25], [26]. Finally, there is a lack of process Although L!1 and L!2 do not reveal any problems,
mining techniques able to discover process models it should be noted that the two services are lim-
with many-to-many relationships. Some initial work iting each other’s behavior. For example, a possi-
has been done in the context of the ACSI project [27]. ble trace of the customer service that was not ob-
For example, it is possible to do conformance checking served is !po, rg, pc, sp, cg, cl ". This trace could not
on proclets [28], [29]. However, better techniques are be observed because the supplier service only con-
needed to truly support more complex correlation firms the payment after it has been received (which
patterns. makes sense). Possible traces of the supplier service
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 10

customer supplier
service p1 service
p9

place receive
po ro
order order
m1
p2
p10
receive send
send rejection
rejection p3 m2 sg goods
rr receive rg
goods sr
send
payment m3 p11
p4 sp
receive
rp
payment p5 payment
confirmed pc
m4 p12
confirm p13
p6 goods
cg confirm
cp
payment
m5 gc

p7 p14 goods
confirmed
close cl fi
p15
finish
m6
p16
p8

Fig. 6. Another process composed of two services that limit each other’s behavior.

that were not observed are !ro, sg, gc, rp, cp, fi " and sg and sr on the other hand is not controlled by the
!ro, sg, rp, gc, cp, fi ". The reason is that the customer environment and is made inside the supplier service.
service only confirms the receipt of the goods after it The challenge is to adapt discovery techniques and
has gotten a payment confirmation. These examples conformance checking techniques to incorporate such
show that the environment of a service may severely distinctions.
limit the behavior that can actually be observed. One
can only record the behavior of a service in a partic- 5.3 Related Work
ular context. This implies that certain anomalies do
not surface in one context and only emerge if the After describing the two challenges specific for service
environment of the service is changed. mining, we briefly point out related work on the
analysis of services based on event data.
The observation that service behavior is limited by
In [9] a concrete application of process mining to
its context has implications for both process discovery
web services is described. IBM’s WebSphere product
and conformance checking.
is used as a reference system and its CEI (Common
The discovered process model may only show a Event Infrastructure) logs are analyzed using ProM.
fraction of the potential service behavior. For example, An approach to check the conformance of web
based on event log L!2 , process discovery techniques services was described in [6]. The paper includes
will discover a sequential model for the supplier a description of various experiments using Oracle
service. The conclusion that the goods can only be BPEL. The token-based replay techniques presented
confirmed after all payment related activities have in [18] were used to measure conformance.
completed only holds in the context depicted in Fig. 6. In [7], [8] an LTL-based approach to check confor-
The same phenomenon makes it difficult to reason mance was proposed. This approach uses a graph-
about precision. Based on event log L!2 , analysis may ical declarative language to describe the normative
suggest that the model in Fig. 6 is “underfitting” be- behavior of services. Rather than modeling a detailed
cause only two of the four possible traces are actually process, this approach allows for checking graphically
observed. specified constraints such as “a payment should al-
Today’s process mining techniques do not take the ways be confirmed”.
direction of messages (send or receive) and the nature The topic of event correlation has been investigated
of choices into account. For example, consider the in the context of system specification, system devel-
choice between rr and rg. This choice is controlled by opment, and services analysis. In [15] and [30] various
inbound messages m2 and m3 . The choice between interaction and correlation patterns are described. In
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 11

[25] a technique is presented for correlating messages BPM systems. However, mainstream BPM approaches
with the goal to visualize the execution of web ser- are not using event data. Yet, activities executed by
vices. people, services, machines, and software components
Dustdar et al. [31], [32], [33] proposed techniques leave trails in so-called event logs. Process mining
for services interaction mining, i.e., applying process techniques use such logs to discover, analyze, and
mining techniques to the analysis of service interac- improve business processes.
tions. In this paper, we discussed process mining in the
Nezhad et al. [26], [34] developed techniques for context of services. We used the guiding principles
event correlation and process discovery from web and challenges described in the recently released Pro-
service interaction logs. The authors introduce the cess Mining Manifesto [13] to describe the state-of-the-
notion of a “process view” which is the result of a art in process mining. Given the abundance of event
particular event correlation. However, they argue that logs in the context of web services, we advocate the
correlation is subjective and that multiple views are use of such techniques for what we call service mining.
possible. A collection of process views is called the Service mining is concerned with (a) the discovery
“process space”. of service behavior, (b) checking conformance of ser-
In [35], Simmonds et al. propose a technique for vices, and (c) service model extension (e.g., showing
the run-time monitoring of web service conversations. bottlenecks based on event data). We also discussed
The authors monitor conversations between partners two challenges specific for service mining: (a) “How
at runtime as a means of checking behavioral correct- to Correlate Instances?” and (b) “How to Analyze
ness of the entire web service system. This is related Services Out of Context?”. These challenges show
to the earlier work on conformance checking [6], [7], that, despite the huge potential of service mining,
[8], [17], [18] mentioned before. additional research is needed to improve the appli-
In [36], a web service mining framework is pro- cability of process mining in distributed and loosely
posed that allows unexpected and interesting service coupled systems.
compositions to automatically emerge in a bottom-up
fashion. The authors propose some mining techniques ACKNOWLEDGMENTS
aiming at the discovery of such service compositions. The author would like to thank all that contributed
An event-driven approach to validate the transac- to the Process Mining Manifesto: Arya Adriansyah,
tional behavior of service compositions was proposed Ana Karla Alves de Medeiros, Franco Arcieri, Thomas
in [37]. Verification is done at design-time or run-time Baier, Tobias Blickle, Jagadeesh Chandra Bose, Peter
using the event calculus formalism. van den Brand, Ronald Brandtjen, Joos Buijs, An-
Within the ACSI project [27] the focus is on many- drea Burattin, Josep Carmona, Malu Castellanos, Jan
to-many relationships between instances. So-called Claes, Jonathan Cook, Nicola Costantini, Francisco
“proclets” [23] are used to model artifact centric Curbera, Ernesto Damiani, Massimiliano de Leoni,
models. A conformance checking approach for such Pavlos Delias, Boudewijn van Dongen, Marlon Du-
models is presented in [28], [29] and implemented in mas, Schahram Dustdar, Dirk Fahland, Diogo R.
ProM. Ferreira, Walid Gaaloul , Frank van Geffen, Sukriti
In [38] the topic of “cross-organizational mining” Goel, Christian Günther, Antonella Guzzo, Paul Har-
was introduced. Here the goal is not to analyze inter- mon, Arthur ter Hofstede, John Hoogland, Jon Es-
acting services but to compare services that are vari- pen Ingvaldsen, Koki Kato, Rudolf Kuhn, Akhil
ants of one another. Cross-organizational mining can Kumar, Marcello La Rosa, Fabrizio Maggi, Donato
be used for benchmarking and reference modeling. Malerba, Ronny Mans, Alberto Manuel, Martin Mc-
Creesh, Paola Mello, Jan Mendling, Marco Montali,
6 C ONCLUSION Hamid Motahari Nezhad, Michael zur Muehlen, Jorge
Munoz-Gama, Luigi Pontieri, Joel Ribeiro, Anne Rozi-
Process mining aims to bridge the gap between nat, Hugo Seguel Pérez, Ricardo Seguel Pérez, Marcos
Business Intelligence (BI) and Business Process Manage- Sepúlveda, Jim Sinur, Pnina Soffer, Minseok Song,
ment (BPM). BI techniques are typically not process- Alessandro Sperduti, Giovanni Stilo, Casper Stoel,
centric and provide rather simplistic reporting and Keith Swenson, Maurizio Talamo, Wei Tan, Chris
dashboard functionalities. Sometimes BI tools offer Turner, Jan Vanthienen, George Varvaressos, Eric Ver-
more advanced data mining capabilities. However, beek, Marc Verdonk, Roberto Vigo, Jianmin Wang,
classical data mining techniques [21] such as classi- Barbara Weber, Matthias Weidlich, Ton Weijters, Lijie
fication, clustering, regression, association rule learn- Wen, Michael Westergaard, and Moe Wynn.
ing, and sequence/episode mining are not process-
centric at all. BPM techniques on the other hand R EFERENCES
gravitate around process models [39]. These models
[1] G. Alonso, F. Casati, H. Kuno, and V. Machiraju, Web Ser-
are typically made by hand and are used for analysis vices Concepts, Architectures and Applications. Springer-Verlag,
(e.g., simulation and verification) and enactment by Berlin, 2004.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 12

[2] L. Zhang, J. Zhang, and H. Cai, Services Computing, Core Processes,” International Journal of Cooperative Information Sys-
Enabling Technology of the Modern Services Industry. Springer- tems, vol. 10, no. 4, pp. 443–482, 2001.
Verlag, Berlin, 2007. [24] K. Bhattacharya, C. Gerede, R. Hull, R. Liu, and J. Su, “To-
[3] W. van der Aalst, “Don’t go with the flow: Web services com- wards Formal Analysis of Artifact-Centric Business Process
position standards exposed,” IEEE Intelligent Systems, vol. 18, Models,” in International Conference on Business Process Man-
no. 1, pp. 72–76, 2003. agement (BPM 2007), ser. Lecture Notes in Computer Science,
[4] F. Casati, E. Shan, U. Dayal, and M. Shan, “Business-oriented G. Alonso, P. Dadam, and M. Rosemann, Eds., vol. 4714.
management of Web services,” Communications of the ACM, Springer-Verlag, Berlin, 2007, pp. 288–304.
vol. 46, no. 10, pp. 55–60, 2003. [25] W. Pauw, M. Lei, E. Pring, L. Villard, M. Arnold, and J. Morar,
[5] B. Benatallah, F. Casati, and F. Toumani, “Representing, “Web Services Navigator: Visualizing the Execution of Web
Analysing and Managing Web Service Protocols,” Data and Services,” IBM Systems Journal, vol. 44, no. 4, pp. 821–845, 2005.
Knowledge Engineering, vol. 58, no. 3, pp. 327–357, 2006. [26] H. Montahari-Nezhad, R. Saint-Paul, F. Casati, and B. Bena-
[6] W. van der Aalst, M. Dumas, C. Ouyang, A. Rozinat, and tallah, “Event Correlation for Process Discovery from Web
H. Verbeek, “Conformance Checking of Service Behavior,” Service Interaction Logs,” VLBD Journal, vol. 20, no. 3, pp.
ACM Transactions on Internet Technology, vol. 8, no. 3, pp. 29–59, 417–444, 2011.
2008. [27] ACSI, “Artifact-Centric Service Interoperation (ACSI) Project
[7] W. van der Aalst and M. Pesic, “Chapter 2: Specifying and Home Page,” www.acsi-project.eu.
Monitoring Service Flows: Making Web Services Process- [28] D. Fahland, M. de Leoni, B. van Dongen, and W. van der Aalst,
Aware,” in Test and Analysis of Web Services, L. Baresi and “Conformance Checking of Interacting Processes with Over-
E. Nitto, Eds. Springer-Verlag, Berlin, 2007, pp. 11–56. lapping Instances,” in Business Process Management (BPM
[8] ——, “DecSerFlow: Towards a Truly Declarative Service Flow 2011), ser. Lecture Notes in Computer Science, S. Rinderle,
Language,” in International Conference on Web Services and F. Toumani, and K. Wolf, Eds., vol. 6896. Springer-Verlag,
Formal Methods (WS-FM 2006), ser. Lecture Notes in Computer Berlin, 2011, pp. 345–361.
Science, M. Bravetti, M. Nunez, and G. Zavattaro, Eds., vol. [29] D. Fahland, M. Leoni, B. van Dongen, and W. van der Aalst,
4184. Springer-Verlag, Berlin, 2006, pp. 1–23. “Behavioral Conformance of Artifact-Centric Process Models,”
[9] W. van der Aalst and H. Verbeek, “Process Mining in Web in Business Information Systems (BIS 2011), ser. Lecture Notes in
Services: The WebSphere Case,” IEEE Bulletin of the Technical Business Information Processing, A. Abramowicz, Ed., vol. 87.
Committee on Data Engineering, vol. 31, no. 3, pp. 45–48, 2008. Springer-Verlag, Berlin, 2011, pp. 37–49.
[10] W. van der Aalst, Process Mining: Discovery, Conformance and [30] A. Barros, G. Decker, M. Dumas, and F. Weber, “Correlation
Enhancement of Business Processes. Springer-Verlag, Berlin, Patterns in Service-Oriented Architectures,” in Proceedings of
2011. the 10th International Conference on Fundamental Approaches
[11] W. van der Aalst, B. van Dongen, J. Herbst, L. Maruster, to Software Engineering (FASE 2007), ser. Lecture Notes in
G. Schimm, and A. Weijters, “Workflow Mining: A Survey Computer Science, M. Dwyer and A. Lopes, Eds., vol. 4422.
of Issues and Approaches,” Data and Knowledge Engineering, Springer-Verlag, Berlin, 2007, pp. 245–259.
vol. 47, no. 2, pp. 237–267, 2003. [31] S. Dustdar, R. Gombotz, and K. Baina, “Web Services Interac-
[12] W. van der Aalst, H. Reijers, A. Weijters, B. van Dongen, tion Mining,” Technical Report TUV-1841-2004-16, Information
A. Medeiros, M. Song, and H. Verbeek, “Business Process Min- Systems Institute, Vienna University of Technology, Wien,
ing: An Industrial Application,” Information Systems, vol. 32, Austria, 2004.
no. 5, pp. 713–732, 2007. [32] S. Dustdar and R. Gombotz, “Discovering Web Service Work-
[13] IEEE Task Force on Process Mining, “Process Mining Man- flows Using Web Services Interaction Mining,” International
ifesto,” in BPM Workshops, ser. Lecture Notes in Business Journal of Business Process Integration and Management, vol. 1,
Information Processing, vol. 99. Springer-Verlag, Berlin, 2011. no. 4, pp. 256–266, 2006.
[14] H. Chesbrough and J. Spohrer, “A Research Manifesto for [33] R. Gombotz and S. Dustdar, “On Web Services Mining,” in
Services Science,” Communications of the ACM, vol. 49, no. 7, BPM 2005 Workshops (Workshop on Business Process Intelligence),
pp. 35–40, 2006. ser. Lecture Notes in Computer Science, C. Bussler et al., Ed.,
[15] W. van der Aalst, A. Mooij, C. Stahl, and K. Wolf, “Service vol. 3812. Springer-Verlag, Berlin, 2005, pp. 216–228.
Interaction: Patterns, Formalization, and Analysis,” in Formal [34] H. Nezhad, R. Saint-Paul, B. Benatallah, and F. Casati, “De-
Methods for Web Services, ser. Lecture Notes in Computer riving Protocol Models from Imperfect Service Conversation
Science, M. Bernardo, L. Padovani, and G. Zavattaro, Eds., Logs,” IEEE Transaction on Knowledge and Data Engineering,
vol. 5569. Springer-Verlag, Berlin, 2009, pp. 42–88. vol. 20, no. 12, pp. 1683–1698, 2008.
[16] W. van der Aalst, A. Weijters, and L. Maruster, “Workflow [35] J. Simmonds, Y. Gan, M. Chechik, S. Nejati, B. Farrell, E. Litani,
Mining: Discovering Process Models from Event Logs,” IEEE and J. Waterhouse, “Runtime Monitoring of Web Service Con-
Transactions on Knowledge and Data Engineering, vol. 16, no. 9, versations,” IEEE Transactions on Services Computing, vol. 2,
pp. 1128–1142, 2004. no. 3, pp. 223–244, 2009.
[17] A. Adriansyah, B. van Dongen, and W. van der Aalst, “Confor- [36] G. Zheng and A. Bouguettaya, “Service Mining on the Web,”
mance Checking using Cost-Based Fitness Analysis,” in IEEE IEEE Transactions on Services Computing, vol. 2, no. 1, pp. 65–78,
International Enterprise Computing Conference (EDOC 2011). 2009.
IEEE Computer Society, 2011. [37] W. Gaaloul, S. Bhiri, and M.Rouached, “Event-Based Design
[18] A. Rozinat and W. van der Aalst, “Conformance Checking and Runtime Verification of Composite Service Transactional
of Processes Based on Monitoring Real Behavior,” Information Behavior,” IEEE Transactions on Services Computing, vol. 3, no. 1,
Systems, vol. 33, no. 1, pp. 64–95, 2008. pp. 32–45, 2010.
[19] W. van der Aalst, M. Schonenberg, and M. Song, “Time Pre- [38] W. van der Aalst, “Configurable Services in the Cloud: Sup-
diction Based on Process Mining,” Information Systems, vol. 36, porting Variability While Enabling Cross-Organizational Pro-
no. 2, pp. 450–475, 2011. cess Mining,” in OTM Federated Conferences, 18th International
[20] M. Song and W. van der Aalst, “Towards Comprehensive Conference on Cooperative Information Systems (CoopIS 2010), ser.
Support for Organizational Mining,” Decision Support Systems, Lecture Notes in Computer Science, R.Meersman, T. Dillon,
vol. 46, no. 1, pp. 300–317, 2008. and P. Herrero, Eds., vol. 6426. Springer-Verlag, Berlin, 2010,
[21] D. Hand, H. Mannila, and P. Smyth, Principles of Data Mining. pp. 8–25.
MIT press, Cambridge, MA, 2001. [39] M. Weske, Business Process Management: Concepts, Languages,
[22] R. Bose, W. van der Aalst, I. Zliobaite, and M. Pechenizkiy, Architectures. Springer-Verlag, Berlin, 2007.
“Handling Concept Drift in Process Mining,” in International
Conference on Advanced Information Systems Engineering (Caise
2011), ser. Lecture Notes in Computer Science, H. Mouratidis
and C. Rolland, Eds., vol. 6741. Springer-Verlag, Berlin, 2011,
pp. 391–405.
[23] W. van der Aalst, P. Barthelmess, C. Ellis, and J. Wainer,
“Proclets: A Framework for Lightweight Interacting Workflow
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON SERVICES COMPUTING, NOVEMBER 2011 13

Wil van der Aalst Prof.dr.ir. Wil van der


Aalst is a full professor of Information Sys-
tems at the Technische Universiteit Eind-
hoven (TU/e). Currently he is also an ad-
junct professor at Queensland University of
Technology (QUT) working within the BPM
group there. His research interests include
workflow management, process mining, Petri
nets, business process management, pro-
cess modeling, and process analysis. He is
also editor/member of the editorial board of
several journals, including the Distributed and Parallel Databases,
the International Journal of Business Process Integration and Man-
agement, the International Journal on Enterprise Modelling and
Information Systems Architectures, Computers in Industry, Business
& Information Systems Engineering, IEEE Transactions on Services
Computing, Lecture Notes in Business Information Processing, and
Transactions on Petri Nets and Other Models of Concurrency. He
is also a member of the Royal Holland Society of Sciences and
Humanities (Koninklijke Hollandsche Maatschappij der Wetenschap-
pen) and the Academy of Europe (Academia Europaea).

You might also like