
Process Mining and Petri Net Synthesis

Ekkart Kindler, Vladimir Rubin, and Wilhelm Schäfer

Software Engineering Group, University of Paderborn,
Warburger Str. 100, D-33098 Paderborn, Germany
{kindler, vroubine, wilhelm}
Abstract. The theory of regions and the algorithms for synthesizing a Petri net model from a transition system, which are based on this theory, have interesting practical applications, in particular in the design of electronic circuits. In this paper, we show that this theory can also be applied for mining the underlying process from the user interactions with a document management system. To this end, we combine an algorithm that we call activity mining with such Petri net synthesis algorithms. We present the basic idea of this approach, show some first results, and compare them with classical process mining techniques. The main benefit is that, in combination, the activity mining algorithm and the synthesis algorithms do not need a log of the activities, which is not available when the processes are supported by a document management system only.


Introduction

Today, there are many techniques that help to automatically come up with process models from a sequence of activities that are executed in an enterprise [1]. Typically, such sequences come from the log of a workflow management system or of some standard software that is used for executing these processes. There are many different algorithms and methods that help to obtain faithful process models; some techniques come up with an initial model quite fast, and the process models are then incrementally improved by new observations [2]. All these techniques can be summarized by the term process mining.
Our interest in process mining came from the area of software engineering. Software engineering processes are often not well documented, though good engineers have the processes in their minds. In the Capability Maturity Model (CMM), this level of maturity of a software company is called repeatable [3]. Therefore, we looked for methods for automatically mining these process models from the observed work. The main sources for observing the work of software engineers are the logs of the version management systems and document management systems that are used in the development process. The problem, however, is that these systems are aware of documents only and not of the underlying activities. Basically, they see the creation, modification, and check-in of documents, but they are not aware of the activities or of the activity to which these events belong. Therefore, the standard mining algorithms do not work; we must first identify the activities from the event logs of the document management systems: we call this activity mining. By activity mining, we get more information on the process than just a sequence of activities. In order to exploit this information, we developed an algorithm for obtaining the process models [4].

J. Eder, S. Dustdar et al. (Eds.): BPM 2006 Workshops, LNCS 4103, pp. 105-116, 2006.
© Springer-Verlag Berlin Heidelberg 2006
A closer look at the results of the activity mining algorithm revealed that we could easily obtain a transition system for the underlying processes, where the transitions are the activities of the processes. So, basically, deriving a process model from the result of the activity mining algorithm means deriving a Petri net from a transition system, which is a well-known area of Petri net theory called Petri net synthesis. It was established by the seminal paper of Ehrenfeucht and Rozenberg [5] on regions and later extended and elaborated by other authors [6,7,8]. In this paper, we show that our activity mining algorithm, in combination with the tool Petrify [9], can be used for faithfully mining process models from logs of document management systems and version management systems. The focus of this paper is on the use of synthesis algorithms; for details on the activity mining algorithm, we refer to [4].

Related Work

There is much research in the area of process mining [1]. Researchers from different domains, such as software process engineering, software configuration management, workflow management, and data mining, are interested in deriving behavioural models from the audit trails of standard software.
The first application of process mining to the workflow domain was presented by Agrawal et al. in 1998 [10]. The approach of Herbst and Karagiannis [11] uses machine learning techniques for the acquisition and adaptation of workflow models. The seminal work in the area of process mining was presented by van der Aalst et al. [12,13]. In this work, the causality relations between activities in logs are introduced and the α-mining algorithm for discovering workflow models is defined. Research in the area of software process mining started in the mid-1990s with new approaches to the grammar inference problem proposed by Cook and Wolf [14]. Other work from the software domain is in the area of mining software repositories [15]. Our approach [4] aims at combining software process mining with mining from software repositories; it derives a software process from the logs of software configuration management systems.
Another research area, which is discussed in this paper, is the area of Petri
net synthesis and the theory of regions. The seminal paper in this area was
written by Ehrenfeucht and Rozenberg [5]. It answered a long-standing open question
in Petri net theory: how to obtain a Petri net model from a transition system.
Further research in this area came up with synthesis algorithms for elementary
net systems [7] and even proved some promising complexity results for bounded
place/transition systems [6].
First ideas of combining process mining and process synthesis were already mentioned in the process mining domain [13,16]. In this paper, we take the next step: we present an algorithm that enables us to use the Petri net synthesis tool Petrify [9] for process mining.


Fig. 1. Mining and Synthesis Schema

Mining and Synthesis

In this section, we present the overall approach; it combines our mining algorithms with Petri net synthesis algorithms in order to discover process models from versioning logs of document management systems.
The overall scheme of this approach is presented in Fig. 1. It starts with a versioning log as input; by means of our activity mining algorithm, we derive a set of activities from the log. From the set of activities, we generate a transition system. From the transition system, we derive a Petri net with the help of the synthesis algorithm. In this paper, we only briefly discuss our activity mining algorithm; the focus is on the transition system generation, the use of the synthesis algorithm, and the process models that can be obtained by it.

Transition System Generation from Versioning Logs

In this section, we deal with the versioning logs and present the transition system
generation algorithm.
Initial Input and Activity Mining. Here, we briefly discuss our activity mining algorithm and the structure of the input it needs. This input consists of versioning logs of different document management systems, such as Software Configuration Management (SCM) systems, Product Data Management (PDM) systems, and other configuration and version management systems.
An example of a versioning log is shown in Table 1. The log contains data on the documents and the timestamps of their commits to the system, along with data on users and log comments. The versioning log consists of execution logs (in our example, they are separated by double lines), the structure of which can be derived using additional information not discussed in this paper. These execution logs contain information about the instances of the process. Our small example was inspired by the software change process [17]; for this process, there are different executions, in which different documents are committed in different orders, starting with the design and finishing with the review. We group execution logs into clusters. A cluster is a set of execution logs that all contain the same set of documents. For example, the first two execution logs make up a cluster, because they both contain the design, code, testPlan, and review documents; the third execution log forms another cluster.
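The clustering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (which was written in SWI-Prolog): an execution log is assumed to be an ordered list of document names, and the three logs below are hypothetical data shaped like the running example rather than the actual contents of Table 1. The function name cluster_execution_logs is likewise illustrative.

```python
# Hypothetical execution logs (illustrative data, not the contents of Table 1):
# each log lists the documents of one process instance in commit order.
EXECUTION_LOGS = [
    ["design", "code", "testPlan", "review"],
    ["design", "testPlan", "code", "review"],
    ["design", "verificationResults", "code", "review"],
]


def cluster_execution_logs(logs):
    """Group execution logs that contain exactly the same set of documents.

    Returns a dict mapping cluster ids 1..n (numbered in order of first
    appearance) to the indices of the execution logs in that cluster.
    """
    by_documents = {}
    for index, log in enumerate(logs):
        # The commit order is irrelevant for clustering; only the set counts.
        by_documents.setdefault(frozenset(log), []).append(index)
    return {cid: members for cid, members in enumerate(by_documents.values(), start=1)}


print(cluster_execution_logs(EXECUTION_LOGS))  # {1: [0, 1], 2: [2]}
```

Using the frozen document set as a dictionary key makes two logs fall into the same cluster exactly when they contain the same documents, regardless of the order of their commits.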
Table 1. Versioning Log

status: initial
status: generated
status: initial
status: pending
status: initial
status: initial
status: generated
status: pending
status: initial
status: initial
status: generated
status: pending

From the information about the execution logs and their clusters, the documents, and the order of their commits to the system, we derive a set of activities with the help of the activity mining algorithm (for details, see [4]). The resulting set is shown in Table 2. Since we only have information about the documents, we adopt a document-oriented view on the activities: they are defined by their input and output documents¹. The output documents are derived from the logs straightforwardly; the challenge of activity mining is deriving the inputs, because this information is not represented explicitly. The input of an activity contains all the documents that precede its output document in all the execution logs. For each activity, Table 2 also shows the clusters from which it was derived: cluster 1 consists of the first two execution logs, and cluster 2 consists of the third one. For example, activity 1 has s0 as input and design as output, and it can be derived from cluster 1 or cluster 2.
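The rule for deriving inputs can be illustrated with a simplified Python sketch. It assumes that each cluster is given as a list of execution logs (ordered document lists, hypothetical data), and it only computes activities for single clusters. The actual activity mining algorithm [4] also derives activities shared by several clusters, such as activity 1 with cl = {1, 2}, which this sketch does not attempt. Following the text, s0 is added to every input. The function name mine_inputs is illustrative.

```python
# Hypothetical clusters of execution logs (illustrative data only).
CLUSTERS = {
    1: [["design", "code", "testPlan", "review"],
        ["design", "testPlan", "code", "review"]],
    2: [["design", "verificationResults", "code", "review"]],
}


def mine_inputs(logs):
    """Map each document of a cluster to its input set.

    The input of the activity producing `doc` is the set of documents that
    precede `doc` in *every* execution log of the cluster, plus the
    artificial start document s0.
    """
    documents = set(logs[0])  # all logs of a cluster share one document set
    inputs = {}
    for doc in documents:
        preceding = [set(log[:log.index(doc)]) for log in logs]
        inputs[doc] = {"s0"} | set.intersection(*preceding)
    return inputs


for cid, logs in CLUSTERS.items():
    for doc, inp in sorted(mine_inputs(logs).items()):
        print(f"cluster {cid}: {sorted(inp)} -> {doc}")
```

In cluster 1, code and testPlan both get the input {s0, design}, because neither of them consistently precedes the other. This is exactly the situation that later shows up as concurrency.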
In general, let us assume that there are n clusters and that each cluster is given a unique identifier from the set $C = \{1, \ldots, n\}$. For every subset $cl \subseteq C$, there is a set $D_{cl}$, which contains the intersection of the sets of documents that belong to each execution log of $cl$: $D_{cl} = \bigcap_{e \in cl} D_e$. So, each activity is a tuple $(I, O, cl)$, where $cl$ is the set of clusters from which this activity was derived, and I and O are the sets of input and output documents, respectively. In a formal notation, a set of activities is defined as follows:

$$A \subseteq \{(I, O, cl) \mid I \subseteq D_{cl},\ O \subseteq D_{cl},\ cl \subseteq C\} \qquad (1)$$


For each tuple, we define a dot notation, which gives the value of a field by its name. E.g., for activity $a_1 = (\{s0\}, \{design\}, \{1, 2\})$, we have $a_1.I = \{s0\}$, $a_1.O = \{design\}$, and $a_1.cl = \{1, 2\}$.
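In a programming language, the tuple (I, O, cl) and its dot notation map naturally onto a record type. The following Python sketch uses a NamedTuple for this purpose and also computes $D_{cl}$ as an intersection of document sets. The per-cluster document sets in CLUSTER_DOCS, as well as the helper names d_cl and is_activity, are assumptions made for illustration. Since all execution logs of one cluster share the same documents, intersecting over clusters gives the same result as intersecting over their execution logs.

```python
from typing import Dict, FrozenSet, NamedTuple


class Activity(NamedTuple):
    """An activity (I, O, cl): input documents, output documents, clusters."""
    I: FrozenSet[str]
    O: FrozenSet[str]
    cl: FrozenSet[int]


# Assumed document sets per cluster (s0 is part of every cluster).
CLUSTER_DOCS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"s0", "design", "code", "testPlan", "review"}),
    2: frozenset({"s0", "design", "code", "verificationResults", "review"}),
}


def d_cl(cl, cluster_docs=CLUSTER_DOCS):
    """D_cl: the documents common to all clusters in cl."""
    return frozenset.intersection(*(cluster_docs[c] for c in cl))


def is_activity(a: Activity) -> bool:
    """Condition from (1): both I and O must be subsets of D_cl."""
    docs = d_cl(a.cl)
    return a.I <= docs and a.O <= docs


a1 = Activity(I=frozenset({"s0"}), O=frozenset({"design"}), cl=frozenset({1, 2}))
print(sorted(a1.I), sorted(a1.O), sorted(a1.cl))  # ['s0'] ['design'] [1, 2]
print(sorted(d_cl(a1.cl)), is_activity(a1))
```

An activity producing testPlan cannot carry cl = {1, 2}, because testPlan is not in $D_{\{1,2\}}$; is_activity rejects such a tuple.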

For technical reasons, we add a document s0 to the input of every activity except activity 0, which produces s0, and we also add two activities that produce e0; this makes the process start and the process end explicit.


Table 2. Set of Activities



design, verificationResults
design, code, testPlan
design, code, verificationResults
design, code, testPlan, review
design, code, verificationResults, review e0

1, 2
1, 2

Transition System Generation. Different clusters, described above, correspond to different sets of documents and represent alternative behaviour, whereas within one cluster we can derive concurrent behaviour. For example, activities 4 and 5 in Table 2 belong to different clusters: their output documents testPlan and verificationResults belong to the document sets of different clusters. Thus, after creating the design, there is an alternative: either to produce a testPlan or to obtain verificationResults. But activities 2 and 4 belong to the same cluster; thus, after the design, it is possible both to produce code and then a testPlan, or first a testPlan and then code, i.e. they are concurrent.
The main goal of the transition system generation algorithm is to generate, from a set of activities, a labelled transition system that models the alternatives and the concurrency. The transition system consists of states, events, and a transition relation between states, whose transitions are labelled with events. In our context, a state is a set of activities, which represents the history of the process, i.e. it contains the activities that were executed. All the activities of a state must occur in the same cluster. For example, the system is in a state s1 = {0, 1, 2} when activities 0, 1, and 2 have been executed and, thus, the documents s0, design, and code have been produced. An event is a document produced by an activity enabled in a state. An activity is enabled in a state if it does not belong to the state but belongs to the same cluster as the state's activities, and if the set of documents produced by the state's activities includes the input set of the activity. For example, activity 4 is enabled in state s1, because it does not belong to the state but belongs to the same cluster as activities 0, 1, and 2, and it needs the documents s0 and design as input, which are a subset of the documents produced in s1. So, when activity 4 is executed in state s1, it produces the document testPlan and the system goes to a new state s2 = {0, 1, 2, 4}. Thus, there is a transition between states s1 and s2, and it is labelled with testPlan. The resulting transition system is shown in Fig. 2; for better readability, the state names do not list the activities but the names of the produced documents, e.g. s1 is called s_s0_design and s2 is called s_design_s0_testPlan.
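The enabling rule and the resulting step from s1 to s2 can be replayed with a small Python sketch. The activity table below is an assumption. It contains only the activities mentioned in the text (0, 1, 2, 4, and 5), with inputs and clusters chosen to be consistent with the prose, and it is not a copy of Table 2. Because each activity in the example produces a single document, the output is stored as a string. The helper names produced and is_enabled are illustrative.

```python
# Assumed activities (illustrative): id -> (inputs, output document, clusters).
ACTIVITIES = {
    0: (frozenset(), "s0", frozenset({1, 2})),
    1: (frozenset({"s0"}), "design", frozenset({1, 2})),
    2: (frozenset({"s0", "design"}), "code", frozenset({1})),
    4: (frozenset({"s0", "design"}), "testPlan", frozenset({1})),
    5: (frozenset({"s0", "design"}), "verificationResults", frozenset({2})),
}


def produced(state):
    """Documents produced by the activities of a state."""
    return {ACTIVITIES[b][1] for b in state}


def is_enabled(a, state):
    """a is enabled if it is not yet in the state, has a cluster in common
    with all activities of the state, and its inputs have been produced."""
    inputs, _, clusters = ACTIVITIES[a]
    common = clusters.intersection(*(ACTIVITIES[b][2] for b in state))
    return a not in state and bool(common) and inputs <= produced(state)


s1 = frozenset({0, 1, 2})
print(is_enabled(4, s1), is_enabled(5, s1))  # True False
s2 = s1 | {4}  # executing activity 4 yields a transition labelled testPlan
print(sorted(s2), ACTIVITIES[4][1])  # [0, 1, 2, 4] testPlan
```

Activity 5 is not enabled in s1: its only cluster, 2, is not shared by activity 2, so producing verificationResults after code would mix the two alternatives.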


Fig. 2. Generated Transition System

Formally, a transition system is a tuple $TS = (S, E, T, s_0)$, where S is a set of states, E is a set of events, $T \subseteq S \times E \times S$ is the transition relation, and $s_0 \in S$ is the initial state. In our case, a state $s \in S$ is a set of activities, i.e. $s \subseteq A$, where A is defined in (1). The initial state $s_0 = \{(\{\}, \{s0\}, C)\}$ contains the activity that produces s0 and belongs to all clusters. There is a transition $s \xrightarrow{e} s'$ between two states if there is an activity $a \in A$ such that $a \notin s$, $s' = s \cup \{a\}$, $a.O = \{e\}$, and $a.cl \cap \bigcap_{b \in s} b.cl \neq \emptyset$, i.e. it belongs to the same cluster as the activities in s, and $a.I \subseteq \bigcup_{b \in s} b.O$, i.e. it is enabled in s.
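The formal definition suggests a straightforward breadth-first construction of the reachable part of the transition system, starting from the initial state. The Python sketch below is one possible rendering, not the authors' SWI-Prolog clauses. The complete activity set is a hypothetical reconstruction in the spirit of Table 2 (ten activities, two clusters) and only serves as test data. Each activity produces a single document, so $a.O = \{e\}$ is represented by storing e directly. The function name generate_ts is illustrative.

```python
from collections import deque

# Hypothetical activity set: id -> (inputs, output document, clusters).
ACTIVITIES = {
    0: (set(), "s0", {1, 2}),
    1: ({"s0"}, "design", {1, 2}),
    2: ({"s0", "design"}, "code", {1}),
    3: ({"s0", "design", "verificationResults"}, "code", {2}),
    4: ({"s0", "design"}, "testPlan", {1}),
    5: ({"s0", "design"}, "verificationResults", {2}),
    6: ({"s0", "design", "code", "testPlan"}, "review", {1}),
    7: ({"s0", "design", "code", "verificationResults"}, "review", {2}),
    8: ({"s0", "design", "code", "testPlan", "review"}, "e0", {1}),
    9: ({"s0", "design", "code", "verificationResults", "review"}, "e0", {2}),
}


def generate_ts(activities, initial=frozenset({0})):
    """Return (S, E, T, s_0), where s --e--> s | {a} iff a is not in s,
    a and all activities of s have at least one cluster in common,
    a's inputs are covered by the outputs of s, and a produces e."""
    states, transitions = {initial}, set()
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        outputs = {activities[b][1] for b in s}
        common = set.intersection(*(activities[b][2] for b in s))
        for a, (inputs, output, clusters) in activities.items():
            if a in s or not clusters & common or not inputs <= outputs:
                continue
            target = s | {a}
            transitions.add((s, output, target))
            if target not in states:
                states.add(target)
                queue.append(target)
    events = {e for _, e, _ in transitions}
    return states, events, transitions, initial


S, E, T, s_init = generate_ts(ACTIVITIES)
print(len(S), len(T), sorted(E))
```

With this data, the construction yields a diamond for cluster 1 (code and testPlan in either order) and a single chain for cluster 2, which is the shape of concurrency and alternatives described above.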
We implemented these formal definitions as a set of clauses in SWI-Prolog [18]. As output, our algorithm generates a file containing the transition system. This file is accepted by a synthesis tool (see Sect. 3.2) and can be automatically visualized as shown in Fig. 2.

Petri Net Synthesis

In this section, we describe the last step of our mining and synthesis approach: the synthesis of a Petri net from a mined transition system. We use the tool Petrify [9] for this purpose.
Given a finite transition system, Petrify synthesizes a Petri net whose reachability graph is bisimilar to the transition system. The synthesis algorithm is based on the theory of regions and was described by Cortadella et al. [19]. Petrify uses labelled Petri nets and, thus, supports synthesis from arbitrary transition systems. It supports different methods for minimizing the Petri nets and for improving the efficiency of the synthesis algorithm. Here, we do not go into the details of the synthesis algorithm, but give its essential idea and motivate its relevance for the process mining area.

Fig. 3. Synthesized Petri Net

A region is a set of states such that all transitions with the same label have the same relation to it: either they all enter the set, or they all exit it, or they all do not cross it. For example, in the transition system in Fig. 2, the set of states

{s_code_design_s0_testPlan_review, s_code_design_s0_verificationResults_review}

is a region, because all transitions labelled review enter this set and all transitions labelled e0 exit it. Petrify discovers a complete set of minimal regions for the given transition system and then removes the redundant ones. Each region corresponds to a place in the synthesized Petri net; so, by removing redundant regions, Petrify tries to minimize the number of places and to make the Petri net understandable. The synthesized Petri net for our example is shown in Fig. 3. The place between the Petri net transitions review and e0 corresponds to the set of states shown above.
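The region condition itself is easy to check mechanically. The Python sketch below assumes a transition system given as a set of (source, label, target) triples. The state names are hypothetical abbreviations of the final part of Fig. 2 (for instance, tp_rev stands for the state reached after testPlan and review), and the function name is_region is illustrative. Computing the minimal regions and removing the redundant ones, as Petrify does, is not sketched here.

```python
def is_region(region, transitions):
    """Check whether a set of states is a region: all transitions that carry
    the same label must uniformly enter, uniformly exit, or uniformly not
    cross the set."""
    relation = {}
    for src, label, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "exit"
        else:
            kind = "no-cross"  # both endpoints inside, or both outside
        relation.setdefault(label, set()).add(kind)
    return all(len(kinds) == 1 for kinds in relation.values())


# Hypothetical fragment of the transition system of Fig. 2.
T = {
    ("design_code", "testPlan", "tp"),
    ("design_vr", "code", "vr"),
    ("tp", "review", "tp_rev"),
    ("vr", "review", "vr_rev"),
    ("tp_rev", "e0", "tp_end"),
    ("vr_rev", "e0", "vr_end"),
}

print(is_region({"tp_rev", "vr_rev"}, T))  # True: review enters, e0 exits
print(is_region({"tp_rev"}, T))            # False: only one review transition enters
```

The second set fails because the label review would both enter it and not cross it, so no single place could represent it.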
In the transition system, different transitions can carry the same event. Each event of the transition system corresponds to one Petri net transition; for example, for the event review there is a Petri net transition with the same name. There is an arc between a transition and a place in the Petri net if the corresponding transitions of the transition system enter or exit the corresponding region: the arc leads from the transition to the place if they enter the region, and from the place to the transition if they exit it.
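Once a region has been chosen as a place, its arcs follow directly from this rule. The Python sketch below assumes that the given set of states is already a region. The state names, the place name p_review_e0, and the function name place_arcs are hypothetical. Petrify's label splitting is not covered.

```python
def place_arcs(place, region, transitions):
    """Arcs of the place corresponding to `region`: an arc label -> place for
    each label whose transitions enter the region, and place -> label for
    each label whose transitions exit it."""
    arcs = set()
    for src, label, dst in transitions:
        if src not in region and dst in region:
            arcs.add((label, place))
        elif src in region and dst not in region:
            arcs.add((place, label))
    return arcs


# Hypothetical fragment of the transition system of Fig. 2.
T = {
    ("tp", "review", "tp_rev"),
    ("vr", "review", "vr_rev"),
    ("tp_rev", "e0", "tp_end"),
    ("vr_rev", "e0", "vr_end"),
}

print(sorted(place_arcs("p_review_e0", {"tp_rev", "vr_rev"}, T)))
# [('p_review_e0', 'e0'), ('review', 'p_review_e0')]
```

Both review transitions collapse into a single arc because arcs are collected per label, which mirrors the correspondence between one event and one Petri net transition.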
In the context of process mining, the generated Petri net represents the control aspect of the process and models the concurrency and alternatives that were initially hidden in the logs. The transitions represent the activities. Since we have a document-oriented view on the activities, the execution of every activity results in committing a document to the document management system. For now, activities are named after the committed documents; for example, activity code results in committing the document code to the system.
Since Petrify supports label splitting, it allows us to synthesize Petri nets under different optimization criteria and belonging to different classes, such as pure or free-choice nets. In practice, for big projects with complex Petri nets, we can generate pure or free-choice versions, which may be easier for managers and process engineers to understand and can, therefore, serve communication purposes in the company. For example, for the Petri net shown in Fig. 3, we can generate a pure equivalent, see Fig. 4.

Fig. 4. Synthesized Pure Petri Net

Other Applications: Activity Logs

Along with applying our algorithms to process mining from versioning logs, we have also dealt with activity logs, which are the standard input for most classical mining approaches [13,14]. These logs are usually obtained from workflow management systems or from some standard software that is used for executing the processes in the company. For activity logs, we have deliberately chosen an example that is very similar to the one given for versioning logs in the previous part of this section; this illustrates the generality of the mining and synthesis approach and improves the readability of the paper. However, the algorithms for dealing with versioning logs and for dealing with activity logs are entirely different, and neither can replace the other.

Table 3. Activity Log
Execution 1 Execution 2 Execution 3
An example of an activity log (often called an event log in the literature) is shown in Table 3. It consists of process executions, which represent process instances (cases); in our example, there are three instances of the process. Every instance contains a set of activities and the order of their execution. For example, in the first instance, the activities are executed in the following order: doDesign, writeCode, planTest, and then doReview. We add the activity s0 to the beginning and the activity e0 to the end of every execution to make the process start and the process end explicit.
From the activity log, we can generate a transition system without any preprocessing steps. In this case, a state is again a set of activities, and an event is an activity enabled in a state. An activity is enabled in a state if there is a process execution in which the activities executed before it are exactly the activities of the state. For example, the system is in the state s1 = {s0, doDesign} when the activities s0 and doDesign have been executed. Since, in Execution 1, the activity writeCode is executed right after the activities of state s1, the event writeCode can occur in this state. When this activity is executed, the system reaches the state s2 = {s0, doDesign, writeCode}; so, there is a transition between the states s1 and s2. The resulting transition system is shown in Fig. 5. The Petrify synthesis algorithm generates a Petri net from it, see Fig. 6.

Fig. 5. Generated Transition System





Fig. 6. Synthesized Petri Net


In a formal notation, there is a transition $s \xrightarrow{a} s'$ between two states, where $s = \{a_1, \ldots, a_{i-1}\}$, $a = a_i$, $s' = \{a_1, \ldots, a_i\}$, and $a_1, \ldots, a_i$ are activities, if and only if there is an execution $a_1, \ldots, a_{i-1}, a_i, \ldots$ in the log.
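The formal rule translates directly into code: every prefix of every execution contributes one transition. The Python sketch below assumes that an activity log is a list of executions, each an ordered list of activity names already framed by s0 and e0. The two executions are hypothetical and only loosely modelled on Table 3, and the function name ts_from_activity_log is illustrative. Because states are sets, repeated activities within one execution (iterations) would be conflated, which is consistent with the restriction to processes without iterations stated in the conclusion.

```python
def ts_from_activity_log(executions):
    """Build states and transitions from an activity log.

    For every execution a1, ..., an and every i, add the transition
    {a1, ..., a(i-1)} --ai--> {a1, ..., ai}; states are sets of activities.
    """
    transitions = set()
    for execution in executions:
        for i in range(1, len(execution) + 1):
            before = frozenset(execution[: i - 1])
            after = frozenset(execution[:i])
            transitions.add((before, execution[i - 1], after))
    states = {s for s, _, _ in transitions} | {t for _, _, t in transitions}
    return states, transitions


# Hypothetical activity log with two executions (illustrative data only).
LOG = [
    ["s0", "doDesign", "writeCode", "planTest", "doReview", "e0"],
    ["s0", "doDesign", "planTest", "writeCode", "doReview", "e0"],
]

states, transitions = ts_from_activity_log(LOG)
print(len(states), len(transitions))  # 8 8
```

The two executions meet again in the state {s0, doDesign, writeCode, planTest}, so the transition system contains a diamond, which a synthesis tool can represent as two concurrent transitions.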

Implementation and Evaluation

In this section, we show the first steps and directions for the evaluation of the presented algorithms. For a small ad-hoc comparison with existing process mining approaches, we used ProM and the α-algorithm [13] to generate a Petri net from the log presented in Table 3. As a result, we obtained the Petri net shown in Fig. 7. The two approaches provide different results: for our small activity log, the synthesized Petri net has no deadlocks and models all the process executions of the activity log, whereas the model obtained with ProM reaches a deadlock after executing the activities doDesign and planTest and, thus, does not model Execution 2.
Fig. 7. Petri Net generated by ProM

This shows that, at least for one example, our algorithm gives a better result. But there are other benefits. First, we are capable of dealing with different sources of information: versioning logs and activity logs. Second, our approach is flexible and extensible: adapting the initial algorithms (which work on versioning logs) to activity logs only required 1) removing the clustering and activity mining parts, which are specific to, and necessary for, versioning logs, and 2) slightly changing the transition system generation part². In general, the Petri net synthesis approach assumes a complete transition system with all possible transitions, which is not always realistic; for versioning logs, however, it is the activity mining algorithm that has to cope with defects in the input data, while the transition system generation algorithm remains the same.

Table 4. Execution Times
# of Executions                      | 3   | 5    |      |
Average # of Documents in Execution  | 4   | 5    |      |
Execution Time (msec)                | 941 | 1157 | 2307 | 9994
Our algorithms were implemented in Prolog, which makes the solution flexible and makes it easy to experiment with and to extend. We have made several experiments with the algorithms. For these experiments, the logs were generated artificially, but they are based on our experience with real examples. The execution times of all the algorithms (mining, transition system generation, and synthesis) are shown in Table 4. The execution time depends on the number of executions (execution logs) and on the average number of documents per execution. The columns of the table correspond to the experiments; the time needed for constructing a Petri net from 10 execution logs with 10 documents each is less than 10 seconds, which is a rather promising result, since this corresponds to the size of a realistic log.
In this section, we have presented the first steps towards combining the mining and the synthesis approaches for discovering process models from both versioning logs and activity logs. Although the approach is not yet fully worked out and evaluated, we can already see its benefits even for the given simple examples.

² In the meantime, the ProM community has implemented some region-based algorithms of its own; they are available as the Region miner plugin for ProM.

Conclusion and Future Work

In this paper, we have presented mining and synthesis algorithms that derive a Petri net model of a business process from a versioning log of a document management system. This way, we have opened a new application area for mining without activity logs. We have also shown an extension of our approach that can deal with activity logs of workflow management systems. The approach uses the well-developed and practically applicable theory of Petri net synthesis for solving a vital problem of process mining. To this end, we have developed a transition system generation algorithm, which is the main focus of the paper.
The algorithms presented in this paper can deal with concurrency and alternatives in the process models. So far, we do not deal with iterations. Detecting iterations in versioning logs is a very important domain-specific and company-specific problem. We will deal with this problem in our future research, even though it rarely arises if conventions for using the document management system are introduced and followed in the company. Another relevant domain-specific problem is identifying the activities and naming them meaningfully. Both issues belong to activity mining. In the future, we will improve the activity mining algorithm and, possibly, use interaction with the user for solving these problems. However, activity mining is not the focus of this paper; once it is improved, the transition system generation algorithm only has to be changed slightly to introduce iterations and activity identifiers into the transition systems.
Much work remains to be done in applying the mining and synthesis algorithms to different document management systems in different application areas and in evaluating them in practice, both in business process management and in software process engineering. Since our approach is also relevant to mining activity logs, we should also compare it to the existing approaches in this area in the future. This paper aims at making the first step from the well-developed theory of Petri net synthesis to the practically relevant research domain of process mining.

References

1. van der Aalst, W., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters, A.J.M.M.: Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering 47 (2003) 237-267
2. Kindler, E., Rubin, V., Schäfer, W.: Incremental Workflow mining based on Document Versioning Information. In Li, M., Boehm, B., Osterweil, L.J., eds.: Proc. of the Software Process Workshop 2005, Beijing, China. Volume 3840 of LNCS., Springer (2005) 287-301
3. Humphrey, W.S.: Managing the software process. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1989)
4. Kindler, E., Rubin, V., Schäfer, W.: Activity mining for discovering software process models. In Biel, B., Book, M., Gruhn, V., eds.: Proc. of the Software Engineering 2006 Conference, Leipzig, Germany. Volume P-79 of LNI., Gesellschaft für Informatik (2006) 175-180
5. Ehrenfeucht, A., Rozenberg, G.: Partial (Set) 2-Structures. Part I: Basic Notions and the Representation Problem. Acta Informatica 27 (1989) 315-342
6. Badouel, E., Bernardinello, L., Darondeau, P.: Polynomial algorithms for the synthesis of bounded nets. In: TAPSOFT. (1995) 364-378


7. Desel, J., Reisig, W.: The synthesis problem of Petri nets. Acta Inf. 33 (1996)
8. Badouel, E., Darondeau, P.: Theory of regions. In: Lectures on Petri Nets I: Basic Models, Advances in Petri Nets, the volumes are based on the Advanced Course on Petri Nets, London, UK, Springer-Verlag (1998) 529-586
9. Cortadella, J., Kishinevsky, M., Kondratyev, A., Lavagno, L., Yakovlev, A.: Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Transactions on Information and Systems E80-D (1997)
10. Agrawal, R., Gunopulos, D., Leymann, F.: Mining Process Models from Workflow Logs. In: Proceedings of the 6th International Conference on Extending Database Technology, Springer-Verlag (1998) 469-483
11. Herbst, J., Karagiannis, D.: An Inductive approach to the Acquisition and Adaptation of Workflow Models. (1999)
12. Weijters, A., van der Aalst, W.: Workflow Mining: Discovering Workflow Models from Event-Based Data. In Dousson, C., Höppner, F., Quiniou, R., eds.: Proceedings of the ECAI Workshop on Knowledge Discovery and Spatial Data. (2002)
13. van der Aalst, W., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16 (2004) 1128-1142
14. Cook, J.E., Wolf, A.L.: Discovering Models of Software Processes from Event-Based Data. ACM Trans. Softw. Eng. Methodol. 7 (1998) 215-249
15. MSR 2005 International Workshop on Mining Software Repositories. In: ICSE '05: Proceedings of the 27th international conference on Software engineering, New York, NY, USA, ACM Press (2005)
16. Herbst, J.: Ein induktiver Ansatz zur Akquisition und Adaption von Workflow-Modellen. PhD thesis, Universität Ulm (2001)
17. Kellner, M.I., Feiler, P.H., Finkelstein, A., Katayama, T., Osterweil, L., Penedo, M., Rombach, H.: ISPW-6 Software Process Example. In: Proceedings of the First International Conference on the Software Process, Redondo Beach, CA, USA, IEEE Computer Society Press (1991) 176-186
18. Wielemaker, J.: An overview of the SWI-Prolog programming environment. In Mesnard, F., Serebrenik, A., eds.: Proceedings of the 13th International Workshop on Logic Programming Environments, Heverlee, Belgium, Katholieke Universiteit Leuven (2003) 1-16, CW 371.
19. Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Deriving Petri nets from finite transition systems. IEEE Transactions on Computers 47 (1998) 859-882