
SOFTWARE PROCESS MINING

DR. VLADIMIR RUBIN


LEAD IT ARCHITECT & CONSULTANT @ DR. RUBIN IT CONSULTING
LEAD RESEARCH FELLOW @ PAIS LAB / HSE

ANNOTATION
Nowadays, in the era of social, mobile and cloud computing, business
information systems regularly produce, log and trace terabytes of
data. Process mining transforms this data into valuable
knowledge, which is used for improving business processes.
However, process mining can also be successfully applied to the
development of information systems. It can be used for deriving the model
of a software development process. Mining the end-user behavior can help
improve the functionality and the usability of software. And mining the
software system at runtime is beneficial for improving the software
architecture and performance.
Here, we introduce software process mining:

1. mining the software development process


2. mining the software end-user behavior
3. mining the software runtime behavior

29.01.2014

Slide 2

VLADIMIR RUBIN
Lead IT Architect and Consultant
Collaboration with msg systems AG

Founder of Dr. Rubin IT Consulting, Frankfurt/Germany


Lead Research Fellow at PAIS Lab (Higher School of Economics, Moscow)
3 Years msg systems AG, Frankfurt, Munich/Germany
3 Years Capgemini, Frankfurt/Germany, Bern/Switzerland
3 Years Netcracker Technologies Corp, Boston/USA
3 Years PhD in Computer Science
University of Paderborn/Germany, Eindhoven University of Technology/Holland

5 Years M.Sc. in Computer Science at Moscow State University of Railway Transport


Points of interest:
Big Enterprise Projects (Java EE) and Methodical SW-Development (Agile, SOA)
Business Process Modeling (BPM) and Process Mining
Model-driven Software Development (MDD)

Slide 3


Slide 4

MODERN SOFTWARE PROJECTS

How the customer explained it
How the project was documented
How the analyst designed it
How the customer was billed
How the programmer wrote it
What the customer really needed

* http://www.projectcartoon.com

HOW DOES PROCESS MINING HELP TO DEAL WITH SOFTWARE ENGINEERING CHALLENGES?


Slide 5


Slide 6

ONCE A PROCESS MINER, ALWAYS A PROCESS MINER

AGENDA

1. Software Process Mining

2. Software Process Mining


Slide 7

MOTIVATION: QUALITY
Idea
Software Process Quality
CMM (CMMI)

Company

Process Model

~50% of companies

Product Quality

Automatic support for deriving software development processes

Process Engineer

Practitioners are not involved


Existing processes are not analysed
Manual way of work: expensive, error-prone...
Models have discrepancies with the reality

Slide 8

MOTIVATION: SOFTWARE DEVELOPMENT PROCESS


Slide 9

HYPOTHESIS
Document logs from software repositories can be used for discovering process models.

Mining Approach


Slide 10

MINING APPROACH: PREPROCESSING


1. Preprocessing

Example: SCM Commits (CVS, Subversion, ClearCase, ...)


DES (designer)   CODE (developer)    TEST (qaengineer)   REV (manager)
DES (designer)   TEST (qaengineer)   CODE (developer)    REV (designer)
DES (designer)   VER (qaengineer)    CODE (designer)     REV (manager)

Revision 569362 - (view) (download) (as text) (annotate) [select for diffs]
Modified Fri Aug 24 12:09:09 2007 UTC (6 weeks, 1 day ago)
by bayard
Revision 567258 - (view) (download) (as text) (annotate) [select for diffs]
Modified Sat Aug 18 11:14:52 2007 UTC (7 weeks, 1 day ago)
by tetsuya

SVN log

Different Projects (Plugins)


Different Releases

Other Examples:
Bug Tracking (Bugzilla, ...)
Issue Tracking (Jira, ...)

...
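
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the ProM Import Framework: it parses the two log entries shown on the slide into events with revision, timestamp and author. The regular expression, the function name `parse_commits` and the event dictionary layout are assumptions made for this example, and the mapping of a commit to a document type such as DES or CODE is not covered.

```python
import re
from datetime import datetime

# The two entries shown on the slide (web rendering of an SVN log).
RAW_LOG = """\
Revision 569362 - (view) (download) (as text) (annotate) [select for diffs]
Modified Fri Aug 24 12:09:09 2007 UTC (6 weeks, 1 day ago)
by bayard
Revision 567258 - (view) (download) (as text) (annotate) [select for diffs]
Modified Sat Aug 18 11:14:52 2007 UTC (7 weeks, 1 day ago)
by tetsuya
"""

# Assumed pattern for one entry: revision line, "Modified" line, "by" line.
ENTRY = re.compile(
    r"Revision (?P<rev>\d+) .*?\n"
    r"Modified (?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) UTC.*?\n"
    r"by (?P<author>\S+)"
)


def parse_commits(raw):
    """Turn the raw log text into a list of events ordered by time."""
    events = []
    for match in ENTRY.finditer(raw):
        events.append({
            "revision": int(match.group("rev")),
            "timestamp": datetime.strptime(match.group("ts"),
                                           "%a %b %d %H:%M:%S %Y"),
            "author": match.group("author"),
        })
    events.sort(key=lambda event: event["timestamp"])
    return events


for event in parse_commits(RAW_LOG):
    print(event["revision"], event["timestamp"].isoformat(), event["author"])
```

Sorting by timestamp gives the chronological order that the later mining steps rely on.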

MINING APPROACH: CONTROL-FLOW MINING ALGORITHM


1. Preprocessing

2. Process Mining

a) Transition System Generation


DES (designer)   CODE (developer)    TEST (qaengineer)   REV (manager)
DES (designer)   TEST (qaengineer)   CODE (developer)    REV (designer)
DES (designer)   VER (qaengineer)    CODE (designer)     REV (manager)

b) Petri Net Synthesis

Constructing TS
Modification Strategies for TS

Properties:
flexible, supports generalization
deals with complex constructs
generates consistent models
applies the theory of regions: synthesis algorithms of Cortadella et al.
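
Step a) can be illustrated with a small sketch. It assumes that the twelve example events form three cases of four documents each, and that a state is the set of document types seen so far in a case. The slide only names "Constructing TS", so this state abstraction is one possible choice, not necessarily the one used in the ProM plugin. The Petri net synthesis of step b) is not shown.

```python
# Document types of the three example cases (grouping into cases is assumed).
TRACES = [
    ["DES", "CODE", "TEST", "REV"],
    ["DES", "TEST", "CODE", "REV"],
    ["DES", "VER", "CODE", "REV"],
]


def build_transition_system(traces):
    """Build a transition system where a state is the set of activities
    observed so far in a case (an assumed abstraction)."""
    transitions = set()
    for trace in traces:
        state = frozenset()
        for activity in trace:
            next_state = state | {activity}
            transitions.add((state, activity, next_state))
            state = next_state
    states = {src for src, _, _ in transitions}
    states |= {dst for _, _, dst in transitions}
    return states, transitions


states, transitions = build_transition_system(TRACES)
for src, activity, dst in sorted(transitions,
                                 key=lambda t: (len(t[0]), sorted(t[0]), t[1])):
    print(sorted(src), "--" + activity + "-->", sorted(dst))
```

With the set abstraction the first two cases end in the same state, because they contain the same documents in a different order. A sequence-based abstraction would keep them apart.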

MINING APPROACH: OTHER PERSPECTIVES


1. Preprocessing

2. Process Mining

Performance Perspective

[Figure: mined process model over the activities DES, CODE, TEST, VER and REV, annotated with the values 0.67, 0.33, 0.25 and 0.75]

Organizational Perspective

[Figure: social network of designer, developer, qaengineer and manager, with seven arcs weighted 0.111 and one arc weighted 0.222]

3. Model Analysis

Conformance Checking and Views

Verification (LTL)
Always when CODE then eventually TEST
DES (designer)   CODE (developer)   TEST (qaengineer)   REV (manager)

apply different algorithms developed in the IS Group (TU/e)
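
The organizational perspective and the LTL check can be sketched on the example log. This is an illustration only, not the ProM algorithms: it assumes that the twelve example events form three cases, that the social network is a handover-of-work graph (who passes a case to whom, relative to all handovers), and that the property is checked per case. Under these assumptions the weights come out as 0.111 and 0.222, the values shown in the figure.

```python
from collections import Counter

# (document type, author role) events of the three example cases (assumed grouping).
CASES = [
    [("DES", "designer"), ("CODE", "developer"),
     ("TEST", "qaengineer"), ("REV", "manager")],
    [("DES", "designer"), ("TEST", "qaengineer"),
     ("CODE", "developer"), ("REV", "designer")],
    [("DES", "designer"), ("VER", "qaengineer"),
     ("CODE", "designer"), ("REV", "manager")],
]


def handover_of_work(cases):
    """Relative frequency of direct handovers between roles."""
    counts = Counter()
    for case in cases:
        for (_, source), (_, target) in zip(case, case[1:]):
            counts[(source, target)] += 1
    total = sum(counts.values())
    return {pair: round(n / total, 3) for pair, n in counts.items()}


def always_eventually(case, trigger, response):
    """Check 'always when trigger then eventually response' on one case."""
    activities = [activity for activity, _ in case]
    return all(response in activities[i + 1:]
               for i, activity in enumerate(activities)
               if activity == trigger)


for (source, target), weight in sorted(handover_of_work(CASES).items()):
    print(source, "->", target, weight)

for case in CASES:
    print([a for a, _ in case], always_eventually(case, "CODE", "TEST"))
```

Only the first case satisfies the property; in the other two, CODE is never followed by TEST.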

IMPLEMENTATION
1. Preprocessing

2. Process Mining

ProM Import Framework

3. Model Analysis

ProM

Implemented ProM Plugins:
Transition System Generator
Export2Petrify (Petrify PN Synthesis)
Import from Petrify
+ Remap Filter (together with C. Günther)
(based on Prolog research prototype)

In cooperation with the IS group (Eindhoven University of Technology)

Slide 14

EVALUATION
Case Studies:
Softwaretechnikpraktikum (SCM systems CVS and Subversion), FG Softwaretechnik, University of Paderborn
Open-source Software Project ArgoUML (SCM system Subversion)
Open Development Platform Eclipse (Bug Repositories Bugzilla)

Main Results:
Discovered plausible process models corresponding to the given specifications
Identified the discrepancies between the specified and the discovered processes
Analysed the performance and identified the critical tasks
Discovered organizational models and the social networks
Verified the models against important properties


Slide 15

CONTRIBUTIONS

A Workflow Mining Approach for Deriving Software Process Models

Software Process Mining (Research Areas)

mining different perspectives
incremental mode

Tool Support

Theory of Regions

configurable
consistent

Evaluation
Sources of Experimental Data


Slide 16

AGENDA

1. Software Process Mining

2. Software Process Mining


Slide 17


Slide 18

MINING THE USER BEHAVIOUR

MINING THE USER BEHAVIOUR: USE CASES


Mining user activity traces can be used for:

Understanding the real behaviour of the user
Improving the GUI
Implementing Quick Wins
Redesigning the software system
Changing the design according to real-world scenarios
Developing the acceptance tests
Capture and replay
Monitoring the system usage (APM: application performance monitoring)
Visualizing the state, Failure Alerts



Slide 19


Slide 20

EXAMPLE: TOURISTIC BOOKING SYSTEM

TOMA MASK AND MESSAGE

087624 60T1009006001001T001001002000D D BA 1024DER


PHXU25307V5023EUR503801SP HAM BGO 3A ST ZHI 2 1 0107135
2501802KV599959995431

1-


Slide 21

MINING: INPUT
~ 30 MB Logs per Day per Environment (PROD, TEST, INT, DEV)
Logs are preprocessed and converted to CSV (30 KB per Day)

Input for Disco


Slide 22

MODEL FOR ONE SET OF TESTS FOR ONE DAY

95 cases
482 events
50 activities
Mean duration: 6.5 min; Median duration: 26.5 s
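
Figures of this kind (cases, events, activities, mean and median case duration) can be recomputed from a CSV event log. The sketch below is an illustration only: the column names `case_id`, `activity` and `timestamp` and all rows are invented, because the actual CSV layout of the project is not shown on the slides.

```python
import csv
import io
from datetime import datetime
from statistics import mean, median

# Hypothetical event log; column names and rows are invented for illustration.
SAMPLE_CSV = """\
case_id,activity,timestamp
c1,Flight Search,2014-01-29 10:00:00
c1,Hotel Quote,2014-01-29 10:00:20
c1,Hotel Book,2014-01-29 10:01:00
c2,Hotel Search,2014-01-29 11:00:00
c2,Show Reservation,2014-01-29 11:00:10
c3,Hotel Quote,2014-01-29 12:00:00
"""


def summarize(csv_text):
    """Count cases, events and activities and compute case durations."""
    first, last = {}, {}
    activities = set()
    events = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        case = row["case_id"]
        first[case] = min(first.get(case, ts), ts)
        last[case] = max(last.get(case, ts), ts)
        activities.add(row["activity"])
        events += 1
    # Case duration = time between the first and the last event of the case.
    durations = [(last[c] - first[c]).total_seconds() for c in first]
    return {
        "cases": len(first),
        "events": events,
        "activities": len(activities),
        "mean_duration_s": mean(durations),
        "median_duration_s": median(durations),
    }


print(summarize(SAMPLE_CSV))
```

A mean far above the median, as on the slide (6.5 min versus 26.5 s), indicates that a few long-running cases dominate the average.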


Slide 23

FOCUS ON SUCCESSES: FREQUENCY


64 cases
(67% of all cases)
228 events
39 activities

Frequent activities:
Hotel Quote
Hotel Book
Flight Search
Show Reservation


Slide 24

FOCUS ON SUCCESSES: PERFORMANCE


Median duration


Slide 25

FOCUS ON FAILURES
Problems with:
Hotel Search
Hotel Quote
Show Reservation


Slide 26

SOME STATISTICS
Most frequent travelling directions:

Most active users:


Slide 27

WHAT WE HAVE LEARNED

1. We could monitor the acceptance tests of the users (online).

2. We could visualize the user behaviour and discuss it with the
end user. Communication!!!

3. The management could easily see current successes and failures.

4. We aligned the failure cases with the exceptions and created the
issues for further bug fixing.

5. We identified the most critical parts of the software and focused
on them first (Pareto principle).


Slide 28

MINING THE SOFTWARE RUNTIME BEHAVIOUR


Slide 29

MINING THE SOFTWARE RUNTIME BEHAVIOUR: USE CASES


Mining software runtime traces:

Understanding the performance


Localizing the bottlenecks
Understanding the architectural deficiencies
Improving the architecture
Aligning the exception traces with user
behaviour


Slide 30

EXAMPLE: TOURISTIC BOOKING SYSTEM


Slide 31

MINING: INPUT
~ 5 GB of Traces per Day per Environment (PROD, TEST, INT, DEV)
Logs are preprocessed and converted to CSV (20 MB per Day)

Input for Disco


Slide 32

MODEL FOR ONE DAY FOR ONE BUSINESS DOMAIN

758 cases
61844 events
508 activities
Mean duration: 5 sec; Median duration: 30 millis
Computation of the whole graph takes more than 30 minutes


Slide 33

FOCUS ON ONE BUSINESS DOMAIN: FREQUENCY


1. Process calls
2. Subsequent service calls


Slide 34

FOCUS ON ONE BUSINESS DOMAIN: PERFORMANCE


Total duration of calls
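
The "total duration of calls" view can be approximated outside of Disco by summing the call durations per activity and ranking the result. The sketch below is an illustration only: the record layout and the names `BookingProcess`, `PriceService` and `AvailabilityService` are hypothetical and do not come from the real system.

```python
from collections import defaultdict

# Hypothetical trace records: (case id, called process or service, duration in ms).
CALLS = [
    ("r1", "BookingProcess", 120),
    ("r1", "PriceService", 80),
    ("r1", "AvailabilityService", 30),
    ("r2", "BookingProcess", 100),
    ("r2", "PriceService", 900),
]


def total_duration_by_activity(calls):
    """Sum the call durations per activity, longest total first."""
    totals = defaultdict(int)
    for _, activity, duration_ms in calls:
        totals[activity] += duration_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for activity, total_ms in total_duration_by_activity(CALLS):
    print(activity, total_ms, "ms")
```

The activities at the top of this ranking are the first candidates to inspect as bottlenecks.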


Slide 35


Slide 36

SOME STATISTICS
Payloads:

Frequency of activities:

WHAT WE HAVE LEARNED

1. We could visualize the system runtime behaviour.

2. We could discuss (drill down, roll up) particular behaviour with
technical designers.

3. We identified the performance bottlenecks.

4. We identified the most critical processes and services from the
architectural point of view.

5. We improved the performance in many cases using caching or refactoring...


Slide 37


Slide 38

OVERVIEW

FUTURE WORK: RESEARCH DIRECTIONS ...

1. Process Mining methods for Software Process Mining


Filtering (OLAP-like operations)
Dealing with Gigabytes of logs
2. Mining different perspectives
Data perspective (Requests, Responses, Payloads)
3. Integrating Process Mining into the Software Development Process
Agile approaches (Early Feedback using Process Mining)
4. Monitoring and Process Mining Online
Aligning mined models with logs
Continuous repair of models
5. Prediction of user behaviour, guiding the user

LET'S PROCEED WITH SOFTWARE PROCESS MINING !!!

Slide 39