
Data and Process Engineering Lab

The International Collaboration Technology Research Group


Division of Computer Science and Engineering

Process Mining System


Demonstration

PhD student: Dinh-Lam Pham


Advisor: Prof. Kwanghoon Pio Kim

Acknowledgment: This research was supported by the Basic Science Research Program
(Grant No. 2017R1A2B2010697) through the National Research Foundation of Korea (NRF), funded by the
Ministry of Education, Republic of Korea.

Vietnam, May 2019


Agenda

• IEEE XES Format standard


• Process mining system framework
• Case study 1: Process mining in large bank transaction system
• Case study 2: Process mining in paper peer review system
• Case study 3: Process mining in detecting malware behavior
• Conclusion

IEEE eXtensible Event Stream (XES) standard

Purpose: to provide a generally acknowledged XML format for the interchange of event data
between information systems in many application domains.

The UML 2.0 class diagram for the complete meta-model for the XES standard (Source: http://xes-standard.org)
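
As a rough illustration of the format, the sketch below builds a minimal XES-style log with Python's standard library. The case identifier, activity names, and timestamps are made-up sample data, and the extension and global declarations that a complete XES file normally carries are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_xes_log(cases):
    """Build a minimal XES-style <log> element.

    `cases` maps a case id to a list of (activity, ISO timestamp) pairs.
    Only the concept:name and time:timestamp attributes are written.
    """
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in cases.items():
        trace = ET.SubElement(log, "trace")
        # The trace-level concept:name carries the case identifier.
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for activity, timestamp in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})
    return log


# Hypothetical sample data, not taken from the case-study logs.
cases = {
    "case-1": [
        ("A", "2019-05-01T09:00:00+00:00"),
        ("B", "2019-05-01T09:05:00+00:00"),
    ]
}
xes_text = ET.tostring(build_xes_log(cases), encoding="unicode")
print(xes_text)
```

Each trace holds the events of one case, and each event stores its attributes as typed key/value children, which is what lets different information systems exchange logs in one shape.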
Process Mining Framework

The process mining framework


CASE STUDY 1
Case study 1: Process mining in large bank transaction system

• Provided by the 4TU.Centre for Research Data.

• Describes the transaction process in the bank system.

• Objective: Discover the Information Control Net model from the log

The bank transaction process in BPMN

Case study 1: Process mining in large bank transaction system

Log summary

Traces: 10,000
Events: 678,864
Activities: 113
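
A summary like the one above can be computed directly from an XES file. The following is a minimal sketch using only the standard library; the two-trace sample log is invented for illustration, and the function assumes that the activity name is stored under the `concept:name` key of each event.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_xes(xes_text):
    """Count traces, events, and distinct activity names in an XES log."""
    root = ET.fromstring(xes_text)
    traces = [el for el in root if local_name(el.tag) == "trace"]
    n_events = 0
    activities = set()
    for trace in traces:
        for event in trace:
            if local_name(event.tag) != "event":
                continue  # skip trace-level attributes such as the case id
            n_events += 1
            for attribute in event:
                if attribute.get("key") == "concept:name":
                    activities.add(attribute.get("value"))
    return {"traces": len(traces), "events": n_events, "activities": len(activities)}


# Hypothetical sample log with a default namespace.
sample_log = """
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="B"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event><string key="concept:name" value="A"/></event>
  </trace>
</log>
""".strip()

summary = summarize_xes(sample_log)
print(summary)
```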
DEMO CASE STUDY 1
Information control net (ICN) Model

• ICN is a typical workflow modeling approach supporting graphical and formal representations.

The Information Control Net Primitive Patterns

Deliberate noises

The deliberate noises concept (figure: a process graph from START to END over activities A to G, with potential and discovered deliberate noises marked)
The ρ-Algorithm framework

Stages shown in the figure:
• A specific business process model
• Business process enactment event histories and logs in the XES format
• BP instance event traces (WT1 ... WTn)
• Temporal workcase models (TWCM1 ... TWCMn)
• Pairs of temporally ordered adjacent-activities groups (AAG1 ... AAGn)
• Quantitative adjacent-activity set with the proportional counters
• A mined proportional process pattern graph (M-P3G)
• A discovered proportional information control net (D-PICN)

Proportional process patterns:
• Linear (Sequential) Pattern
• Disjunctive (Selective) Pattern
• Conjunctive (Parallel) Pattern
• Iterative (Loop) Pattern

Example from the figure: the proportional counter of a1 records a3 70 times and a2 130 times, which gives the proportions 0.35 and 0.65 in the M-P3G.
The ρ-Algorithm process pattern discovery procedure
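
The counting step of this procedure can be sketched as follows. This is not the authors' implementation; it only assumes that a proportional counter records how often each activity directly follows another within a trace. The traces are invented so that the counts for a1 come out as 70 and 130.

```python
from collections import Counter, defaultdict


def adjacent_activity_counters(traces):
    """For every activity, count which activity directly follows it."""
    counters = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counters[current][following] += 1
    return counters


def proportions(counter):
    """Turn raw successor counts into proportions that sum to 1."""
    total = sum(counter.values())
    return {activity: count / total for activity, count in counter.items()}


# Hypothetical traces: 70 cases go a1 -> a3, 130 cases go a1 -> a2.
traces = [["a1", "a3", "a6"]] * 70 + [["a1", "a2", "a4"]] * 130

counters = adjacent_activity_counters(traces)
a1_proportions = proportions(counters["a1"])
print(dict(counters["a1"]), a1_proportions)
```

Keeping the raw counts next to the proportions is deliberate: the gateway decisions are stated in terms of occurrence counts, while the proportions annotate the edges of the discovered model.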

The Algorithmic principle for making OPEN-gateways

A Discovered Business Process Model with Proportional Process Patterns (D-PICNM)

Quantitative Adjacent-Activity Set in BB3 (SAPP ~ PBAP), with Proportional Counters and Activity Occurrences

Activity occurrences (SAPP ~ PBAP):
SAPP 3272
RAP 6224
SAV 3272
FAV 3272
URAP 2952
CAPR 6224
AAV 3272
BAV 3272
ARAP 3272
PBAP 3272

Gateways discovered in the figure: AND-OPEN1, LOOP-OPEN1, AND-OPEN2. Edges carry proportional counters such as 1693/3272 and 1579/3272.

The Principle of Process Pattern Decision-Makings (OPEN-gateway)
• XOR: OccurParent > each of OccurChildren
• AND: OccurParent == all of OccurChildren
• LOOP: OccurParent < OccurChild

The principle for making OPEN-gateways
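
The three OPEN-gateway rules can be written as a small decision function. This is a sketch of the stated comparisons only, not the ρ-Algorithm itself. The function name and the "UNDETERMINED" fallback are assumptions added here, and pairing the occurrence numbers below as parent and children is also an assumption made for illustration.

```python
def classify_open_gateway(occur_parent, occur_children):
    """Apply the OPEN-gateway rules to occurrence counts.

    AND:  the parent count equals every child count.
    XOR:  the parent count is greater than each child count.
    LOOP: the parent count is smaller than a child count.
    Callers should pass branching nodes only; a single successor with an
    equal count is a plain sequence, not a gateway.
    """
    if not occur_children:
        raise ValueError("at least one child occurrence is required")
    if all(child == occur_parent for child in occur_children):
        return "AND"
    if all(child < occur_parent for child in occur_children):
        return "XOR"
    if any(child > occur_parent for child in occur_children):
        return "LOOP"
    return "UNDETERMINED"  # mixed case that the three rules do not cover


# Illustrative inputs using occurrence numbers that appear in the slides.
and_result = classify_open_gateway(3272, [3272, 3272])
xor_result = classify_open_gateway(10000, [4813, 5187])
loop_result = classify_open_gateway(2952, [6224])
print(and_result, xor_result, loop_result)
```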


The Algorithmic principle for making CLOSE-gateways

A Discovered Business Process Model with Proportional Process Patterns (D-PICNM)

Quantitative Adjacent-Activity Set in BB7 (SAPP ~ PBAP), with Proportional Counters and Activity Occurrences

Activity occurrences (SAPP ~ PBAP):
SAPP 3272
RAP 6224
SAV 3272
FAV 3272
URAP 2952
CAPR 6224
AAV 3272
BAV 3272
ARAP 3272
PBAP 3272

Gateways discovered in the figure: AND-OPEN1, LOOP-OPEN1, AND-OPEN2, LOOP-CLOSE1, AND-CLOSE2, AND-CLOSE1. For AND-CLOSE1, the totals of ARAP (3272), FAV (3272), and PBAP (3272) are equal.

The Principle of Process Pattern Decision-Makings (CLOSE-gateway)
• XOR: TotalParent < each of TotalChildren
• AND: TotalParent == all of TotalChildren
• LOOP: TotalParent > TotalChild

The principle for making CLOSE-gateways
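
The CLOSE-gateway rules mirror the OPEN-gateway ones with the comparisons reversed. The slide does not define TotalParent and TotalChildren precisely, so the sketch below simply applies the three comparisons to whatever totals the caller supplies. The function name and the "UNDETERMINED" fallback are assumptions, and only the first example uses numbers from the slide.

```python
def classify_close_gateway(total_parent, total_children):
    """Apply the CLOSE-gateway rules to occurrence totals.

    AND:  the parent total equals every child total.
    XOR:  the parent total is smaller than each child total.
    LOOP: the parent total is greater than a child total.
    """
    if not total_children:
        raise ValueError("at least one child total is required")
    if all(child == total_parent for child in total_children):
        return "AND"
    if all(child > total_parent for child in total_children):
        return "XOR"
    if any(child < total_parent for child in total_children):
        return "LOOP"
    return "UNDETERMINED"  # mixed case that the three rules do not cover


# ARAP, FAV, and PBAP all occur 3272 times in the slide (AND-CLOSE1).
and_close = classify_close_gateway(3272, [3272, 3272])
# The two calls below use hypothetical totals.
xor_close = classify_close_gateway(100, [150, 120])
loop_close = classify_close_gateway(200, [80])
print(and_close, xor_close, loop_close)
```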


Case study 1: Process mining in large bank transaction system

ICN Model discovered from the large bank transaction log (figure: the discovered model runs from START to END through OR_OPEN/OR_CLOSE, AND_OPEN/AND_CLOSE, and LOOP_OPEN/LOOP_CLOSE gateways; each edge is labeled with an occurrence count and a proportion, for example 4813 (p: 0.481) and 5187 (p: 0.519) on the branches to SHRRP and SLRRP)

CASE STUDY 2
Case study 2: Process mining in paper peer review system

• Provided by the 4TU.Centre for Research Data.


• It describes the paper peer review process of a Journal.
• Objective: Discover the work transference network from the log.

The paper peer review process in BPMN (activities shown: Invite reviewers, Collect reviews, Decide, Accept/Reject, Invite additional reviewers)


Case study 2: Process mining in paper peer review system

The log summary

Traces: 10,000
Events: 236,360
Activities: 14
Performers: 11

Activity names: invite reviewers, get review 1, get review 2, get review 3, time-out 1, time-out 2, time-out 3, collect reviews, decide, invite additional reviewer, get review X, time-out X, accept, reject

Performer names: Pete, John, Carol, Mike, Sara, Anne, Wil, Pam, Sam, Mary, __INVALID__ (unknown name)
Work Transference Network

A work transference network is formed by combining the temporal workcases of all traces
in the log file of the information system.
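
One way to picture this combination step is to count, over all traces, how often work passes from one performer to the next. The sketch below assumes that a transference is recorded whenever one performer's event is directly followed by another event in the same trace, with artificial Start and End nodes added. That reading is an assumption rather than the authors' definition, and the traces are invented (the performer names come from the log summary).

```python
from collections import Counter


def work_transference_counts(traces):
    """Count performer-to-performer handovers across all traces.

    Each trace is a list of (activity, performer) pairs in temporal order.
    """
    edges = Counter()
    for trace in traces:
        performers = ["Start"] + [performer for _activity, performer in trace] + ["End"]
        for giver, receiver in zip(performers, performers[1:]):
            edges[(giver, receiver)] += 1
    return edges


# Hypothetical traces; the activity-to-performer pairing is made up.
traces = [
    [("invite reviewers", "Pete"), ("collect reviews", "Pam"), ("decide", "Sam")],
    [("invite reviewers", "Pete"), ("collect reviews", "Pam"), ("decide", "Mike")],
]

edges = work_transference_counts(traces)
for (giver, receiver), count in sorted(edges.items()):
    print(f"{giver} -> {receiver}: {count}")
```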

Work Transference Network - Example

(a) Library Book Acquisition ICN Model
(b) A build-time WTN from the Library Book Acquisition Workflow Model (nodes shown: Start, P1, P3, P4, P5, P7, P10, P11, End)
Deliberate noises in Work Transference Network

Deliberate noises in WTN
DEMO CASE STUDY 2
Figures shown in the demo:
• The Rediscovered Information Control Net Process Model (activities from START to END connected through OR_OPEN/OR_CLOSE and LOOP_OPEN/LOOP_CLOSE gateways)
• The Rediscovered Work Transference Network Model Before Discovering and Excluding Deliberate Noises
• The Rediscovered Work Transference Network Model After Discovering and Excluding Deliberate Noises
• The Complete Work Transference Network Model

Performers appearing in the network: Mike, Mary, Anne, Pam, Sam, John, Carol, Pete, Wil, Sara, __INVALID__
CASE STUDY 3
Case study 3: Process mining in detecting malware behavior

• Malware is malicious software that infects individual computers, mobile devices, or an entire
organization’s network.
• Malware detection identifies and protects against harm from viruses, worms, Trojan
horses, spyware, and other forms of malicious code.
• Malware detection and prevention technologies are widely available for servers,
gateways, user workstations, and mobile devices.

Malware illustration (Source: cisco.com)

Case study 3: Process mining in detecting malware behavior

• Malware may be hidden from the system task manager, but all processes that run are recorded in the system event log.
• Idea: detect abnormal process behavior from the computer's event logs.

Case study 3: Process mining in detecting malware behavior

• Convert (reformat) the system log into the eXtensible Event Stream (XES) format.
• From the reformatted XES log, build the process graph, then analyze the process behavior.
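
The conversion step might look roughly like the sketch below, which maps one Windows event record (in its XML form) to an XES-style event. The slides do not say which fields are used, so the mapping is an assumption: the process path from a process-creation record (event ID 4688) becomes the activity name, and `windows:eventId` is a made-up attribute key. The sample record is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
WIN_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def windows_event_to_xes(event_xml):
    """Map one Windows <Event> record to an XES-style <event> element."""
    source = ET.fromstring(event_xml)
    system = source.find("e:System", WIN_NS)
    event_id = system.findtext("e:EventID", namespaces=WIN_NS)
    time_created = system.find("e:TimeCreated", WIN_NS).get("SystemTime")
    data = {
        item.get("Name"): (item.text or "")
        for item in source.findall("e:EventData/e:Data", WIN_NS)
    }
    event = ET.Element("event")
    # Assumption: use the new process path as the activity name,
    # falling back to the event ID when the field is absent.
    activity = data.get("NewProcessName", event_id)
    ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": time_created})
    # Hypothetical attribute key, not part of any standard XES extension.
    ET.SubElement(event, "string", {"key": "windows:eventId", "value": event_id})
    return event


# Invented sample record in the shape of a process-creation event.
sample_event = r"""
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"/>
    <EventID>4688</EventID>
    <TimeCreated SystemTime="2019-05-01T09:00:00.000000000Z"/>
    <Computer>demo-host</Computer>
  </System>
  <EventData>
    <Data Name="NewProcessName">C:\Windows\System32\notepad.exe</Data>
  </EventData>
</Event>
""".strip()

xes_event = windows_event_to_xes(sample_event)
values = {child.get("key"): child.get("value") for child in xes_event}
print(values)
```

Grouping such events into traces (for example per parent process or per logon session) is a separate design decision that the slides leave open.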

Malware behavior detection from system event logs architecture


Case study 3: Process mining in detecting malware behavior

• Windows event log

Windows event log types

Case study 3: Process mining in detecting malware behavior

• From *.evtx to Event Log

From *.evtx to Event Log (Source: Troy Larson – Microsoft TwC)


Case study 3: Process mining in detecting malware behavior

• Windows event log XML structure

The Windows event log XML structure


DEMO CASE STUDY 3
Case study 3: Process mining in detecting malware behavior

Iservc.exe (Fizzer worm) behavior captured by the process mining system


Case study 3: Process mining in detecting malware behavior

VirusTotal shows the Virus information
Conclusion

• Process mining helps us understand what actually happened in our information systems.
• We can apply process mining in various domains (e.g., e-government, logistics,
transportation).
• By using process mining techniques, we can improve process efficiency, reduce costs,
and increase system performance.
