
Big Data Research 25 (2021) 100210

Contents lists available at ScienceDirect

Big Data Research


www.elsevier.com/locate/bdr

Using Big Data to Improve Safety Performance: An Application of Process Mining to Enhance Data Visualisation
Anastasiia Pika a,1, Arthur H.M. ter Hofstede a, Robert K. Perrons a,b,∗, Georg Grossmann c, Markus Stumptner c, Jim Cooley

a Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
b Centre for Strategy and Performance, University of Cambridge, United Kingdom
c University of South Australia, Adelaide, Australia

Article info

Article history:
Received 11 October 2020
Received in revised form 19 January 2021
Accepted 24 January 2021
Available online 3 February 2021

Keywords:
Process mining
Industrial safety
Big Data
Permit to Work

Abstract

The management of health and safety plays an important role in safety performance, and is therefore an important foundational element in an organisation’s overall sustainable development. Many organisations are now able to collect vast amounts of data to shed light on the underlying causes behind accidents and safety-related incidents, and to spot patterns that can lead to solutions. Despite these well-intentioned Big Data collection efforts, however, accident statistics in asset-intensive industries remain stubbornly high as the data frequently fails to reveal actionable insights. In this paper, we answer Wang and Wu’s (2020) [60] and Wang et al.’s (2019) [61] calls for the application of Big Data science to the safety domain by exploring the potential of applying tools and techniques from process mining, a research area concerned with analysing process execution data, to derive novel insights from and improve the visualisation of safety process data. We demonstrate how these tools can yield useful insights in the occupational health and safety domain by analysing process execution data from a Permit to Work system in an Australian energy company. Specifically, the analysis presented here highlights the underlying complexity of the organisation’s Permit to Work process, reveals conformance and performance issues, and uncovers resources associated with conformance issues and changes in the frequency of such issues over time, thereby underlining the need to simplify the system. Encouraged by these fresh perspectives and insights delivered by process mining, we hope that this novel application will be a catalyst for further research at the interface between these research disciplines.

© 2021 Elsevier Inc. All rights reserved.

* Corresponding author at: Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia.
E-mail address: perrons@alum.mit.edu (R.K. Perrons).
1 Anastasiia Pika was working at QUT at the time this work was done; she has since left QUT.

https://doi.org/10.1016/j.bdr.2021.100210
2214-5796/© 2021 Elsevier Inc. All rights reserved.

1. Introduction

As argued by Wang et al. [61], the management of health and safety plays an important role in safety performance, and is therefore an important foundational element in an organisation’s overall sustainable development. Occupational health and safety incidents frequently result in significant costs both in terms of the human dimensions of these events (e.g., [4]) and the financial consequences that they have on the organisations in which they occur [48]. Companies in asset-intensive domains like the energy or mining industries have accordingly invested considerable resources in a range of solutions to improve the management of health and safety risks within these sectors [43,50].

“Permit to Work” (PTW) systems [17] have emerged as one approach to managing these risks. These systems prescribe steps that must be performed by employees in order to minimise health and safety risks. Such safety processes can be complex, and it is crucial to understand how they are executed in practice in order to identify opportunities for process safety improvements. The development of Permit to Work systems has also given rise to a greater degree of systematisation within safety systems, which in turn has made it possible to measure and capture data about many aspects of safety-related phenomena that had been more opaque before [7]. When coupled with the quickly advancing data collection capabilities within asset-intensive sectors [37], Permit to Work systems have resulted in vast amounts of data being collected in an attempt to shed light on the underlying causes behind incidents and spot patterns that can lead to solutions [57,53].

Despite these well-intentioned data collection efforts, however, accident statistics in several asset-intensive industries remain stubbornly high [11,32,53]. A recurring explanation behind this lack of progress is that, although a growing volume of data about safety processes is recorded in the digital repositories of companies that have occupational health and safety management systems (OHS MS) [53], obtaining actionable insights from the data is challenging [43,50] because of the sheer size of these data sets [38]. Although the literature has recently made progress towards leveraging Big Data and modern analytical techniques in the safety science domain [45,35], it has overlooked the possibility of applying process mining techniques as a way to approach the safety community’s Big Data challenges. Process mining is a relatively new research area at the intersection of business process management (BPM) and data analytics, and it could be a very potent tool with which to help the safety community use its data resources to improve safety performance within a broad range of industries.

Process mining provides methods for extracting insights from process execution data [56]. Such methods can discover how processes are executed in practice rather than how organisations think they happen (process discovery), uncover discrepancies between the prescribed and actual process behaviour (process conformance), identify performance issues such as bottlenecks (process performance analysis), analyse behaviours of employees handling processes (organisational mining), compare processes, and analyse their evolution over time. The input to process mining methods is an event log which contains information about process instance identifiers, activities and their timestamps and, in some cases, may also contain other information (e.g., about resources involved in the process). Information systems which support safety processes often record the processes’ execution history in event logs. Such logs can be analysed using process mining. In this paper, we answer Wang and Wu’s [60] and Wang et al.’s [61] calls for the application of Big Data science to the safety domain by exploring the potential of applying tools and techniques from process mining, a research area concerned with analysing process execution data, to derive novel insights from safety process data. We will do this by extending earlier investigations that make the case for PTW systems [42,65] and taxonomies of different PTW systems [23] by demonstrating various process mining methods in the context of a PTW system in an Australian energy company. The application of process mining will also significantly improve data visualisation, a problem specifically pointed to by Wang et al. [61].

The application of process mining revealed a surprising amount of complexity in the organisation’s occupational health and safety process, identified where in the process deviations from the expected process behaviour often occur as well as resources associated with these deviations, uncovered process performance issues, and showed that the number of process conformance issues is increasing over time. These findings highlighted the need to simplify the system. This could lead to a decrease in the number of process conformance issues and contribute to an improvement of process safety in the company.

The paper is structured as follows. We begin in Section 2 by exploring the relevant literature from the different research areas that come together in this topic, most notably including occupational health and safety and data analytics. We will then explain the methodology in Section 3, including a concept-level description of the different phases of analysis that go into process mining. It is worth pointing out that our Methodology section is longer than usual in this paper precisely because the overarching objective of this paper is to showcase the application of methodological approaches and analytical tools that have not been previously used in the health and safety domain. Then, in Section 4, we will demonstrate the application of process mining in the occupational health and safety case study being used as a real-world example in this paper to show the kinds of fresh insights and novel perspectives that process mining can add. Finally, Section 5 puts forward some concluding remarks, highlights the lessons learned, and offers some suggestions with regard to directions for future investigations in this area.

2. Literature review and research question

In this section, we introduce occupational health and safety management systems (Section 2.1), provide an overview of existing approaches to the measurement of occupational health and safety performance and describe the associated challenges (Section 2.2), discuss current applications of data analytics in the occupational health and safety domain, and highlight the novelty of the use of process mining in the domain (Section 2.3).

2.1. Occupational health and safety management systems

Occupational health and safety management systems (OHS MSs) have been used by organisations since the 1980s to minimise the risk of health and safety (H&S) incidents [43]. An OHS MS can be defined as “a set of institutionalised, interrelated, and interacting strategic H&S management practices designed to establish and achieve occupational safety and health goals and objectives” [64]. Various studies have highlighted the importance of such systems and their positive impact on a safe work environment [33], leading to enforcement of their use in legislation such as the directives imposed by the European Union on its member states [5]. However, without enforced policies, investing in an OHS MS can be a daunting proposition for small and medium enterprises (SMEs) because of the costs involved, e.g., for certification, and the lack of economic benefits visible in the short term [47].

2.2. Challenges in the evaluation of occupational health and safety performance

To measure the effectiveness of OHS MSs and to justify investments in them, organisations need to evaluate different aspects of such systems. This is often done with the help of indicators [43,50] which can be measured and compared during different stages of the system implementation and use. Such indicators are also expected to uncover potential problems and “weaknesses in the organisation’s procedures or employee behaviour” [50]. Organisations may track hundreds of indicators and record large volumes of the related data; however, they often find it challenging to extract value from the data [43,50].

A literature survey on the effectiveness of OHS MSs [46] found that “many workplaces are not willing to make a commitment to a large intervention like an OHS MS, let alone its evaluation”, which is related to the fact that measurement presents conceptual, logistic and resource challenges because of the complexity of such systems and their environment. Measurement and monitoring of OHS MSs present an even bigger challenge to SMEs, which have fewer resources and are less able to effectively assess risks [5].

Poor safety culture in organisations was shown to be linked to H&S incidents [34]. Various aspects of safety culture are often evaluated using questionnaires [17,59,43,10,28]; however, this approach is costly and may be inefficient because answers of employees may be affected by social expectations and thus may not accurately reflect reality [43].

In summary, although different studies have shown a positive impact of OHS MSs on workplace safety [33], there are many challenges associated with the use of these systems, such as difficulties in the evaluation of the systems [46,43,5] and in justifying investments in them [46,47], and challenges in extracting actionable insights from the collected safety data [43,50]. The application of process mining to safety data, proposed in this article, could help to solve these challenges. Process mining techniques can automatically discover processes from data, highlight various conformance and performance issues and produce intuitive visualisations that do not require expert knowledge for understanding; thus, these techniques could make the evaluation of OHS MSs and safety culture less challenging. Moreover, insights discovered by process mining tools could provide clues for improving workplace safety in organisations.

2.3. Data analytics in the occupational health and safety domain

Although extracting insights from safety data presents a challenge to many organisations [2,9,13,43,50], Tan et al. [53] argue that Big Data tools may help to make “considerable progress” in organisations in the oil & gas industry, and Podgorski [43] argues that organisations should make use of indicators to better understand their health and safety processes. However, it was not shown how this can be achieved.

A few recent studies reported the use of Big Data tools in the safety domain [53,45,36,44,49]. Tan et al. [53] described examples of the application of Big Data tools to mitigate risks across several industries: in the oil & gas industry, workers used hands-free checklists to reduce the likelihood of mistakes when assembling equipment; built-in sensors were used in manufacturing to monitor assembly operations; and Big Data analytics was used for risk detection in supply chains. Rashidy et al. [45] presented a Big Data modelling approach that can help to understand the factors associated with SPAD (Signals Passed At Danger) risk. Shi et al. [49] used machine learning to predict risk levels of vehicles in driving, and the authors of [36] applied machine learning methods for road accident detection from traffic condition data. Polyvyanyy et al. [44] presented an approach for discovery of causalities between events recorded in the health and safety domain. Ouyang et al. [35] argue that research on the application of Big Data in the safety domain is still limited and “it’s important to realise that safety data is becoming exponentially un-analysable with traditional statistics methods, most of whose contributions fail to move beyond existing theories and routine application to tackle the dynamic and complex issues in terms of volume, intensity and complexity, forcing a rethink on how to exploit the vast values of safety data efficiently”.

We believe that process mining, a research area at the intersection of data analytics and BPM, might be able to add significant value to this area. Process mining techniques have been successfully applied across different industries and helped to uncover valuable insights about processes and improve these processes in multiple organisations. However, to the best of our knowledge, the possibility of applying process mining to safety data has been largely overlooked in the literature thus far. This leads us to:

Research Question: Can techniques from the process mining domain shed new light on large data sets in the occupational health and safety area?

In our earlier work [40], we described some examples from a case study in which we analysed process execution data of a safety process. We use examples from this case study (including one example presented earlier [40]) to illustrate the discussions in this article.

3. Methodology and analytical approach

In this section, we describe the methodology and overall analysis approach followed in this study. We illustrate the approach using examples from a case study in which we analysed a Permit to Work system implemented in an Australian energy company.

3.1. Case study

The participating company has over 5,000 employees spread across multiple industry sectors, including electrical power generation. The company had a total market capitalisation of under USD 20 billion throughout this investigation, making it a “mid-cap” player in the energy space, that is, a middle-sized firm by market capitalisation with a substantial regional footprint, but without a significant international presence. The company’s power generation division was specifically chosen for this investigation for two reasons: (1) the department was and is very committed to reducing its injury and near-miss statistics and, consequently, (2) the division made available its employees, managers, and Permit to Work data repositories in the hope that the findings from this investigation could be used to help them achieve their OHS-related goals.

The company communicates a strong commitment to health and safety throughout the organisation and, as discussed earlier, has in place 11 Life Saving Rules of which employees are frequently reminded. The Life Saving Rules were implemented in 2011, and violating them can carry serious consequences, including termination of employment.

The Permit to Work system is the company’s safety process which defines steps that must be followed by employees in order to minimise health and safety risks. The participating company had maintained an extensive database that captured the operational details of how the system had been operating over a period of several years, including when permits were issued, and the nature of the processes that were stopped and started en route to delivering the intended objectives.

3.2. Process mining

To analyse the data, we used techniques from the field of process mining, which is concerned with extracting insights from process execution history recorded in event logs [56]. Such event logs are often recorded by information systems supporting business processes (or can be created by collecting and consolidating information from different sources) [56]. The occupational health and safety domain records huge volumes of data [43,50] including information about safety processes.

Process execution data often includes information about process instance identifiers (i.e., case IDs), process steps (i.e., activities), and their timestamps and can include information about resources involved in the process and process-related data. Fig. 1 depicts an example of an event log which contains process execution data of a safety process. Each row corresponds to an event in the process and each column represents an attribute of the event. For example, we can see that in case 12345 a permit to work was created by employee R1 on 3/01/2018, and that risk assessment was performed by employee R2 the next day, starting at 11:12 and completing at 15:08. We can also see that case 12345 is not urgent (urgency = low), while case 12346 is urgent.

Fig. 1. Example of an event log.

Process mining provides a range of techniques which make it possible to analyse such event logs and obtain insights about different aspects of process executions. Process mining algorithms require the following attributes to be recorded in the log: process instance identifiers, activities and their timestamps. Some process mining methods may require additional information to be recorded in the log; for example, activity types (e.g., start and complete), resources involved in the process (e.g., employees or teams) or process data (e.g., urgency). Some methods may also require additional inputs such as process or organisational models.

We used process mining to analyse process execution data from the Permit to Work system implemented in the case study company. Information about the Permit to Work system was recorded by different information systems. The data was provided by the collaborating organisation, and we worked with company representatives who have knowledge of the process and the information systems to collect and interpret the data. We created an event log which included information required by process mining techniques (i.e., case IDs, activities, activity completion times and resources). Consisting of 494,568 independent events related to 632 different employees between 28 March 2008 and 8 December 2016, the event log contained information about 43,061 cases that were related to 23 distinct activities.

We acknowledge that a data set consisting of slightly less than half a million independent events falls below some people’s threshold for “Big Data,” but we respectfully submit that this characterisation is at once helpful and constructive in this particular instance. There are two reasons for this. First, people from different technical backgrounds face different operational realities, and definitions of Big Data have accordingly popped up in a discipline-specific way, covering a disparate range of professions and disciplines as diverse as obesity research [58], nursing [62], and cloud computing [24]. These communities have mostly failed to agree on exactly what Big Data is on account of the fact that the term is still loaded with conceptual vagueness [16]. It therefore follows that, although a data set outlining half a million independent events might not feel like Big Data to researchers fascinated by the technical boundaries of Big Data tools and techniques, it is significantly larger than the occupational health and safety discipline is used to handling [53]. Second, beyond merely showcasing a simple application of existing Big Data approaches, this paper will demonstrate process mining methods in an attempt to show how they can add value to other types of Big Data analyses. We argue that this point can be compellingly made on a data set that does not test the technical limits of Big Data capabilities.

3.3. Explanation of approach

To analyse the event log, we used the process analysis tool Disco (https://fluxicon.com/disco/) and the open source process mining framework ProM (http://www.promtools.org), which supports most existing process mining methods. The process mining analysis was divided into five phases: (1) process discovery, (2) process conformance, (3) process performance, (4) organisational mining, and (5) process change. The motivations for each of these phases, including how, specifically, each part of the analysis was carried out, and what trends and patterns can be seen as a result, will be explained in turn in subsections 3.3.1 to 3.3.5.

3.3.1. Process discovery

In order to minimise health and safety risks, many organisations introduce safety processes such as Permit to Work systems [17]. Such processes aim to reduce risks associated with the execution of hazardous activities by prescribing steps which employees must complete before or after the execution of the hazardous activities [27], e.g., they may need to assess risks, obtain approvals for certain types of work or isolate the work site.

Many processes, including OHS MSs, are becoming increasingly complex and, to manage this trend towards increasing complexity, to communicate processes more effectively, and to improve operational efficiency, many companies use process modelling [56]. Process models describe the ordering of process activities (the control-flow perspective); they may also specify how resources interact with the process (the organisational perspective), how data is used in the process (the data perspective) or define temporal properties of the process (the time perspective). Processes can be described using different notations, for example Petri nets, BPMN, YAWL, or EPCs [56].

Concurrent processes were first modelled using Petri nets and this notation is used by many process mining approaches [56]. Petri nets have formal semantics and can be analysed using a number of techniques [55]. A Petri net is a directed bipartite graph which has two types of nodes called places (depicted by circles) and transitions (depicted by rectangles) [55]. When Petri nets are used to model business processes, transitions represent activities, places represent conditions and directed arcs are used to model causal dependencies between them [55]. Fig. 2 depicts an example of a Petri net which specifies the expected flow of 12 major process steps in the Permit to Work system we analysed.

Fig. 2. Process model of 12 major steps of the Permit to Work system specified as a Petri net (RFA: request for access, PTW: permit to work, PTT: permit to test, RA: risk assessment).

Places in a Petri net can hold tokens represented by black dots (e.g., in Fig. 2 place start holds one token). A transition can fire, i.e., the activity it represents can be executed, if each of its input places holds at least one token. When a transition fires, it removes one token from each of its input places and puts one token in each of its output places.

If a place is an input to more than one transition (e.g., in Fig. 2 the input place to transition PTW Surrendered), it represents a point in the process where a choice is made (e.g., either transition PTW Surrendered fires, i.e., activity PTW Surrendered is executed, or the silent transition connected to the same input place fires, which represents skipping of activity PTW Surrendered).

If a transition is an input to more than one place, it represents a point in the process where parallel execution starts. For example, in Fig. 2 the sequence of activities RFA Created, RFA Approved, PTW Created is executed in parallel to the sequence of activities PTW RA Created, PTW RA Authorised. The silent transition that follows these two sequences represents a synchronisation point, that is, the process cannot continue until both sequences are completed (the silent transition can only fire when both of its input places have a token).

BPMN is another process modelling notation which has recently become “one of the most widely used languages to model business processes” [56] and is easy to understand for business users [19]. The main elements of the BPMN language include activities (depicted by rectangles), control nodes (depicted by diamond shapes) and connecting arcs [14]. Control nodes, called gateways, can be of two types: splits, which represent process points where the flow diverges, and joins, which model process points where the flow converges [14]. The abovementioned BPMN elements are related to the control-flow perspective; BPMN also provides constructs for modelling other process perspectives, such as the organisational or the data perspectives [14]. Fig. 3 depicts a process model of the Permit to Work system modelled using BPMN.

Fig. 3. Process model of 12 major steps of the Permit to Work system specified using BPMN.

Manually created process models represent desired or expected process behaviour. As is the case with all models, hand-made process models often simplify reality. Such models may not include all possible process execution scenarios, are often subjective [56], and may represent views of a limited number of stakeholders involved in process modelling activities.

Moreover, employees involved in processes do not always follow the processes as expected. Hence, real processes may be significantly different from those described by hand-made process models. Understanding how processes are executed in real life is the first crucial step towards improving their performance.

Process discovery is an area of process mining concerned with discovering process models from event logs. A number of process discovery algorithms have been proposed. They take as input an event log which contains information about process instance identifiers, activities and their timestamps and produce a process model using some process modelling notation, e.g., Petri nets [56]. Event log data is not always accurate and complete, and process discovery algorithms need to be able to work with such data. While earlier process discovery algorithms, such as the alpha-miner [55], could not handle inaccurate or incomplete logs, more advanced process discovery algorithms, e.g., the inductive miner [29], can discover models from such data. Some process discovery tools, e.g., the inductive visual miner [30] or the Disco miner [21], allow users to filter process behaviour (e.g., only select mainstream process behaviour) and discover process models with respect to this behaviour. Such a capability can help to explore complex processes.

Process discovery algorithms help to uncover how processes are executed in practice and they may provide a number of insights about processes. For example, a manager may discover that a real-life process is more complex than expected, that employees often skip a critical process step (e.g., an approval) or that some activities are performed too late (e.g., risk assessment). Such discoveries in the occupational health and safety domain may highlight significant safety risks (e.g., not informing all involved parties about maintenance work can lead to accidents [27]) and provide insights for process improvement (e.g., highlight the need for simplification).

3.3.2. Process conformance

Process conformance analysis aims to discover where in the process and how often deviations from the expected process behaviour occur. Process discovery can provide some insights into process conformance, e.g., by inspecting a discovered process model one may notice that an important part of the process is not always followed. However, if a process is complex (e.g., the one depicted in Fig. 4) it can be difficult to analyse process conformance by looking at a discovered process model. Process conformance issues in the health and safety domain may be associated with significant health and safety risks, e.g., not following Permit to Work systems may lead to maintenance-related accidents [27]. Hence, locating such issues and understanding their nature is of critical importance.

Fig. 4. High-Level Overview of the Permit to Work system. Note that our objective with this figure is to show how “spaghetti-like” and unintelligible the system was in the initial analysis. We have accordingly elected to present it here as a high-level overview in which practically no details can be seen. If readers are frustrated by the inability to see meaningful details in this particular figure, then that was our point.

Process conformance analysis takes as input a normative process model which specifies the expected process behaviour and an event log which captures information about actual process executions. Process conformance analysis techniques aim to find discrepancies between the log and the model. An example of a conformance analysis approach that can discover control-flow non-conformances is an alignment-based process conformance analysis technique [54]. The technique creates so-called alignments by relating events in the log to elements of the normative process model (specified as a Petri net) and highlights discrepancies between the prescribed and the observed process behaviour. The

output of the technique is the normative process model whose elements are annotated with alignment frequency information (an example of such a model is depicted in Fig. 6).

Process conformance analysis can also be performed considering other process perspectives, such as the resource and the data perspectives [12]. Such analysis requires that relevant process information (e.g., process resources and data) is recorded in the event log and described by the normative process model. The analysis enables the discovery of process conformance issues that are not related to the control-flow perspective. For example, when considering the resource perspective, one may learn that important activities (e.g., risk assessments or approvals) were performed by unauthorised employees. Another example of a conformance issue that could be identified is a violation of the separation of duties (“four-eyes”) principle [1], i.e., when two activities that must be performed by different employees were performed by the same employee.

3.3.3. Process performance

Safety processes such as Permit to Work systems have to be efficient in order to achieve their goal of minimising safety risks. Poorly performing safety processes are less likely to be followed by employees; for example, if obtaining an approval is taking too long, employees may be more tempted to skip the step. Such behaviour increases safety risks and can eventually lead to incidents. For example, it was shown that poor Permit to Work systems contributed to maintenance-related accidents [27].

In order to improve the performance of a process, an organisation needs to be able to first identify process performance issues. This can be achieved with the help of process performance analysis techniques that can evaluate various aspects of process performance by analysing process execution data. These techniques usually take as input an event log and a process model, align the log and the model, and calculate and visualise different process performance metrics [56]. The types of analysis that can be performed depend on the types of events recorded in the log. For example, if the log only contains complete events, then one can only analyse times between activity completions; however, if the log also contains start events, then it is possible to consider activity durations and waiting times. A typical output of a process performance analysis tool is a process model annotated with performance-related information, e.g., it can show average activity durations or median waiting times. Some tools, such as ProcessProfiler3D [63], can also compare and visualise the performance of different process cohorts, e.g., the performance of cases related to planned maintenance and those related to major forced outages [40].

Process performance analysis can help to identify process performance issues such as bottlenecks (e.g., activities with long execution or waiting times) or process variants with long execution times. For example, using process performance analysis tools one could uncover a process performance issue described by Iliffe et al. [27], in which permits were only issued during a specific time of the day, thus causing delays in the commencement of maintenance works. Once such issues are identified, organisations can make necessary changes to improve process performance (e.g., they can enforce new policies or balance the workload of their employees).

3.3.4. Organisational mining

Human factors are cited as one of the leading causes of safety incidents across various industries [6,17,25]. Understanding work behaviours of employees can help to identify and eventually eliminate safety risks related to human actions. Organisational mining is an area of process mining which makes it possible to discover insights about different aspects of employee behaviour from event logs. Such event logs must contain information about resources (i.e., employees) who executed process activities. Organisational mining techniques can help to discover organisational models and social networks, and by uncovering actual social structures in an organisation, they can provide insights for process improvement [51]. Organisational mining can also help to analyse behaviours of teams or individual employees, e.g., this can be done with the help of an extensible framework for resource behaviour analysis [39]. The framework consists of three modules which allow users to: 1) extract descriptive information about different aspects of resource behaviour related to employee skills, utilisation, preferences, productivity, and collaboration; 2) evaluate resource behaviour; and 3) evaluate resource productivity. The framework also provides an interface which allows users to define their own measures of resource or process behaviour. Organisational mining techniques can shed light on how employees handle safety processes and can help to uncover hazardous patterns of employee behaviour. For example, one could discover that some employees often skip a crucial process step or do not perform their work thoroughly (e.g., perform risk assessment too quickly).

3.3.5. Process change

In modern business environments, which are increasingly competitive and dynamic, processes often change over time [8,31]. Such changes may be instigated by an organisation’s process improvement initiative or by the introduction of new legislation a company must comply with [8], or they may be unplanned and related to changes in the ways employees handle processes [31], e.g., over time, employees may start following a specific process path or skipping a process step more frequently. Such changes in processes can be identified by analysing process execution data. Process mining techniques that can uncover process changes include methods for the detection of process drifts [8,31] which can identify changes in the control-flow perspective, an approach for evaluation of the overall process risk [41] which can consider multiple process perspectives (e.g., the organisational or the data perspective), and an extensible framework for analysing resource behaviour which makes it possible to track the evolution of given employee or process behaviours [39].

Timely identification of hazardous changes in safety processes (e.g., identification of an increase in the number of process conformance issues) could help in incident prevention. Process mining can detect such changes on a process level [8,41,31] or it can help uncover changes in specific process behaviours [41,39].
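As a rough illustration of what spotting an increase in a conformance issue can mean in practice, the sketch below compares how often a required step is missing in cases started before and after a cut-off date. It is a deliberately naive stand-in and not one of the drift-detection or risk-evaluation methods of [8,31,41]; the log layout, the function name `skip_rate_by_period` and the example activity names are assumptions made purely for illustration.

```python
from datetime import date


def skip_rate_by_period(cases, required_activity, cutoff):
    """Return (rate_before, rate_after): share of cases lacking `required_activity`.

    `cases` maps a case id to (start_date, [activity, ...]); this layout is
    hypothetical and chosen only to keep the example small.
    """
    counts = {"before": [0, 0], "after": [0, 0]}  # [skipped, total]
    for start, activities in cases.values():
        key = "before" if start < cutoff else "after"
        counts[key][1] += 1
        if required_activity not in activities:
            counts[key][0] += 1

    def rate(skipped, total):
        # Avoid division by zero when a period contains no cases.
        return skipped / total if total else 0.0

    return rate(*counts["before"]), rate(*counts["after"])


# Hypothetical cases: the approval step is skipped once after the cut-off.
cases = {
    "c1": (date(2015, 1, 10), ["Request", "Approval", "Work"]),
    "c2": (date(2015, 2, 3), ["Request", "Approval", "Work"]),
    "c3": (date(2016, 1, 5), ["Request", "Work"]),
    "c4": (date(2016, 3, 9), ["Request", "Approval", "Work"]),
}
before, after = skip_rate_by_period(cases, "Approval", date(2016, 1, 1))
print(before, after)
```

A rise in the second value relative to the first would only be a prompt for closer inspection; a single cut-off cannot tell a gradual drift from a sudden change.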


4. Results and discussion

In the previous section, the main analytical parts of process mining (specifically, process discovery, process conformance, process performance, organisational mining, and process change) were explained at a conceptual level. In this section, we will examine the case study data through each of these five process mining lenses and, in so doing, show the type of novel insights and fresh perspectives that can be made available with each of them.

4.1. Process discovery

To learn how the Permit to Work system is actually executed, we used a process discovery tool called “Disco Miner” [21]. The tool mines a fuzzy model which is “intuitively understandable” by users without prior process mining experience [21]. The input to the discovery tool is an event log with process instance identifiers, activities, and timestamps. We applied the tool to the event log described in Section 3.

The discovered model, shown in Fig. 4, demonstrated the complexity of the Permit to Work system. Further exploration of the process in Disco revealed that the process has 2,306 variants. The model was demonstrated to company representatives and they were impressed by the discovery result. A company representative noted that the complexity of the process was cited by some process participants as a cause of productivity losses and inadvertent process compliance issues. The discovery result highlighted the need to simplify the process and inspired the company to consider a process simplification initiative.

We used Disco’s filtering capabilities to discover mainstream process behaviour. Specifically, we selected 51% of the cases that followed frequent process paths, and discovered a process model only considering these cases. The discovered fuzzy model is depicted in Fig. 5. Rectangles represent activities in the process and directed arcs show causal dependencies between them. The model also shows how often different paths were followed in the process. For example, we can see that activity PTW Created was directly followed by activity PTW Approved in 22,223 cases. Fig. 5 shows that the process followed by the majority of cases is much simpler than the process discovered from all process instances (Fig. 4). Further exploration in Disco showed that the selected 51% of cases only follow six process variants, while the remaining 49% of cases follow 2,300 process variants (indicating that many process paths are only followed by a few cases). To investigate exceptional process behaviour, we used process conformance analysis.

Fig. 5. Six most frequently followed paths in the Permit to Work system (followed by 51% of cases).

4.2. Process conformance

To analyse process conformance of the Permit to Work system, we used an alignment-based process conformance technique [54] described in Section 3.3.2 and implemented as a ProM plug-in. A normative process model of the Permit to Work system was created based on the process documentation. The model specifies the expected behaviour of 12 major process steps (Fig. 2). We used the event log described in Section 3 and filtered out those activities which are not specified by the normative process model. The resulting event log was replayed on the process model using the conformance checking plug-in. The plug-in output is depicted in Fig. 6. Activities in the process model that were not always aligned with the corresponding events in the event log are highlighted with red. Every activity is also annotated with the alignment frequency information, i.e., the number of process instances in which the activity was aligned with the event log and the number of process instances in which the alignment was not created. For example, as depicted in Fig. 6, activity PTW Created was aligned with the event log in 42,290 cases (i.e., it was executed as expected in these cases), but in 771 cases the alignment was not created (i.e., in these cases the activity was either not performed or it was performed in a wrong place in the process, e.g., it was executed earlier than prescribed by the process model).

We used a BPMN version of the normative process model depicted in Fig. 3 (BPMN is widely used to model business processes [56], and is easy to understand for business users [19]) and highlighted activities in this model with different colours based on the frequency of alignments: activities highlighted with green were aligned with the log in all cases; for activities highlighted with yellow, alignments were not created in some cases; and for those highlighted with red, alignments were not created in many cases. The resulting model, depicted in Fig. 7, was demonstrated to company representatives. As we can see from Figs. 6 and 7, most non-conformances occurred during the first part of the process and were often related to approvals and risk assessment. The second part of the process was executed as expected in most process instances, with a few discrepancies related to activities PTT Issued and PTW Withdrawn.

The conformance checking plug-in allowed us to quickly locate parts of the process where non-conformances frequently occurred. To better understand the nature of these non-conformances, we checked those parts of the process where they occurred using Disco. As an example, consider activities RFA Created, RFA Approved, and PTW Created. According to the normative process models shown in Figs. 2 and 3, these activities are expected to be performed sequentially. The conformance checking result (Figs. 6 and 7) shows us that, in many cases, alignments were not created for these activities. We filtered the event log so that only events related to these three activities were kept and explored the resulting log in Disco (Fig. 8). Fig. 8a depicts the frequencies of different paths these three activities took in the process (e.g., activity RFA Created was followed by activity RFA Approved in 37,015 cases, but in 4,011 cases activity RFA Approved was followed by activity RFA Created). Fig. 8b depicts median durations between these activities (e.g., the median duration between activities RFA Created and RFA Approved is 16.5 hours, while it is only 22 seconds between activities RFA Approved and RFA Created).

Fig. 8a shows that the three activities were executed as expected in most process instances; however, for example, in 4,011 cases RFA Approved was performed before RFA Created and in 1,891 cases PTW Created was performed before RFA Approved. Looking at the median durations between the activities helped us to obtain further insights into the nature of these deviations. Fig. 8b shows that the median duration between activity RFA Approved and activity RFA Created is only 22 seconds. Since in most cases the two activities were executed within a few seconds, this indicates that this deviation from the expected process behaviour is more likely related to a configuration of an information system supporting the process rather than to employee behaviour. However, in those process instances where activity PTW Created was performed before activity RFA Approved the median duration between the two activities is 5.7 days, indicating that this behaviour could be a process conformance issue that should be investigated further.

4.3. Process performance

To analyse the performance of the Permit to Work system, we used an alignment-based performance analysis approach [54] implemented as a ProM plug-in. The plug-in takes as input a Petri net and an event log, replays the log on the model, and calculates and visualises selected process performance metrics. It can produce different types of process performance visualisations, such as a Petri net annotated with performance information (e.g., showing average execution times) or a table which shows values of a selected process performance metric between different process steps (e.g., the table depicted in Fig. 9).

We used the model and the event log described in the previous section. The event log only contained complete events and did not contain start events. As a result, we could only consider the time between activity completions. Fig. 9 shows the average time between pairs of activities visualised by the plug-in.

We annotated the BPMN model of the process (depicted in Fig. 3) with the process performance information depicted in Fig. 9, resulting in the model shown in Fig. 10. The annotated BPMN model was demonstrated to the company representatives.

As shown in Fig. 10, the longest average duration in the process is between activities PTW Issued and PTT Created (23.01 days). Another long average time is between activities PTW Issued and PTW Surrendered (17.13 days). Activities PTT Created and PTW Surrendered are not always performed in the process (the normative process model allows skipping both activities). Using Disco, we compared the median duration of cases in which these activities were performed with that of cases in which they were not performed. We discovered that there were significant differences in the performance of these process variants: the median duration of cases in which both activities were performed was 39.2 days, while the median duration of cases in which they were not performed was only 14.1 days. The plug-in visualised the process performance and highlighted performance issues; however, we did not investigate the underlying causes of these issues or the ways to overcome them, which is a direction for future work.
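Because the log held only completion events, the figures reported in this section are times between completions rather than activity durations. A minimal sketch of that kind of calculation follows. It assumes a hypothetical in-memory log of (case id, activity, completion timestamp) tuples and simply averages the gap between directly consecutive completions within each case; the plug-in itself replays the log on a Petri net, which this sketch does not attempt, and the function name and sample timestamps are illustrative only.

```python
from collections import defaultdict
from datetime import datetime


def mean_days_between_completions(events):
    """Average time (in days) between directly consecutive completions.

    `events` is an iterable of (case_id, activity, completion_time) tuples.
    Returns {(activity_a, activity_b): mean_days}.
    """
    by_case = defaultdict(list)
    for case_id, activity, completed in events:
        by_case[case_id].append((completed, activity))

    gaps = defaultdict(list)
    for trace in by_case.values():
        trace.sort()  # order each case by completion time
        for (t1, a1), (t2, a2) in zip(trace, trace[1:]):
            gaps[(a1, a2)].append((t2 - t1).total_seconds() / 86400)

    return {pair: sum(values) / len(values) for pair, values in gaps.items()}


# Hypothetical completion events for two cases.
events = [
    ("c1", "PTW Created", datetime(2016, 1, 1)),
    ("c1", "PTW Approved", datetime(2016, 1, 3)),
    ("c2", "PTW Created", datetime(2016, 2, 1)),
    ("c2", "PTW Approved", datetime(2016, 2, 5)),
]
result = mean_days_between_completions(events)
print(result)
```

With start events in the log, the same grouping could be extended to separate service time from waiting time; with completion events alone, the two cannot be told apart.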


Fig. 6. Conformance checking result for 12 major steps of the Permit to Work system shown in Fig. 2. Figs. 6, 7, 9, 10, 11, 12, and 13 were originally created with the ProM
software tool described elsewhere in this paper, but the resulting figures created by ProM were not very easy to see. These figures were therefore recreated using a graphics
program to make them more legible—but we do want to emphasise that ProM does not actually make them look like this in the first instance. (For interpretation of the
colours in the figure(s), the reader is referred to the web version of this article.)

Fig. 7. BPMN model of 12 major steps of the Permit to Work system; activities are coloured based on the frequency of alignments between the model and the event log as
shown in Fig. 6 (green – alignments were created in all cases, yellow – alignments were not created in some cases, red – alignments were not created in many cases).

Fig. 8. Frequencies of different paths in the Permit to Work system followed by activities RFA Created, RFA Approved, and PTW Created (a); and median durations between
the three activities (b).

4.4. Organisational mining

To analyse behaviours of employees involved in the Permit to Work process, we used the framework for resource behaviour analysis (described in Section 3.3.4) which is implemented as a ProM plug-in.⁴ We used the first module of the framework which extracts values of resource behaviour indicators from an event log and visualises their evolution over time.

⁴ https://github.com/a-pika/MiningResourceProfilesPlugin.
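To give a feel for what extracting indicator values over time involves, here is a minimal sketch of one possible indicator: per employee and per quarter, the number of cases in which one activity was completed before another. The event layout, the function name, the employee identifiers and the rule of attributing a case to the quarter of its first event are all assumptions made for illustration; this is not the plug-in's implementation.

```python
from collections import Counter, defaultdict
from datetime import datetime


def early_execution_counts(events, early, late):
    """Count cases, per (resource, quarter), where `early` precedes `late`.

    `events`: iterable of (case_id, activity, resource, timestamp).
    A case is attributed to the quarter of its first event and to the
    resource who performed `early`.
    """
    by_case = defaultdict(list)
    for case_id, activity, resource, ts in events:
        by_case[case_id].append((ts, activity, resource))

    counts = Counter()
    for trace in by_case.values():
        trace.sort()
        first = {}  # activity -> (timestamp, resource) of its first occurrence
        for ts, activity, resource in trace:
            first.setdefault(activity, (ts, resource))
        if early in first and late in first and first[early][0] < first[late][0]:
            start = trace[0][0]
            quarter = f"{start.year}-Q{(start.month - 1) // 3 + 1}"
            counts[(first[early][1], quarter)] += 1
    return counts


# Hypothetical events: in case c1 the permit is created before the approval.
events = [
    ("c1", "RFA Created", "e1", datetime(2015, 1, 5)),
    ("c1", "PTW Created", "e572", datetime(2015, 1, 6)),
    ("c1", "RFA Approved", "e2", datetime(2015, 1, 9)),
    ("c2", "RFA Created", "e1", datetime(2015, 5, 1)),
    ("c2", "RFA Approved", "e2", datetime(2015, 5, 2)),
    ("c2", "PTW Created", "e572", datetime(2015, 5, 3)),
]
counts = early_execution_counts(events, "PTW Created", "RFA Approved")
print(dict(counts))
```

Plotting such counts per employee across quarters gives a time series of the kind an indicator-based analysis works with.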


Fig. 9. Average time between completions of major steps of the Permit to Work system (in days).

Fig. 10. BPMN model of the major steps of the Permit to Work system annotated with information about the average time between activities (in days).

Here, we present an example of resource behaviour analysis related to process conformance. We defined an indicator which counts the number of process instances started during a given time slot in which one given activity was performed by a given employee before another given activity. The indicator was used to check how often different employees were associated with one type of process conformance issue: the execution of an activity sooner than prescribed by the process model. We used the event log described in Section 3.

We learned that the number of non-conformances associated with different employees varies and that for some employees it is changing over time. As an example, Fig. 11 depicts quarterly values of the indicator for three selected employees (anonymised): it shows the number of cases in which activity PTW Created was performed by a given employee before activity RFA Approved (as we can see from Fig. 2, such process behaviour is a deviation from the expected process flow). We can observe, for example, that employees 572 and 652 often performed activity PTW Created before RFA Approved during the periods of 2008–2010 and 2015–2016, while during 2011–2014 most non-conformances were associated with employee 677.

4.5. Process change

We applied process change detection methods to check how the frequency of process conformance issues in the Permit to Work system is changing over time. To learn whether the overall number of process conformance issues is changing, we used an approach for evaluation of the overall process risk [41] implemented as a ProM plug-in.⁵ An input to the approach is an event log that captures information about actual process executions. Another input is a process model that specifies the expected process behaviour. Deviations from such a process model are considered by the approach as risky process behaviours. Examples of risky process behaviours include skipping of an activity (e.g., an approval) or execution of an activity by an unauthorised employee. The approach identifies such risky process behaviours by replaying the event log on the process model. Users can assign different costs to risky process behaviours (higher costs are assigned to those behaviours which are considered more risky). The approach evaluates the overall process risk at a given point in time or during a given time slot as a function of costs of all risky process behaviours identified in the process instances that were active during that time. The plug-in extracts the overall process risk time series and visualises it. The approach can consider different process perspectives, i.e., the control-flow, time, organisational or data perspective. In the case study, we only considered risky process behaviours related to the control-flow perspective.

We used the process model and the event log that were used for process conformance analysis and described in Section 4.2. The costs of all risky process behaviours (i.e., deviations from the expected process flow) were set to one for all activities. We used the mean process risk metric of the plug-in which measures the average cost of risky process behaviours detected in all process instances that were active during a given point or period of time.

⁵ https://github.com/a-pika/OverallProcessRiskPlugin.
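With every deviation costed at one, the mean process risk amounts to an average deviation count per case. A minimal sketch of that aggregation follows. It assumes that deviations have already been identified for each case (the plug-in obtains them by replaying the log on the model, which is not reproduced here) and groups cases by the quarter in which they started; the data layout, the deviation labels and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date


def mean_process_risk(cases, costs, default_cost=1.0):
    """Mean cost of deviations per case, grouped by start quarter.

    `cases`: iterable of (start_date, [deviation_label, ...]).
    `costs`: {deviation_label: cost}; unknown labels get `default_cost`.
    """
    totals = defaultdict(lambda: [0.0, 0])  # quarter -> [cost sum, case count]
    for start, deviations in cases:
        quarter = f"{start.year}-Q{(start.month - 1) // 3 + 1}"
        totals[quarter][0] += sum(costs.get(d, default_cost) for d in deviations)
        totals[quarter][1] += 1
    return {q: cost / n for q, (cost, n) in sorted(totals.items())}


# Hypothetical cases; an empty cost map means every deviation costs one.
cases = [
    (date(2008, 4, 2), []),
    (date(2008, 5, 20), ["PTW Created before RFA Approved"]),
    (date(2008, 6, 11), []),
    (date(2008, 6, 30), []),
    (date(2016, 8, 1), ["RFA Approved skipped"]),
    (date(2016, 8, 15), []),
]
risk = mean_process_risk(cases, costs={})
print(risk)
```

Passing a non-empty `costs` map would weight some deviations more heavily than others, which corresponds to assigning higher costs to behaviours considered more risky.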


Fig. 11. Number of cases in which activity PTW Created was performed by a given employee before activity RFA Approved for three selected employees: 572, 652, and 677.

Fig. 12. Average number of deviations from the expected process behaviour in process instances vs. year.

Fig. 13. An example of evolution of a process conformance issue: the number of cases in which activity PTW Created was performed before activity RFA Approved (quarterly values).

The plug-in extracted quarterly values of the mean process risk for cases started during a quarter.

As the costs of all risky process behaviours were set to one, the mean process risk time series extracted by the plug-in (Fig. 12) shows the average number of deviations in cases started during different periods of time. We can observe that the average number of deviations is slightly increasing over time. For example, in April 2008 the mean process risk value was around 0.35 (i.e., on average a deviation happened in approximately one third of process instances), while in August 2016 it was around 0.5 (i.e., on average a deviation happened in every second process instance).

We also checked whether the frequencies of specific process conformance issues are changing over time using the framework for resource behaviour analysis implemented as a ProM plug-in and described in Section 3.3.4. We used the plug-in’s interface that allows users to define their own indicators of resource or process behaviour and we defined measures for different types of deviations from the expected process behaviour. For example, we defined an indicator which counts the number of process instances started during a given time slot in which one given activity was performed after another given activity. The plug-in then extracted quarterly values of this indicator for given pairs of activities from the event log and visualised the corresponding time series. An instantiation of this indicator for activities PTW Created and RFA Approved is depicted in Fig. 13.

The normative process model shown in Fig. 2 specifies that activity PTW Created has to be performed after activity RFA Approved. Conformance analysis helped us to identify that this was not always the case (Section 4.2); however, as we can see in Fig. 13, the number of non-conformances with respect to this behaviour (i.e., the number of process instances in which activity PTW Created was performed before RFA Approved) is slightly decreasing over time. Although the overall number of process conformance issues is increasing over time (Fig. 12), we learned that the frequency of some process conformance issues is not changing or is decreasing (e.g., Fig. 13).

5. Conclusions

In this paper, we answered Wang and Wu’s [60] and Wang et al.’s [61] calls for the application of Big Data science to the safety domain by exploring the potential of applying tools and techniques from process mining, a research area concerned with analysing process execution data, to derive novel insights from and improve the visualisation of safety process data. We achieved these objectives by showcasing the potential of applying process mining techniques to derive novel insights from safety data, thereby making progress on the visualisation problem specifically identified by Wang et al. [61]. As discussed earlier, large volumes of data are recorded in organisations with OHS MSs about the conduct of occupational health and safety processes and, although this data is used extensively to produce key performance indicators that measure business performance, it has failed to reveal new insights about how, specifically, the systems can be improved to make these workplaces safer. As a step towards improving this, we demonstrated how process mining can be applied in a novel way in the occupational health and safety domain. The application of various process mining methods to data from a Permit to Work system in an Australian company helped to uncover different types of insights about the process. Specifically, it highlighted process complexity and identified process performance and conformance issues. The findings inspired the company to consider a process simplification initiative which could improve process safety in the company, and brought about important outcomes for both industry practitioners and the theoretical foundations of the safety science research area.

5.1. Implications for industry

The application of various process mining techniques to a safety process allowed us to explore the process behaviour from different perspectives and yielded some unexpected findings. Process discovery identified a surprisingly high level of complexity of the Permit to Work system. Process performance analysis highlighted activities in the process that contributed to long process execution times. Conformance analysis showed where in the process deviations from the expected process behaviour frequently occur, organisational mining helped us to identify employees associated with these non-conformances, and techniques for process change detection discovered that the frequency of process conformance issues is increasing over time. The complexity of the process could be a factor contributing to process performance and conformance issues. Identification of these issues is the first crucial step towards improving process safety. The case study findings inspired the company to consider a process simplification initiative.

5.2. Implications for theory

Different studies showed that OHS MSs have a positive impact on workplace safety [33]; however, it is often difficult to evaluate such systems [46,43,5] and justify investments in them [46,43]. Although it was argued that analysing large volumes of data recorded by OHS MSs could help to better understand the impact of such systems and organisational safety processes [43,53], extracting actionable insights from safety data remains a challenge for organisations [43,50,35]. Moreover, Ouyang et al. [35] argue that “safety data is becoming exponentially un-analysable with traditional statistics methods” and that we need to rethink “how to exploit the vast values of safety data efficiently.” A large and growing chorus of voices has emerged within the safety literature (e.g., [22,3]) that is drawing attention to the increasing prevalence of large data sets within this research discipline. And as the research discipline becomes more data-rich and data-intensive, the research community in this area is also experimenting with new techniques and analytical approaches (e.g., [26]). This paper is similarly intended to open up new analytical tools and methodological lenses by considering an analytical approach that has helped to yield new insights in a broad range of other research disciplines.

The main contribution of this article is an introduction of the application of techniques and tools from the field of process mining in the occupational health and safety domain. We used as an example the Permit to Work process implemented in an Australian energy company and demonstrated different types of process mining analyses that can be performed on the process execution data and different types of insights they can yield. We showed how process mining can help to uncover actual process behaviour, can pinpoint where the discrepancies between the expected and the actual behaviour occur and which employees are associated with the deviations, can identify bottlenecks in the process and show how the process behaviour is evolving. Understanding how OHS processes are executed in real life could make the evaluation of OHS MSs and organisational safety culture less challenging, while locating specific process issues (such as non-conformances or bottlenecks) and employees associated with these issues can provide clues for improving the processes and the workplace safety in organisations.

We see this foray into applying process mining to the occupational health and safety domain as a first-cut attempt to bring together two research disciplines that had not been properly connected before. It therefore follows that we recognise that future attempts to bridge these two research areas could be done better or, at a minimum, differently. Nonetheless, it is our hope that this paper will be a catalyst for further research at the interface between these two domains.

5.3. Limitations and recommendations for future research

Our chosen methodology for this investigation does open up a potential weakness in this contribution: to what degree can the findings presented here be reasonably extended to other organisations and industries? Is the case so unique that it is unlikely that the observations made here would be usefully relevant in other scenarios? Two aspects of the case study make a compelling case for believing that the evidence presented in this investigation can shed a useful amount of light on data-related challenges for OHS

largest companies in the energy sector (like, for example, the oil & gas supermajors [18]), there are thousands of companies around the world that both fulfil the definition of a “mid-cap” and have OHS systems (e.g., [20,15]). The findings presented here are not so case-specific that it would be difficult to apply the essential learnings from this investigation to those other contexts.

Second, it is important to remember that the overarching aim of this paper was to apply process mining in the safety domain. This essentially entailed the experimental integration of literature and methodological techniques from the safety literature with literature and tools from the process domain area to see if, when brought together, they could yield valuable new insights that had remained less visible beforehand. In doing so, our main objective with this investigation was to make a methodological contribution: specifically, to open up new, untested possibilities for novel explorations at the boundary between these two research disciplines. There may not have been much potential complementarity between these two communities in the past but, with the emergence of ever larger OHS MS data repositories, we submit that this kind of collaboration now makes sense not only in organisations that closely resemble the one at the centre of our case study, but in many other organisations as well. It is our hope that this investigation will act as a catalyst that leads to more research at this particularly fertile intersection of research domains.

Also, the types of process mining analyses we performed during the case study were tailored to the characteristics of the process execution data available to us. Richer event logs would allow us to perform more sophisticated analyses. For example, the availability of start events in the process execution data would allow us to analyse service and waiting times. Nevertheless, we showed that the availability of only basic attributes that are typically required by process mining tools (i.e., process instance identifiers, activities, timestamps and resource information) enables the discovery of helpful insights in the safety domain.

Process mining tools used during the case study allowed us to analyse different aspects of the safety process (i.e., to consider the overall process, specific process behaviours, behaviours of employees handling the process and changes to the process) and helped us to uncover process conformance and performance issues, but the underlying causes of these issues were not investigated. A direction for future work is the application of tools that could help to identify the sources of the process issues, e.g., the use of an approach for root cause analysis [52]. And while this paper sought to improve the resolving power with which PTW systems can be analysed, it falls short of explaining exactly why they do or do not work, or how they might be improved. We accordingly suggest that, with the benefit of Big Data analytical tools like the ones explored in this paper, future investigations would do well to address these important questions. Another direction for future work is an investigation of the ways to improve the safety process, e.g., by simplifying the process (considered by the case study company). This could help to improve the process performance and reduce the number of process conformance issues. One could also investigate whether improving the process representation and communication could lead to a decrease in the number of conformance issues.

Declaration of competing interest
MSs in other sectors and contexts. First, as discussed in Section 3,
the organisation is most appropriately categorised as a “mid-cap” The authors declare that they have no known competing finan-
company that is significantly smaller than more internationally cial interests or personal relationships that could have appeared to
recognisable global brands within the energy industry. Unlike the influence the work reported in this paper.


Acknowledgements

In this article, we use some examples from a case study conducted in an Australian company, including one example described in our earlier work [40]. The case study was funded by the Asset Institute. We would like to thank Professor Joseph Mathew from the Asset Institute for his support for the project and to the many team members at the case study company who generously gave their time and patience so that we could improve our understanding of their organisation and its safety systems.

References

[1] R. Accorsi, T. Stocker, On the exploitation of process mining for security audits: the conformance checking case, in: Proceedings of the 27th Annual ACM Symposium on Applied Computing, ACM, 2012.
[2] M.F. Ballesteros, S.A. Sumner, R. Law, A. Wolkin, C. Jones, Advancing injury and violence prevention through data science, J. Saf. Res. 73 (2020) 189–193.
[3] J. Bao, et al., Understanding the effects of trip patterns on spatially aggregated crashes with large-scale taxi GPS data, Accid. Anal. Prev. 120 (2018) 281–294.
[4] A. Baum, et al., Emotional, behavioral, and physiological effects of chronic stress at Three Mile Island, J. Consult. Clin. Psychol. 51 (4) (1983) 565–572.
[5] A. Bianchini, et al., An innovative methodology for measuring the effective implementation of an Occupational Health and Safety Management System in the European Union, Saf. Sci. 92 (2017) 26–33.
[6] C.E. Billings, W.D. Reynard, Human factors in aircraft incidents: results of a 7-year study, Aviat. Space Environ. Med. 55 (10) (1984) 960–965.
[7] M. Booth, J.D. Butler, A new approach to permit to work systems offshore, Saf. Sci. 15 (4–6) (1992) 309–326.
[8] J.C.R.P. Bose, et al., Handling concept drift in process mining, in: Advanced Information Systems Engineering, 2011.
[9] F. Chen, S. Chen, X. Ma, Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental Big Data, J. Saf. Res. 65 (2018) 153–159.
[10] S. Clarke, I. Taylor, Reducing workplace accidents through the use of leadership interventions: a quasi-experimental field study, Accid. Anal. Prev. 121 (2018) 314–320.
[11] C.K. Curlee, et al., Upstream onshore oil and gas fatalities: a review of OSHA’s database and strategic direction for reducing fatal incidents, in: SPE/EPA/DOE Exploration and Production Environmental Conference, 2005.
[12] M. De Leoni, et al., Data-and resource-aware conformance checking of business processes, in: International Conference on Business Information Systems, Springer, 2012.
[13] N. Dhakal, C.R. Cherry, Z. Ling, M. Azad, Using CyclePhilly data to assess wrong-way riding of cyclists in Philadelphia, J. Saf. Res. 67 (2018) 145–153.
[14] M. Dumas, et al., Fundamentals of Business Process Management, Springer, 2013.
[15] M. Duryan, et al., Knowledge transfer for occupational health and safety: cultivating health and safety learning culture in construction firms, Accid. Anal. Prev. 139 (2020) 105496.
[16] M. Favaretto, E. De Clercq, C.O. Schneble, B.S. Elger, What is your definition of Big Data? Researchers’ understanding of the phenomenon of the decade, PLoS ONE 15 (2) (2020) e0228987.
[17] R. Flin, et al., Measuring safety climate: identifying the common features, Saf. Sci. 34 (1–3) (2000) 177–192.
[18] Fortune, Fortune Global 500 - The List, Fortune (Asia-Pacific edition) 180 (2) (2019) F1-F22.
[19] C.V. Geambasu, BPMN vs. UML activity diagram for business process modeling, Account. Manag. Inf. Syst. 11 (4) (2012) 637.
[20] Y.M. Goh, et al., Dynamics of safety performance and culture: a group model building approach, Accid. Anal. Prev. 48 (2012) 118–125.
[21] C.W. Günther, A. Rozinat, Disco: discover your processes, Business Process Management (Demos), Citeseer. (2012).
[22] S. Guo, et al., A Big-Data-based platform of workers’ behavior: observations from the field, Accid. Anal. Prev. 93 (2016) 299–309.
[23] A. Hale, D. Borys, Working to rule, or working safely? Part 1: a state of the art review, Saf. Sci. 55 (2013) 207–221.
[24] I.A.T. Hashem, I. Yaqoob, N.B. Anuar, S. Mokhtar, A. Gani, S.U. Khan, The rise of “Big Data” on cloud computing: review and open research issues, Inf. Sci. 47 (2015) 98–115.
[25] C. Hetherington, et al., Safety in shipping: the human element, J. Saf. Res. 37 (4) (2006) 401–411.
[26] E. Ilbahar, et al., A novel approach to risk assessment for occupational health and safety using Pythagorean fuzzy AHP & fuzzy inference system, Saf. Sci. 103 (2018) 124–136.
[27] R.E. Iliffe, et al., More effective permit-to-work systems, Process Saf. Environ. Prot. 77 (2) (1999) 69–76.
[28] L. Jong-Hyun, et al., The effects of personality types on self-reported safety behavior: focused on plant workers in Korea, Accid. Anal. Prev. 121 (2018) 20–27.
[29] S.J.J. Leemans, et al., Discovering block-structured process models from event logs – a constructive approach, in: International Conference on Applications and Theory of Petri Nets and Concurrency, 2013.
[30] S.J.J. Leemans, et al., Exploring processes and deviations, in: International Conference on Business Process Management, 2014.
[31] A. Maaradji, et al., Detecting sudden and gradual drifts in business processes from execution traces, IEEE Trans. Knowl. Data Eng. 29 (10) (2017) 2140–2154.
[32] K.L. Mason, et al., Occupational fatalities during the oil and gas boom—United States, 2003–2013, Morb. Mort. Wkly. Rep. 64 (20) (2015) 551.
[33] I. Mohammadfam, et al., Evaluation of the quality of occupational health and safety management systems based on key performance indicators in certified organizations, Saf. Health Work 8 (2) (2017) 156–161.
[34] H. Nordlöf, et al., Safety culture and reasons for risk-taking at a large steel-manufacturing company: investigating the worker perspective, Saf. Sci. 73 (2015) 126–135.
[35] Q. Ouyang, et al., Methodologies principles and prospects of applying big data in safety science research, Saf. Sci. 101 (2018) 60–71.
[36] A.B. Parsa, et al., Real-time accident detection: coping with imbalanced data, Accid. Anal. Prev. 129 (2019) 202–210.
[37] R.K. Perrons, J.W. Jensen, Data as an asset: what the upstream oil & gas industry can learn about “Big Data” from companies like social media, in: SPE Annual Technical Conference and Exhibition, 2014.
[38] R.K. Perrons, J.W. Jensen, Data as an asset: what the oil and gas sector can learn from other industries about “Big Data”, Energy Policy 81 (2015) 117–121.
[39] A. Pika, et al., Mining resource profiles from event logs, ACM Trans. Manag. Inf. Syst. 8 (1) (2017) 1.
[40] A. Pika, A.H. ter Hofstede, R.K. Perrons, G. Grossmann, M. Stumptner, J. Cooley, Analysing an industrial safety process through process mining: a case study, in: J. Mathew, C.W. Lim, L. Ma, D. Sands, M.E. Cholette, P. Borghesani (Eds.), Asset Intelligence Through Integration and Interoperability and Contemporary Vibration Engineering Technologies, Springer, Cham, Switzerland, 2019, pp. 491–500.
[41] A. Pika, et al., Evaluating and predicting overall process risk using event logs, Inf. Sci. 352 (2016) 98–120.
[42] M. Pillay, M. Tuck, Permit-to-work systems as a health and safety risk control strategy in mining: a prospective study in resilience engineering, in: International Conference on Applied Human Factors and Ergonomics, Springer, 2017.
[43] D. Podgorski, Measuring operational performance of OSH management system — a demonstration of AHP-based selection of leading key performance indicators, Saf. Sci. 73 (2015) 146–166.
[44] A. Polyvyanyy, et al., A systematic approach for discovering causal dependencies between observations and incidents in the health and safety domain, Saf. Sci. 118 (2019) 345–354.
[45] R.A.H.E. Rashidy, P. Hughes, M. Figueres-Esteban, C. Harrison, C. Van Gulijk, A Big Data modeling approach with graph databases for SPAD risk, Saf. Sci. 110 (2018) 75–79.
[46] L.S. Robson, et al., The effectiveness of occupational health and safety management system interventions: a systematic review, Saf. Sci. 45 (3) (2007) 329–353.
[47] G. Santos, et al., The main benefits associated with health and safety management systems certification in Portuguese small and medium enterprises post quality management system certification, Saf. Sci. 51 (1) (2013) 29–36.
[48] R.T. Shalini, Economic cost of occupational accidents: evidence from a small island economy, Saf. Sci. 47 (7) (2009) 973–979.
[49] X. Shi, et al., A feature learning approach based on XGBoost for driving assessment and risk prediction, Accid. Anal. Prev. 129 (2019) 170–179.
[50] S. Sinelnikov, et al., Using leading indicators to measure occupational health and safety performance, Saf. Sci. 72 (2015) 240–248.
[51] M. Song, W.M.P. van der Aalst, Towards comprehensive support for organizational mining, Decis. Support Syst. 46 (1) (2008) 300–317.
[52] S. Suriadi, et al., Root cause analysis with enriched process logs, in: International Conference on Business Process Management, 2012.
[53] K.H. Tan, V.G. Ortiz-Gallardo, R.K. Perrons, Using big data to manage safety-related risk in the upstream oil & gas industry: a research agenda, Energy Explor. Exploit. 34 (2) (2016) 282–289.
[54] W. van der Aalst, et al., Replaying history on process models for conformance checking and performance analysis, Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2 (2) (2012) 182–192.
[55] W.M.P. van der Aalst, Business process management demystified: a tutorial on models, systems and standards for workflow management, Advanced Course on Petri Nets (2003).
[56] W.M.P. van der Aalst, Process Mining: Data Science in Action, Springer-Verlag, Berlin, 2016.
[57] C.D. Veley, Applying a new HSE measurement system, in: SPE International Conference on Health, Safety and Environment in Oil and Gas Exploration and Production, 2002.
[58] C. Vogel, S. Zwolinsky, C. Griffiths, M. Hobbs, E. Henderson, E. Wilkins, A Delphi study to build consensus on the definition and use of big data in obesity research, Int. J. Obes. 43 (12) (2019) 2573–2586.


[59] J.K. Wachter, P.L. Yorio, A system of safety management practices and worker engagement for reducing and preventing accidents: an empirical and theoretical investigation, Accid. Anal. Prev. 68 (2014) 117–130.
[60] B. Wang, C. Wu, Safety informatics as a new, promising and sustainable area of safety science in the information age, J. Clean. Prod. 252 (2020) 119852.
[61] B. Wang, C. Wu, L. Huang, L. Kang, Using data-driven safety decision-making to realize smart safety management in the era of big data: a theoretical perspective on basic questions and their answers, J. Clean. Prod. 210 (2019) 1595–1604.
[62] H.T. Wong, V.C.L. Chiang, K.S. Choi, A.Y. Loke, The need for a definition of Big Data for nursing science: a case study of disaster preparedness, Int. J. Environ. Res. Public Health 13 (10) (2016) 1015.
[63] M.T. Wynn, et al., ProcessProfiler3D: a visualisation framework for log-based process performance comparison, Decis. Support Syst. 100 (2017) 93–108.
[64] P.L. Yorio, et al., Health and safety management systems through a multilevel and strategic management perspective: theoretical and empirical considerations, Saf. Sci. 72 (2015) 221–228.
[65] J. Zimmerman, et al., Process safety management best practice: safe work permit management system, in: ASSE Professional Development Conference and Exposition, 2017.
