You are on page 1of 18

Comput Sci Res Dev (2015) 30:69–86

DOI 10.1007/s00450-013-0247-3

R E G U L A R PA P E R

Business impact analysis—a framework for a comprehensive


analysis and optimization of business processes
Sylvia Radeschütz · Holger Schwarz ·
Florian Niedermann

Received: 27 September 2010 / Accepted: 7 August 2013 / Published online: 14 September 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract The ability to continuously adapt its business pro- Keywords Business process optimization · Data
cesses is a crucial ability for any company in order to sur- warehousing · Information integration · OLAP · Data
vive in today’s dynamic world. In order to accomplish this mining
task, a company needs to profoundly analyze all its business
data. This generates the need for data integration and analy-
sis techniques that allow for a comprehensive analysis. 1 Introduction
A particular challenge when conducting this analysis is
the integration of process data generated by workflow en-
Increasing competition and significantly shortened product
gines and operational data that is produced by business ap-
lifecycles led to a situation where fast adaption and con-
plications and stored in data warehouses. Typically, these
two types of data are not matched as their acquisition and tinuous optimization of business processes are critical fac-
analysis follows different principles, i.e., a process-oriented tors in determining the success of a company [42]. Business
view versus a view focusing on business objects. process optimization aims to improve processes of an or-
To address this challenge, we introduce a framework that ganization, e.g., by discovering and removing unnecessary
allows to improve business processes considering an inte- activities and by replacing activities by more efficient ones
grated view on process data and operational data. We present [17, 36]. For these optimizations, companies need to analyze
and evaluate various architectural options for the data ware- process data using analysis techniques such as process mon-
house that provides this integrated view based on a special- itoring and process mining [37]. Operational data of other
ized federation layer. This integrated view is also reflected in business applications, e.g., data from business transactions
a set of operators that we introduce. We show how these op- or from master data management, is stored in a data ware-
erators ease the definition of analysis queries and how they house and analyzed separately via OLAP (Online Analyti-
allow to extract hidden optimization patterns by using data cal Processing) and data mining. All these methods usually
mining techniques. fall short when it comes to questions requiring an integrated
view on both process data and operational data. As an ex-
ample, consider a car rental company that tries to optimize
its rental process. A highly relevant question to a business
analyst would be how trainings and work experience affect
the execution time as well as the success of the process. An-
S. Radeschütz · H. Schwarz (B) · F. Niedermann swering this question requires both process data (process ex-
IPVS, Universität Stuttgart, Universitätsstr. 38, 70569 Stuttgart,
Germany ecution data, paths taken) as well as operational data related
e-mail: holger.schwarz@ipvs.uni-stuttgart.de to the employees involved in process execution (work ex-
S. Radeschütz perience, trainings, demographics). In such a situation, an
e-mail: sylvia.radeschuetz@ipvs.uni-stuttgart.de integrated analysis would make a valuable contribution by
F. Niedermann ensuring that all relevant data is taken into account. We call
e-mail: florian.niedermann@ipvs.uni-stuttgart.de this approach Business Impact Analysis (BIA).
70 S. Radeschütz et al.

In this paper, we introduce a framework for Business Im-


pact Analysis that allows to improve business processes con-
sidering an integrated view on process data and operational
data. The main contributions are as follows:
– We discuss the structure of a data warehouse that builds
the basis for BIA. Our focus is on the evaluation of ar-
chitectural options that mainly differ in the way links bet-
ween process data and operational data are maintained.
– We introduce new operators, so-called BIA operators, that
extend the set of standard OLAP operators and take the
specific requirements of BIA into account.
– We show how to use data mining techniques for extracting
hidden optimization patterns from large audit trails and
operational data.
The remainder of this paper is organized as follows. We
give a brief overview of the BIA lifecycle in the following
section and summarize related work in Sect. 3. Section 4
shows a warehouse schema that efficiently realizes the in-
tegration of process data and operational data. In Sect. 5,
Fig. 1 Lifecycle for business impact analysis
we introduce new operators that allow to specify queries on
this integrated BIA warehouse in a user-friendly way. How
to combine these BIA operators with standard data mining
techniques in order to derive hidden patterns from the BIA
warehouse, is discussed in Sect. 6. We discuss and evalu-
ate the proposed warehouse architectures and operators in
Sect. 7 and finally summarize in Sect. 8.

2 Lifecycle for business impact analysis

In Fig. 1, we define the BIA lifecycle for improving busi-


ness processes, which are at the center of this lifecycle. Our
framework supports the necessary steps for an optimization
of these business processes in the phases of the lifecycle. We Fig. 2 Car rental process
illustrate these phases and the relevant system components
by a sample scenario which is introduced next. to one of the available roles. Thus, ContractNegotiation can
be claimed and executed by all agents from departments A,
2.1 Example scenario B or C. If the customer does not accept an alternative car
or renting period the process is canceled. Otherwise, the car
Our sample scenario consists of a BPEL [23] process (see is handed over to the customer by an employee of depart-
parts of it in Fig. 2) that is supposed to be optimized, in the ment D in human task CarHandOver. In Fig. 2, all pro-
sense that expensive, unnecessary, long running or canceled cess variables are marked by hash marks #. The operational
process parts should be identified, analyzed and revised. The data in the scenario includes useful data for optimization. It
process is part of a car rental service and describes the se- comprises information about customers, employees or cars
lection of a rental car. It receives its input data from acti- (shown here as tables Customer, Employee and Automobile).
vity GetCustomerData together with information about the
customer and his preferred car class and checks if the re- 2.2 Phases of the lifecycle
quested car is available by executing activity GetCarAvail-
ability. If no car is available during the desired rental period, Optimization at the process level is often termed business
an employee executes the human task ContractNegotiation process re-engineering [6, 8, 10, 15] or continuous process
to check if the customer would also accept another car class. improvement [21, 36]. We address optimization beyond this
The task is not directly assigned to a specific employee, but level by exploiting operational business data at the same
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 71

level of importance. In order to be able to perform such services. One of the most prominent examples is Semantic
an extensive optimization, we propose a lifecycle for BIA Annotations for WSDL (SAWSDL) [39]. SAWSDL extends
that comprises four phases (see Fig. 1): During the match- WSDL components with an URL to a given ontology con-
ing phase, process model elements are combined with el- cept to enable their semantic annotation. Thus, we selected
ements in the operational data models. These matches are SAWSDL for the annotation of model elements for semi-
used in the ETL (Extraction-Transformation-Load) process- automatic matching. For BIA, we are primarily interested in
ing of the warehousing phase to cleanse and integrate pro- SAWSDL annotations for process variable definitions. Fur-
cess data and operational data into one BIA warehouse (BIA thermore, we extended the annotation capabilities to cover
WH). The analysis phase discovers additional optimization operational data as well, which leads to a set of annotated
knowledge by considering both kinds of data. This know- schema elements.
ledge provides the basis for the optimization phase, where
the process is restructured to achieve better business results. Matching In this step, the BIA-Matcher applies matching
We give an overview of the phases in the following subsec- rules to find matches between variable elements and ele-
tions. In this paper, we focus especially on the phases for ments of operational data models. For the semi-automatic
warehousing and analysis that are discussed in Sects. 4 to 6. matching, the BIA-Matcher uses a suitable reasoning tool to
infer logical consequences from semantically annotated data
2.2.1 Matching phase using the loaded ontologies. For this purpose, the BIA-Mat-
cher holds a selection of appropriate inference rules. We are
The goal for this phase is to find for all variable elements of especially interested in rules that discover synonyms, sub-
the given process the matching elements in operational data classes, equality or union relationships between the concepts
models. The matching phase traverses process and schema of two annotated match partners as well as other concept re-
models to determine matching schema elements for each lations defined by user-defined ontology rules.
process variable element and assigns similarity values bet- For non-annotated or partially annotated models, the
ween 0 and 1. There are different ways how our framework BIA-Matcher employs rules for automatic matching that
supports the matching: consider process features in addition to common schema
matching algorithms. In order to match process variables
– manual matching: Matches are found and stored manually with elements in operational data models the rules exploit,
by a business analyst. e.g., names of the elements or data dependencies via the data
– semi-automatic matching: Model elements are semanti- flow.
cally annotated and matched automatically. In both matching approaches, the Matcher considers the
– automatic matching: The matches are found purely auto- context of match partners, e.g., the names of process com-
matic. ponents working with the matched variable or the names of
The matching phase starts by loading all process mod- parents of operational elements. The context is used to refine
els and operational data models into the BIA-Matcher. It found matches and to improve the precision rate.
also loads process variables from their process source file In our sample scenario, the Matcher combines the fol-
or from audit trails of proprietary workflow engines where lowing elements: (1) We assume that the element Cust.ID of
process data is stored after process execution. Furthermore, variable inputData is combined with attribute CID of the op-
the BIA-Matcher loads elements of operational data models erational table Customer, (2) the executing roles (assignees)
from relational databases, XML databases or from CWM of ContractNegotiation with table Employee, (3) element
(Common Warehouse Metamodel) [25], which is a standard Car.Model of variable ServiceInfo with attribute Class of ta-
format to unify the interchange of proprietary formats in the ble Automobile.
warehousing process. Ontologies (Web Ontology Language
(OWL) [41] and Web Service Modeling Language (WSML) 2.2.2 Warehousing phase
[40]), that are used for semantic semi-automatic matching,
are loaded as well. An integrated data warehouse has to be established in order
An overview of the BIA-Matcher can be found in [27], to enable a Business Impact Analysis. The BIA-Loader as
and [28] demonstrates its functionalities. The main compo- shown in Fig. 1 supports the building of this warehouse and
nents for annotating model elements and the steps for semi- its ETL flow. Its components are only briefly described here,
automatic and automatic matchings are explained in the fol- we explain further details in Sect. 4.
lowing:
WH creation As we have to handle huge amounts of pro-
Annotation Within Semantic Web research a number of cess execution data, operational transaction data and master
standards have been established for the annotation of web data, we do not build one consolidated warehouse, but leave
72 S. Radeschütz et al.

Fig. 1 is used for analyzing the data and receiving know-


ledge about valuable correlations as basis for this optimiza-
tion. We give an overview here and discuss details in Sects. 5
and 6. All the analysis results must be interpreted by the user
and stored as BIA knowledge for the optimization phase.

Preselection We use the BIA-Analyzer to preselect at-


tributes in the BIA WH that might be valuable for analysis.
Usually, we have to consider many dimensions in the ware-
house and many operational dimensions as well. To reduce
dimensionality, the BIA-Analyzer applies various analysis
preparation patterns. We derive these patterns from statistics
Fig. 3 BIA warehouse
about analysis results of earlier data mining runs or OLAP
analyses. The process dimensions are preselected depending
all data in their source systems. In order to realize the in- on the impact they have on process executions, such as ex-
tegrated data warehouse, we create a federated warehouse ecution time or process state. Hence, process dimensions or
architecture (see Fig. 3). For the federation server, one may operational dimensions that are not needed for the analysis
use standard federation systems such as IBM Infosphere can be removed from the data mining input. Relevant pro-
Federation Server. The data sources of process data and op- cess attributes and their matching operational attributes are
erational data are accessible to this federated architecture via narrowed down by statistics from earlier analyses as well.
standard wrappers, depending on the database system of the
audit trail and the operational data storage. Both data can be OLAP To gain insight into problematic business processes,
used by the analyst via an SQL client. OLAP queries are used on integrated process and opera-
The federation layer is responsible for the integration and tional data. Our framework offers a number of specialized
contains the mappings resulting from the matches identified OLAP operators that can be used to simplify an integrated
in the matching phase between the process variables from analysis. Frequent types of subqueries are realized in these
the audit trail and the operational schema elements. These OLAP operators and make it easier to perform complex que-
mappings are the only integrable elements between process ries. They are introduced in Sect. 5.
data and operational data, because audit trails primarily store
Data mining The BIA-Analyzer can apply standard data
attributes that are relevant during the execution like the du-
mining techniques on the BIA WH in order to receive new
ration of the process which do not have a counter part in the
insights into correlated effects between processes and op-
operational data. In Sect. 4.2, we propose two variants of
erational data. Several data mining techniques and algo-
a federated warehouse architecture to store identified map-
rithms are adopted to integrated process and operational
pings: via one match table that contains all matches or via
data. OLAP-Operators can also be applied before data min-
one bridge table for each match between a process attribute
ing in order to prepare the data for a data mining algorithm.
and an operational attribute.
The analysis phase may reveal useful relationships bet-
ween the duration and result of a process and operational in-
ETL Standard ETL steps such as data extraction, transfor-
formation on the business objects or executing resources. In
mation, data cleaning and normalization are done to provide
our sample scenario in Fig. 2, we may discover in (1) a cor-
optimal input data for the BIA WH. For found matches, the
relation c1 between the assets of customers (type of credit
ETL flow integrates process and operational data sources. card) and processes with a long duration. The performance
As the user can choose from two different architectures for of ContractNegotiation in (2) may depend on the executing
the BIA WH, different transformation steps are necessary employee and his skills (c2). In (3), there may be a rela-
covering bridge table creations and match table creation. tionship c3 between the rented car class and the execution
time of CarHandOver. A detailed analysis shows that cer-
2.2.3 Analysis phase tain car class features require a longer introduction time for
the customers, e.g., using sports cars. We discuss the BIA
The goal of this phase is to provide a concise view on all as- data mining operations in detail in Sect. 6.
pects of the process being considered for process optimiza-
tion. For instance, it might be important to make available 2.2.4 Process optimization phase
information on activity execution time, waiting time, feasi-
bility of parallelization and relevant process attributes with The optimization phase is based on the analysis results
regard to the process outcome. The BIA-Analyzer shown in gained. The BIA-Optimizer (see Fig. 1) aims to use these
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 73

results and to rewrite the business processes improving their [4] of these warehouses have been studied in previous work.
performance. To successfully rewrite and improve the pro- But all the mentioned approaches and techniques refer to
cess model, the BIA-Optimizer needs to detect the appro- the actual flow logic. Operational data sources with further
priate rewrite rules depending on the given business goals. information are typically neglected.
Hence, the BIA-Optimizer gets the optimization goal as well Operational data comprises all data processed within the
as any constraint that should be considered during the opti- business that is not stored in an audit trail, but by other sys-
mization in terms of cost, quality, utilized resources or busi- tems, e.g., ERP systems and data warehouses. OLAP and
ness rules. The engine identifies applicable optimization pat- data mining techniques are typically employed to this data
terns (see also [22]) to fulfill these business goals and de- in order to analyze it and reveal hidden pattern. OLAP sup-
cides which rewrite rule fits best and should be executed. ports users in interactively analyzing multidimensional data
The rewrite rules are based on “best practice” techniques from multiple perspectives [7]. Data mining is the process of
for the optimization of processes [8]. The practices are for- searching large volumes of data for patterns using methods
malized in these rules to achieve a high extent of automated such as classification, clustering and association rule dis-
optimization. In the last step, the BIA-Optimizer automati- covery [2, 11, 29]. However, they do not consider the chal-
cally rewrites the business process according to the chosen lenges of integrating process data and operational data and
rule and allows the user to add process modifications manu- performing analyses on such a huge amount of dimensions
ally. resulting from this integration. The BIA operators that we
Using the analysis results of the correlations c1, c2 and introduce in this paper, offer an efficient way to deal with
c3 in our sample scenario, the car rental process is restruc- this integration and high dimensionality by preselecting rel-
tured in the optimization phase. In order to win wealthy cus- evant dimensions.
tomers and avoid processes with long duration in c1, they Some work has been done that tries to provide a more
are routed to special services. Analysis results for c2 let us global view on process data and operational data. In [34], the
try to raise the performance of ContractNegotiation and in- authors introduce an evaluation framework for process ware-
crease the number of accepted tasks. Therefore, a reorgani- houses. They define various perspectives a process ware-
zation of the execution roles is done. In c3, the longer intro- house should cover. Considering operational data in addi-
duction time in CarHandOver is managed. In our scenario tion to process data can be seen as part of the informa-
this activity is efficiently executed by outsourcing it during tional perspective. But neither options for the structure of an
rush hours for certain car classes. integrated warehouse nor operators to support the analysis
based on such a warehouse are discussed in [34]. The Pro-
cess Data Warehouse in [4] provides a warehouse model for
3 Related work a global analysis. However, in contrast to our BIA WH it fo-
cuses on process dimensions and the operational dimension
Only little work has been done in the area of a global anal- is not well-defined, but mixed with the process dimensions.
ysis of both workflow data and operational data. Hence, The PISA tool [44] considers process variables and only
related work also covers various approaches to derive operational data that is directly stored in these variables.
knowledge from process data or operational data in isola- No further attributes in operational dimensions are consid-
tion. ered. Furthermore, it offers only relatively simple analyses.
Pure process analysis is based on audit trails that store the None of the mentioned approaches support global data mi-
execution data of processes. Audit trails can be exploited ning techniques or OLAP operators as considered in BIA. In
in various ways. First, they are needed for business activ- those systems the business analysts have to guess about op-
ity monitoring (BAM) [14, 19, 43] to react to problems that erational relationships to certain process data to be analyzed.
arise during process enactment. Secondly, they are used as Our operators provide support for a combined analysis that
one basis for business process management systems that considers links between operational data and process data.
support the definition, execution, and tracking of business This enables analysts to make in-depth analyses more effec-
processes [3, 5, 32, 33]. A third example are process min- tive and more efficient.
ing techniques [1, 31, 36, 37, 45] that try to identify pro- Furthermore, our framework enables an overall business
cess models, check the conformance of process execution process optimization, that considers the main performance
with existing process models or aim at process optimiza- indicators time, cost and quality described in [6, 8]. In the
tion. They are based on audit trails as well. Data from au- analysis and optimization phases we aim to find options for
dit trails is often integrated in a data warehouse to be better improving the given process considering these indicators.
suited for analysis purposes. Such a data warehouse is of- However, our approach even goes deeper and aims to find
ten called a process warehouse or an audit warehouse. The hints for optimization not only in workflow data, but consid-
appropriate structure [9, 18] as well as the ETL processes ers also related operational data.
74 S. Radeschütz et al.

4 BIA warehouse architecture

An integrated warehouse of both process data and opera-


tional data is the basis for BIA. Two different warehouse ar-
chitectures are presented here and we discuss how they suit
the BIA purpose. At the core of both warehouse systems are
various analysis cubes each with one fact table. The design
of the fact tables and their depending dimensions are based
on metrics that are interesting to analyze for BIA. Thus, we
first summarize these metrics in this section before we ex-
plain the architectures.

4.1 Analysis metrics


Fig. 4 Conceptual view of a BIA cube in the BIA WH
The analysis metrics for BIA can be classified along the fol-
lowing categories which are mainly based on [4]:
– Process Metrics: These metrics are based on process data,
e.g., the duration between activation and completion of
activities or time intervals between the completion of a
task and start of another one.
– Resource Metrics: These metrics consider data related to
human and automated resources, e.g., their performance
in executing tasks.
– Business Object Metrics: They comprise all business data
values that are used in the workflow.
– Operational Metrics: They consider operational data val-
ues that further describe certain process attributes.
For a comprehensive analysis, metrics from all these cat-
egories have to be considered together. Hence, they should
be stored in an integrated warehouse architecture which is Fig. 5 Detailed conceptual view of a BIA cube
introduced in the next section.

4.2 Conceptual view by none, one or more operational sub-dimensions. These


correlations are depicted only as gray arrows in Fig. 4, be-
Both operational data sources and process data sources con- cause they are different for each activity in a process model.
tain huge amounts of operational data or process execu- The operational target tables and the number of correlations
tion data. For an integrated analysis, we propose a feder- to these target tables change for each matching process at-
ated warehouse architecture, as already shown in Fig. 3. The tribute.
warehouse stores BIA cubes that consist of facts being cat- One detailed conceptual view of a sample cube is de-
egorized by dimensions. Figure 4 shows the general con- picted in Fig. 5. In this paper, we focus on facts about ac-
ceptual view of such a cube. The cube is very general as tivity execution as analysis metrics. The activity execution
it should be applicable to different situations, i.e., to differ- facts are described by four process dimensions: workflow,
ent business processes and operational data models, and it time, business object and resource. The workflow dimen-
should allow to analyze different metrics. The figure shows sion stores the data about an activity, its activity instance ID
the most significant elements. The metrics in the fact table (ActivityInst), process ID and further process-specific and
pfact consist of the analysis metrics discussed before. The activity-specific details, e.g., its name, deployed version, etc.
facts are described by the process dimensions px , py , pn , In the time dimension the start time of an executed activ-
pm , etc. The dimensions contain process execution data and ity is stored and expanded in smaller time units as days,
data of the used process models. months, years and so on. The resource dimension stores
We add operational sub-dimensions on to om in the BIA workflow variables that contain information about employ-
WH. They enrich the process dimensions pn to pm by com- ees, machines or engines that executed an activity. In the
plementary information from other applications in the com- business object dimension the workflow variables and their
pany. Each attribute in a process dimension can be described elements are listed that are worked on in the activity execu-
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 75

Table 1 Comparison of BIA WH architectures

Match table Bridge table

Integration on schema level Integration on instance level


No schema adaption, but only Schema adaption with new bridge
insertion of new match table for new match
One row per match One table per match
Cleansing during analysis phase Cleansing during ETL

tion. We refer to them by artificial business object identifiers


BOID and BOIDelem.
Both business objects and resources can be further de-
scribed by attributes from operational sub-dimensions. Fig-
ure 5 shows three operational tables: Customer, Automobile
and Employee. They contain attributes like trainings of em-
ployees or information about automobile classes. The cor-
relations between the operational attributes and the process
dimensions have been found in the matching phase. So the Fig. 6 Integrated warehouse with match table
business object elements of an activity may for example ref-
erence sub-dimension tables Customer or Automobile. Addi-
tionally, a resource may reference sub-dimension table Em-
ployee. We are only able to match the workflow variables
stored in the business object dimensions and resource di-
mensions with operational attributes in the BIA Cubes. For
every activity in a process model and its variables, different
operational sub-dimensions may be added to the BIA Cube
and then be used for analysis. As every variable is matched
to another operational attribute with another data type this
integration is not straightforward. In the federated ware-
house these matches are stored in the federation layer. In
the following, we present two ways to represent the matches
between process data and operational data in the federation
layer. In both approaches mapping names have to be created
to access process and operational tables. A brief overview
of the main characteristics of these approaches is shown in
Table 1.

4.2.1 Integration by a match table Fig. 7 Integrated warehouse with bridge tables

Figure 6 shows one way to model the federation layer in the


BIA WH. It describes the matches on the schema level. The mapping names of the belonging table, database schema and
architecture uses only one match table to connect both pro- database server name. To access the source tables, we use
cess and operational dimensions. This match table results identifiers called nicknames in IBM Infosphere.
from the matching phase and is stored in the federation sys-
tem. The match table includes a precise definition of corre- 4.2.2 Integration by bridge tables
lations between the dimensions. It stores the mapping be-
tween a process attribute of a resource or a business object Another way to integrate process data and operational data
element and the operational attribute that describe the same is shown in Fig. 7. This architecture dissolves the match ta-
real world item. The element in the process variable is iden- ble and creates a new bridge table for each match. Thus,
tified by its name and by the identifier of its activity and the integration happens on the instance data level. For creat-
its process. The matching operational attribute of a business ing one bridge table, we proceed as follows: For each match
object or a resource is described in the last two columns of listed in the match table, we get all tuples from the business
the match table. They store the name of the column and the object element table referring to variable elements with the
76 S. Radeschütz et al.

same name (VarElemName). Then we take only the tuples 5 Operators for a business impact analysis
that refer to the same activityID and processID as described
in the match table. This is done by joining the fact table and In order to improve business processes on basis of our BIA
the workflow table with the business object and business ob- WH, it is helpful to have OLAP support that goes beyond
ject element table. From the received tuples only those are the usual OLAP SQL features such as ROLLUP or WIN-
taken that have the same value in the business object ele- DOW [12]. As introduced in the section before, the oper-
ment table and the matched operational table. The same is ational data dimension is divided into sub-dimensions. In
done for the resource matches. The names of the bridge ta- order to handle these sub-dimensions efficiently in OLAP
bles are composed with the related element name so that the and data mining analyses, we propose new operators. An-
BIA Operators will find the appropriate bridge tables for a other goal of our operators is to facilitate frequently needed
certain element in the later OLAP analysis (see Sect. 5.2.1). queries on processes as a kind of macro, e.g., an operator
According to this architecture, a new table BridgeTable_ that looks for the activities with the longest duration or for
CustID is shown in Fig. 7 for the match between the element activities with errors. The operators help to analyze the three
CustID of variable inputData and column CID of table Cus- main performance indicators: cost, time and quality of a pro-
tomer (see Fig. 2). Another bridge table BridgeTable_Car- cess (see [8]). Another purpose of the operators is to prepare
Model addresses the match between element CarModel of the warehouse dimensions for data mining algorithms, e.g.
variable ServiceInfo and column class of table Automo- returning only numeric attributes for regression mining. Be-
bile. As class is no primary key in table Automobile, Brid- fore we describe the BIA operators in detail, we introduce a
geTable_CarModel needs the operational attribute AID as sample cube in the next section. The operators use this cube
foreign key to clearly reference all tuples with this class in to exemplify their analysis results.
table Automobile. For BridgeTable_AssigneeID the same is 5.1 Sample BIA cube
done to bridge resourceID of the resource table and EID of
the employee table. The attributes in all bridge tables are The sample cube that we use to illustrate the BIA opera-
nicknames to identify the source attributes. Finally, standard tors is based on the sample scenario and the process model
cleansing steps are required for the instance data to receive shown in Fig. 2. It combines process data of this car rental
correct matches, if the matched attributes do not have the process with associated operational data and is stored in a
same values. Examples with instance data for this BIA WH data warehouse with bridge tables.
architecture can be seen in Sect. 5.1 where the analysis op- Figure 8 shows a fragment of this cube filled with sam-
erators are explained. ple data. The fact table includes all activity executions,

Fig. 8 Sample BIA-cube with bridge tables


Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 77

e.g., of ContractNegotiation. The variables and their de- restrict the output table according to different format condi-
tails are contained in table businessobject and table busi- tions. They are especially interesting to provide the appro-
nessobjectelement. The resource table with the assignees priate input for certain data mining tasks.
is not shown here. BridgeTable_CarModel bridges the el-
ement Car.Model of the activity ContractNegotiation in the BIASUBALL The BIASUBALL operator has three input
CarSelectionProcess with column class of the operational parameters: var, processid and activityid (Fig. 9(e)).
table Automobile, whereas BridgeTable_AssigneeID bridges BIASUBALL selects all columns and their values of the
the resource table with the operational employee table. The operational tables that contain matched sub-dimensional at-
operational tables Training and CarFeatures are related to tributes for the specified variable elements in the specified
Employee and Automobile via foreign keys. activity and process. The variable elements var are defined
by distinct element names, processid contains the identi-
5.2 BIA operators fier of the requested process and activityid the identifier of
the requested activity which processes the given variables.
All BIA operators are defined by the syntax snippets in When no element name of a variable is specified, correla-
Fig. 9 in Backus-Naur form. As defined by the syntax snip- tions to all elements of the given activity are emitted together
pets, all operators are a component of the table reference with the attributes in the tables where they appear.
used in the from clause of a table expression (see Fig. 9(a) The operator proceeds as follows: It selects those tuples
and (b)). The bia clause in is an alternative table reference from the process dimensions that contain the given process
that again consists of six alternative clauses representing the ID, activity ID and variables and all attributes from related
BIA operators. The syntax is the same in both architectures, operational tables as well as attributes from operational ta-
but the operators differ in their execution. The main differ- bles referencing the matched operational attribute by foreign
ence is that the operators executed on a BIA WH with match keys. The related operational attributes are gained via ex-
table have to perform their cleansing steps during the execu- ploiting either the match table or the meta data of the ware-
tion. Hence, these cleansing steps have to be executable au- house to look for bridge tables named after the correspond-
tomatically. On a BIA WH with bridge tables, this cleansing ing variable element. The output table contains the activity
is part of the ETL processing. instance and all related operational attributes. Some of them
The usage of the BIA operators requires an allocation of may be used by standard data mining techniques for deriving
several attributes and tables of the process dimensions with new optimization hints.
strictly defined naming and foreign key relations. The op-
erators work on a schema as shown in Fig. 5 with nick- Example In the following, the operator BIASUBALL is il-
lustrated using the example tables from Fig. 8. The query al-
names to identify the tables named workflow, time, activ-
lows to figure out which attributes of the resource assignee
ityexecution, businessobject, businessobjectelement and re-
determine a successful execution of the activity ContractNe-
source. At least the attributes processid, activityid and activ-
gotiation in the CarSelectionProcess:
itytype amongst others in the workflow table must have nick-
names. Furthermore, there should be nicknames for the at- SELECT BIA.*, BOE.VAR E LEM NAME , BOE.VARVALUE
tribute Name in the businessobject table, for VarElemName FROM biasuball( TASKVAR . ASSIGNEEID , C AR S E -
and VarValue in businessobjectelement, for starttime in the LECTION P ROCESS , C ONTRACT N EGOTIATION ) AS BIA ,
time table and for all attributes in the fact table. BUSINESSOBJECT BO , BUSINESSOBJECT-
ELEMENT BOE , ACTIVITYEXECUTION A
5.2.1 Operators for sub-dimension analysis WHERE BIA . ACTIVITY I NST = A . ACTIVITY I NST
AND BO . BOID = A . BOID
A first group of operators can be used to get a first overview AND BO . BOID = BOE . BOID
of the related operational attributes for a given activity. Ac- AND BO . NAME = ‘ TASKVAR ’
cording to the used BIA WH they evaluate the match ta- AND BOE . VARELEMNAME = ‘ OUTCOME ’
ble (see Fig. 6) or the bridge tables (see Fig. 7) to find all
joins between the operational dimensions and the business We select all related operational attributes for this as-
object or the resource dimension. The operators are shown signee by BIASUBALL and join them with the tables activi-
as alternative clauses for the biasub clause in Fig. 9(d). We tyexecution and businessobject(element) to get the outcomes
define the operator BIASUBALL that simply returns all re- of the activity as well. When this query is executed on our
lated sub-dimensional attributes. Hence, the output columns sample BIA-Cube we receive a result table of which an ex-
of the result table are dependent on which process and ac- tract is shown in Fig. 10. This result shows that the majority
tivity models the operator is applied. Furthermore, we de- of cases with Outcome = ‘Reject’ is related to the agent’s
fine two operators BIASUBLABEL and BIASUBNUM to trainings and is done by employees that are trained for sales.
78 S. Radeschütz et al.

Here we can draw this conclusion simply by looking at the a list where all possible nominal attributes are listed (string,
result table, but usually further analyses are necessary, e.g. character, . . . ).
by means of data mining techniques. For an optimization, The second operator BIASUBNUM (Fig. 9(g)) returns all
these findings need further investigation. numeric attributes by checking a similar list (integer, float,
decimal, numeric, . . . ). This is necessary for data mining ap-
BIASUBNUM and BIASUBLABEL As some data mining proaches such as clustering or regression where continuous
approaches need restricted categories of input data, we deve- data values are needed. The syntax of the operator also re-
loped two additional operators. The BIASUBLABEL opera- sembles the BIASUBALL operator.
tor (Fig. 9(f)) filters all attributes that have nominal val-
ues. This operator is especially necessary for data mining Example If we use BIASUBNUM instead of BIASUBALL
approaches such as classification where all attributes are as in the previous example, the result table looks similar
grouped by their nominal data categories. If a transformation to Fig. 10. However, this time we receive only the numeric
of numeric values to such categories is not possible or not columns. The categorical attributes are eliminated from the
wanted, this operator is very important. Its syntax is equal operational output table. So the operational columns of em-
and its output table is similar to the BIASUBALL output, ployee (Name, Position, Tname and Tdate) are excluded, as
but only nominal attributes are emitted. The operator checks they have non-numeric data types. All other columns are

Fig. 9 SQL Syntax of BIA operators

Fig. 10 Sample result table of


BIASUBALL
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 79

Fig. 11 Sample result table of


BIAHT

shown, as they are numeric (Salary) or belong to joined pro- BIAFLUCT High duration variances for the same activ-
cess or bridge tables. BIASUBLABEL would operate anal- ity in different process executions give also hints for pro-
ogously returning the nominal attributes instead. cess optimizations. The BIAFLUCT operator aims to find
these activities. The operator searches for all activities for
5.2.2 Operators for duration analysis a given process similar to the previous two operators. How-
ever, it searches for tuples with activities whose execution
As the duration of business processes plays a major role for length fluctuates in the given duration value. That means
process optimization, our framework offers three operators it searches if the amount of percent executions fluctuates
to support the duration analysis: BIAHT, BIAPL and BIA- (i.e. is slower or faster) in the indicated duration than
FLUCT. Their syntax is depicted in Figs. 9(h), (i) and (j). the average time of the other instances of the same activ-
ity model. The duration value is expected in the following
BIAHT and BIAPL The human task activity and, more
format 000:00:00:000 (days:hours:minutes:seconds).
general, the invoke activity waiting for response of partner
services are a source for optimization, because they often
Example In the following, we further investigate all activi-
last very long. Our operators find these activity types in the
ties in the CarSelectionProcess. We create an example query
given process executions and return the according tuples to-
using BIAFLUCT to search all activities of which 20 % last
gether with their variables and their duration length. The BI-
more than 30 minutes longer or are more than 30 minutes
AHT operator searches for all human task activities in the
fact table for the given process identified by processid. It faster than the average time of the according activity exe-
joins the fact table with the workflow table (to select all ap- cution. We select the activity executions together with the
propriate activities) and with the time table (to get the dura- information on used variables:
tion of the activity executions). The workflow and the time SELECT *
table are not shown in Fig. 8, but only indicated in Fig. 5. FROM BIAFLUCT(C AR S ELECTION P ROCESS,
The approach of the BIAPL operator is just the same, but it 000:00:30:000, 20)
searches for general invoke activities to partner links instead
of only human tasks. The output table is similar to Fig. 11. It shows the same
attributes, but only for those activity instances where at least
Example In the following example query we use BIAHT to 20 % of the instances deviate more than 30 minutes from the
investigate the processing time of human task activities used average execution time for this activity. Furthermore it may
in the CarSelectionProcess: contain also other activity instances, e.g., activity GetCar-
SELECT * FROM BIAHT (C AR S ELECTION P ROCESS ) Availability with different variable elements, as it shows all
activity types. We also get ContractNegotiation instances as
We receive the executed activities and their durations to- result. If we use BIASUBALL operator, the reason for the
gether with the business object elements and the ResourceID negotiation delay for certain rental car models, e.g. AudiTT,
of responsible employees. Using our sample BIA-Cube this may be discovered: A high choice of extras in some automo-
query results in an output table as shown in Fig. 11 with the biles needs a lot of callbacks to the customer. For a process
duration calculated by using the start time of the activities optimization, we could change the process model and add
in the time table, with the ResourceID of the resource ta- an extra activity at the beginning that requests for additional
ble and the CarModel, Location and CustID attributes of the information from the customers that are interested in certain
businessobjectelement table. We discover that process in- car models.
stances in ContractNegotiation for InputData.Car.Model =
‘AudiTT’, are slower in their processing time than other car 5.2.3 Operators for exception analysis
models and need often more than 30 minutes. Using the next
BIA operator, we may take a closer look at these duration One important optimization goal is the prevention of errors
fluctuations. during process execution. Our framework offers two oper-
80 S. Radeschütz et al.

Fig. 12 Sample result table of


BIAERROR

ators for the analysis of activated exceptions during execu-


tion: BIAERROR and BIAEXCEPT.

BIAERROR The first operator BIAERROR (k) compares


faulted and completed process executions and returns tu-
ples of critical activity executions together with their re-
lated business objects. BIAERROR searches for all activi-
ties in the fact table for the given process. It returns a table
with activities which are not terminated in a certain fraction
(percent) of executions. While joining all relevant tables,
the state of the processes is evaluated. The output table con-
tains their related variable data and possible resources.

Example In the following, we further investigate the activ- Fig. 13 BIA-clustering example
ities in the CarSelectionProcess. Our example query uses
BIAERROR to search for all activities which end in a faulty
state for more than 30 percent of the executions. We select terns from the audit trails and operational data. We showed
these activity executions together with their used variable in [26] how common data mining techniques such as cluster-
data: ing, classification, association rule discovery and regression
(all specified in detail e.g. in [11]) are applied to a combi-
SELECT * nation of process dimensions and operational dimensions in
FROM BIAERROR(C AR S ELECTION P ROCESS , 30) the BIA approach. In this section, we show how to use the
BIA-Operators to preprocess data in the warehouse in order
We discover in the output table shown in Fig. 12, that
to simplify the application of these data mining techniques.
process instances of activity CarHandOver are faulty very
The examples are based on the car rental scenario again.
often. For a process optimization we need further analyses
and have to check the related operational attributes of the
6.1 Clustering
found activity variables and of their preceding activity vari-
ables.
Grouping a set of related objects into certain classes of ob-
BIAEXCEPT The second operator BIAEXCEPT (l) takes jects that are similar to each other and dissimilar to other
a closer look at the exception handlers that are used to catch classes is called clustering [13]. Especially high-dimension
errors that might have come up during process execution. clustering methods [24] are interesting for BIA because of
So this operator evaluates the execution data not for faulty the high number of sub-dimensions that result from the high
results, but for errors that are not so evident, because the number of process variables in different process activities.
process state is completed. Although the errors have been In order to prepare the data set for clustering, we may
caught, they still signal some kind of problem that might use the BIAFLUCT and BIASUBNUM operator. In a first
be worth a closer look. The operator compares regular and step, BIAFLUCT is used to receive all services that have
caught process executions, i.e., if a exception handler is exe- big fluctuations in the CarSelectionProcess. We may dis-
cuted (this is stored in the audit log), and returns the activity cover that activity GetCarAvailability appears often. For the
instances together with related variable data and resources. clustering, we only can use numeric attributes. Therefore,
BIASUBNUM is applied for GetCarAvailability and all of
its variables. We are interested especially in applying BI-
6 Data mining techniques for business impact analysis ASUBNUM to the CustID variable in order to be able to
cluster the activities in relation to their sub-dimensional cus-
As an additional option in the analysis phase, we use data tomer attributes. Figure 13 illustrates the clustering along
mining techniques for extracting hidden optimization pat- two dimensions, i.e., the execution time of activity instances
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 81

Fig. 15 BIA-regression example

Fig. 14 BIA-classification example


We can use this technique to restructure the process mo-
del in order to avoid terminated paths. For process optimiza-
of GetCarAvailability and attribute customer ranking of sub-
tion, the employee group should be changed: Only agents
dimension customer. We may gain three clusters that we la-
that are skilled in communication or mechanics skilled in
bel as: VIPService, FastService, DelayedVIPService.
sales should be allowed to execute the task instead of all
BIA-Clustering is suited as a first analysis method to
employees from Dept A, B and C. For another decision tree
identify the activities that have problems and that must be
example, we could use BIAEXCEPT to classify certain ac-
further analyzed for optimization. The VIPProblemService
tivity executions according to the exceptions that have to be
cluster has to be examined in detail to discover reasons for
handled or if they have been executed without error.
the delay and to reorganize the processes.
6.3 Regression
6.2 Classification
Regression is closely related to classification. While classifi-
Classification is a data analysis method that can be used to cation has discrete results, regression methods predict conti-
extract patterns and to describe their related information in nuous-valued functions, e.g., by statistical methods [35].
categories [20]. In BIA, classification is very helpful for ca- For a flexible allocation of employees to human tasks
tegorizing processes. in the CarSelectionProcess, their duration should be pre-
In our example, we want to classify executed activities dicted to reduce bottlenecks and long delays. A preprocess-
according to their execution state. Thus, data of the CarSe- ing of data with the operators BIAHT and BIASUBNUM
lectionProcess is first preprocessed using BIAERROR. Dis- may be done for our sample regression mining. First, BI-
covering that ContractNegotiation is one error-prone activ- AHT finds out that, among others, activity CarHandOver
ity, which often leads to process termination, we continue often has a high runtime. We guess that the CarModel el-
with the operator BIASUBLABEL for this activity. The op- ement needs further surveillance. The numeric operational
erator is executed on the Assignee variable and receives all sub-dimensions for this element are received via BIASUB-
sub-dimensional categorical attributes. Now, classification NUM. By standard regression methods on its result table,
may start on this ContractNegotiation execution data to- we examine the duration of the activities of handing over a
gether with the received attributes. Figure 14 shows a deci- car to a customer. The regression may show that the dura-
sion tree for ContractNegotiation and its class label attribute tion depends on the amount of extra features available in the
process state, which has two distinct values complete and rented car model (Fig. 15).
faulty. The partition of the tuples depends on process data
and on operational data related to the employee who exe- 6.4 Association rule mining
cutes the task. The attribute training has the highest informa-
tion gain and becomes the first splitting attribute. If the em- BIA-Association Rule Mining is based on a mix of multidi-
ployee is trained in communication the activity is accepted mensional association rule mining and process mining. Mul-
at once. For sales training there may be revealed an addi- tidimensional association rule mining is a popular method
tional differentiation on the position attribute (employed as for discovering interesting relationships between different
rental car supplier in the agency or as rental car mechanic). kinds of attributes in large databases [16], whereas process
On the advertising side, we have a second differentiation on mining considers the process structure and tries to found out
priority for the execution of the activity. which activities with which variables occur together in one
82 S. Radeschütz et al.

Table 2 Evaluation overview

Match table Bridge table

Complexity of ETL + −
Data volume + −
Runtime of arbitrary SQL query − +
(Query type I)
Runtime of query with operators 0 0
(Query type II)
Usability 0 0

Fig. 16 BIA-association rule mining example


ation results. Finally, we discuss the usage of BIA operators
for OLAP and data mining analysis in Sect. 7.4.
process [38]. Attributes that occur frequently together with
certain values in transactions are called frequent itemsets. 7.1 Data volume and complexity of ETL processing
The task of this data mining method is to find all frequent
itemsets and use them to create association rules. The as- One major advantage of the warehouse with a match table
sociation rule {A = a, B = b} ⇒ {C = c} indicates that if A is that it comprises only one new table, namely the match
and B appear together with value a and b, also C with value table. This has two aspects. First of all, the ETL process
c is likely to appear in this transaction. A, B, and C can be is very simple, because it only copies the matches received
both process attributes and operational attributes. in the matching phase into the federation layer. No schema
Association rules that consider process variables, their adaption is necessary. Secondly, the BIA WH needs only ex-
values and their effects on later activities are necessary in tra space that depends on the number of found matches (n)
BIA for an efficient resource planning. Operational data can as each tuple in the match table contains one match between
help to categorize these effects. In a first step, we may an- the elements of the models. This space can be indicated by
alyze the activity GetCarAvailability and subsequent activ- O(n). The match between related instance data is not stored,
ities in the process. Data mining might reveal rules like in but received at query runtime.
Fig. 16(1) + (2) that are not valuable, because their support In contrast, the warehouse with bridge tables requires
might be too low. If we use BIASUBALL, we get more in- to extract and load all matches on instance level into the
formation from the operational data about the element Car- bridge tables. A new bridge table is created for each match
Model. Thus, we can group the found executions by the op- that connects process and operational tables causing signif-
erational attribute category of the element CarModel, e.g., icant space requirements. Due to the huge operational data
sports or family, and receive the necessary support for new stores (contain all executed operations) and huge audit trails
association rules like shown in Fig. 16(3) + (4). We might (contain all executed processes) the space requirements in
use them to reimplement the process to guarantee a fast pro- the federation layer are enormous. To give an idea of the
cessing. needed space, we show an estimation of the required num-
ber of bridge tables. We assume that in p process models,
on average v variables are handled by a activities and each
7 Discussion and evaluation variable consists of several elements e. On average each el-
ement matches o operational columns. Hence, the number
In this section, we discuss the two warehouse architectures of bridge tables can be indicated by BT = p ∗ v ∗ a ∗ e ∗ o.
introduced in Sect. 4 as well as the BIA operators introduced We also have to consider the number of tuples stored in the
in Sect. 5 and their usage for data mining as described in bridge tables. This number depends on the number of pro-
Sect. 6. As both warehouse architectures (Fig. 6 + Fig. 7) cess instances i and on the number of related operational
show advantages and disadvantages one has to decide which tuples ot for each matched variable element. So we receive
one will fit best to specific requirements. This section gives a total space of SBT = p ∗ v ∗ a ∗ e ∗ i ∗ o ∗ ot for the tuples
an evaluation of both warehouse schemes and the criteria for that are stored in the bridge tables in the federation layer.
this decision. In particular, we review how they influence the This considerable storage consumption may be unneces-
data volume and complexity of ETL processing (Sect. 7.1), sary, because many correlations and thus many of the tables
we present experimental results that show the query perfor- are not needed for analysis, as long as we do not plan to op-
mance on both warehouse architectures for queries with and timize the corresponding activities. Another disadvantage is
without BIA operators (Sect. 7.2) and discuss usability as- the required adaption of the warehouse schema during ETL.
pects (Sect. 7.3). Table 2 shows an overview of these evalu- Instead of only loading new matches into the match table,
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 83

Table 3 Evaluation of BIA WH architectures For query type I, the schema of all warehouse tables has
to be well-known. In the warehouse with bridge tables, we
only need two joins between table businessobjectelement,
a bridge table and an operational table to receive all tuples
for one variable element and all attributes in the operational
table of its related operational element. For the example il-
lustrated in Fig. 7, the following query links car class in-
formation to process variables using tables Automobile and
BridgeTable_CarModel. For simplicity, all nicknames to the
source attributes are named just as the original attributes:

SELECT A . AID , A . CLASS , A . COLOR , A . PLATE


FROM BUSINESSOBJECTELEMENT E NATURAL JOIN
B RIDGE TABLE _C AR M ODEL NATURAL JOIN
AUTOMOBILE A
each time a new process model has been executed also new
bridge tables have to be created for the warehouse. Thus, In the architecture with the match table, queries of query
not only the warehouse but also the ETL process has to be type I additionally join the tables businessobject, workflow
adapted dynamically. Despite these disadvantages that the and activityexecution to receive the matching instance data
warehouse with bridge tables holds compared to the match (cf. Figs. 5 and 6):
table (Table 2, Complexity of ETL and Data volume), the
SELECT A . AID , A . CLASS , A . COLOR , A . PLATE
next section shows how one may benefit from it during anal-
FROM BUSINESSOBJECTELEMENT E NATURAL JOIN
ysis. BUSINESSOBJECT B NATURAL JOIN
ACTIVITYEXECUTION F NATURAL JOIN
7.2 Experimental evaluation of query performance WORKFLOW P NATURAL JOIN
AUTOMOBILE A
Our experimental setting consists of the federation layer WHERE A . CLASS = E . VALUE AND
based on IBM Infosphere Federation Server version 9.7 with E . ELEMENTNAME = ‘ CAR . MODEL’ AND
buffer size 30 KB to avoid loading all data into the buffer P. ACTIVITYID = ‘C AR H AND OVER ’ AND
pool. It runs on a Windows XP system with two 2.4 GHz P. PROCESSID = ‘C AR S ELECTION P ROCESS ’ AND
processors and 4 GB of main memory. We independently B . NAME = ‘S ERVICE I NFO ’
varied the data volume in the tables storing the process in-
As our goal is to compare both BIA WH architectures and
stances and the operational tuples, ranging from 10k up
to 10000k (k = 1000). The match table was provided in show how their different mappings influence query runtime,
constant size of 100 matches. In the bridge table solution, we only show the ratio between the query runtime on both
this resulted in 100 bridge tables with a varying number of architectures in Table 3 instead of illustrating the absolute
rows. In our evaluation, this data volume of match instances time for query execution. Each entry in Table 3 is based on
ranged from 10k up to 100k rows. It is dependent on the data the average of 10 measurements for each architecture and
volume of process and operational data, i.e., it is unlikely to Query.
have a high number of operational and process tuples and at As one can see in the row for Query I, the query based on
the same time a low number of rows in the bridge table. the match table always takes longer than the solution using
In our experiments, we evaluate two classes of analy- bridge tables. Especially when the data volume in the pro-
ses: analysis queries formulated as arbitrary SQL statements cess tables is much higher than in the operational and bridge
(Query type I) and analysis queries using BIA operators tables (results in columns 4, 7, 8, 9), the difference in ex-
(Query type II). Table 3 shows the runtime of a typical rep- ecution time becomes very large. This result was expected,
resentative query for either query class on each BIA WH ar- because Query I needs three additional joins between the
chitecture. These queries join operational data of a common huge process data tables (businessobject, activityexecution,
business warehouse with process data of a business process. workflow) for the BIA WH with match table. It would take
The representative query is based on a thorough analysis of even longer if cleansing steps become necessary, because
common business processes, e.g., those defined in [30]. As in the architecture with match table cleansing is part of the
we focus on the analysis of one match per process these analysis phase. However, for the data in our measurements
queries are adequate and it is not relevant which and how cleansing is not needed. Overall, arbitrary queries are rated
many activities as well as which other operational matches lower for the match table solution than the bridge tables in
the analyzed processes have. Table 2.
84 S. Radeschütz et al.

The architecture with bridge tables can be used for analy- tables for the joins. Otherwise, it would be a lot more time
sis in a straightforward way without any restrictions. We are consuming to find the right bridge table.
able to perform all standard OLAP and data mining opera-
tions combining data from the process and the operational 7.3 Discussion of usability
side. On the BIA WH with a match table this integration
has to be managed during analysis. The BIA operators are Another aspect in the comparison of the two architectures
able to perform this integration as part of the query execu- is their usability for the warehouse analyst. The bridge ta-
tion. Considering the three main performance indicators [8], ble architecture is better usable for formulating ad-hoc SQL
they reflect the most common queries needed for the inte- queries on the mapping between process and operational
grated analysis of process data and operational data. A typ- data, as they are based on bridge tables with a common
ical representative for query type II is an operator that an- structure. The queries can be formulated based on standard
alyzes the process side and delivers associated operational SQL background knowledge. In contrast, the BIA WH with
data, e.g. car class for variable CarModel. It starts on the a match table contains much less tables due to the fact, that
process side to find operational attributes for the given pro- all matches are stored in this match table instead of creating
cess variable element CarModel and it is unknown to the a new bridge table for each match. Thus, it makes it much
user that the related operational column is called class. The easier for an analyst to keep the overview of the available
following query employing the BIASUBALL operator helps tables. However, as the complexity of the queries is bigger
the user to retrieve information from this operational column on the BIA WH with match table, also more background
anyway. knowledge for formulating the queries is necessary. Alto-
gether, both architectures balance each other in respect to
SELECT * their usability (Table 2). In the end the analyst has to decide
FROM BIASUBALL ( INPUTDATA . CAR . MODEL , himself what aspects are better suitable to his needs.
CARSELECTIONPROCESS , CARHANDOVER )
7.4 Discussion of BIA operators
This query can be used in both architectures. But on a BIA
WH with a match table, such a combined analysis would not The BIA operators support analysts in various ways to run
be possible without the BIA operators. Applying BIASUB- holistic analyses that cover process data as well as opera-
ALL to variable CarModel as well as to the identifier of its tional data. First, they allow to efficiently handle data from
activity and process, operational attributes that are matched operational sub-dimensions. Consider in our example an an-
with this variable can be returned. Additionally, all other op- alyst who does not know the matching operational table and
erational attributes that are contained in the operational ta- columns for the variable element InputData.Car.Model in
ble of the matched attribute are returned as well. Hence, the the CarSelectionProcess. In this case, standard SQL queries
query results in tuples showing the ID of each activity in- like those shown in this section are not possible, because
stance of CarHandOver in the CarSelectionProcess with its in both BIA WH architectures, we need a loop over all op-
element InputData.Car.Model as well as the matching oper- erational tables of the matched operational columns to add
ational attributes plus the other attributes in the operational them dynamically to the from clause of the SQL query.
table. Secondly, the operators facilitate frequently needed analysis
The resulting tuples are equal in the match table and queries. In particular, they help to analyze the main perfor-
the bridge table architecture. In both architectures, Query II mance indicators for processes, i.e., cost, time and quality
needs to join the same process data tables and operational [8]. This is in particular reflected in specific operators to an-
tables, because the operator has to start from the process ta- alyze process duration and process exceptions. Finally, we
bles to look for the right variable instances, before it is able have shown in Sect. 6 that the BIA operators are perfectly
to find out the matchable operational tables that have to be suited to prepare data warehouse data for data mining algo-
joined. Line Query II in Table 3 shows that the execution rithms. The BIA operators allow to extend the data basis for
time for Query II is similar in both architectures, we illus- data mining tasks by considering process data as well as op-
trate this fact as 0 in Table 2. Only in the first three queries erational sub-dimensions. Hence, by using these operators
on a small data volume the difference is bigger. There is a the user is able to describe more detailed and more target-
overhead using the match table architecture here that has to oriented types of analysis.
join all elements with the operational match partner based
on their values, instead of just joining the few tuples already
extracted in the bridge table architecture. That would be dif- 8 Conclusion
ferent if we would not use our defined naming for the bridge
tables. Hence, Query II only needs to compare the bridge For performing the extended analysis techniques OLAP and
table names with the element names to find the right bridge data mining for BIA, we developed two warehouse architec-
Business impact analysis—a framework for a comprehensive analysis and optimization of business processes 85

tures that hold both operational data and process data dimen- 9. Eder J, Olivotto GE, Gruber W (2002) A data warehouse for work-
sions. The architectures differ in their mapping between the flow logs. In: Proceedings of the first international conference on
engineering and deployment of cooperative information systems,
process data and their related operational data (match table
EDCIS’02. Springer, London, pp 1–15
vs. bridge tables). We demonstrated the usefulness of the ar- 10. Hammer M (1990) Reengineering work: don’t automate, obliter-
chitectures each depending on different aspects that the user ate. Harv Bus Rev 68(4):104–112
prefers to handle. The architecture with a match table turns 11. Han J (2005) Data mining: concepts and techniques. Morgan
Kaufmann, San Mateo
out to be better in most areas: it allows for a more efficient
12. ISO (2003) ISO/IEC 9075-2. Information technolog—Database
ETL processing and it needs less space to store the matches. languages—SQL. Part 2: Foundations
The bridge tables allow, however, for more efficient query 13. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review.
processing if the analyst already knows the matches between ACM Comput Surv 31(3):264–323
14. Jeng J-J, Schiefer J, Chang H (2003) An agent-based architec-
operational and process model. Then he can use the correct
ture for analyzing business processes of real-time enterprises. In:
bridge table with its stored instance data without having to EDOC, pp 86–97
join as well the workflow table and fact table to get the ex- 15. Johansson HJ, McHugh P, Pendlebury A, Wheeler W (1993) Busi-
ecution instances. Usability and execution time of queries ness process reengineering: breakpoint strategies for market dom-
inance. Wiley, New York
where the matches are unknown to the analyst is balanced in
16. Kamber M, Han J, Chiang J (1997) Metarule-guided mining of
both architectures. multi-dimensional association rules using data cubes. In: Proceed-
We introduced a set of operators that allow to define effi- ings of the third international conference on knowledge discovery
cient OLAP queries without knowing the details of matches and data mining (KDD)
between operational data and process data. The BIA opera- 17. Mansar SL, Reijers HA (2005) Best practices in business pro-
cess redesign: validation of a redesign framework. Comput Ind
tors put the user in the position of phrasing simple economic 56(5):457–471
queries to the BIA WH. We showed how data mining anal- 18. Mansmann S, Neumuth T, Scholl MH (2007) Multidimensional
yses benefit from these operators as well. data modeling for business process analysis. In: Proceedings of
In our future work, we will further explore how we can the 26th international conference on Conceptual modeling, ER’07.
Springer, Berlin, pp 23–38
utilize statistics from earlier analysis results in the ware- 19. McCoy D (2002) Business activity monitoring: calm before the
housing phase that are also used in the analysis phase for storm. Technical report LE15-9727, Gartner
the preselection of certain attributes and their data. Statis- 20. Michie D, Spiegelhalter DJ, Taylor CC, Campbell J (eds) (1994)
tics could also be used for the creation of the BIA WH. Machine learning, neural and statistical classification. Ellis Hor-
wood, Chichester
In this way, statistics about relevant operational attributes 21. Müller R, Greiner U, Rahm E (2004) Agentwork: a workflow sys-
could help to reduce the amount of bridge tables. Another tem supporting rule-based workflow adaptation. Data Knowl Eng
future step is to develop concrete optimization patterns that 51(2):223–256
apply the results gained by BIA for process improvement 22. Niedermann F, Radeschuetz S, Mitschang B (2010) Deep business
optimization: a platform for automated process optimization. In:
and reengineering. Proc BPSC
23. OASIS Standard (2007) Web services business process execu-
tion language, version 2.0. Available: http://docs.oasis-open.org/
References wsbpel
24. Parsons L, Haque E, Liu H (2004) Subspace clustering for high di-
1. Agrawal R, Gunopulos D, Leymann F (1998) Mining process mensional data: a review. ACM SIGKDD Explor Newsl 6(1):90–
models from workflow logs. In: Proc of extending database tech- 105
nology, London, UK 25. Poole J, Chang D, Tolbert D, Mellor D (2003) Common ware-
2. Agrawal R, Imielinski T, Swami A (1993) Mining association house metamodel developer’s guide. Wiley, New York
rules between sets of items in large databases. In: Proceedings 26. Radeschütz S, Mitschang B (2009) Extended analysis techniques
of the 1993 ACM SIGMOD international conference on manage- for a comprehensive business process optimization. In: KMIS,
ment of data, Washington, DC, pp 26–28 pp 77–82
3. Bruckner RM, List B, Schiefer J (2002) Striving towards near real- 27. Radeschütz S, Mitschang B, Leymann F (2008) Matching of pro-
time data integration for data warehouses. In: Proc of data ware- cess data and operational data for a deep business analysis. In:
housing and knowledge discovery, France Proc of I-ESA, Germany
4. Casati F, Castellanos M, Dayal U, Salazar N (2007) A generic 28. Radeschütz S et al (2010) BIAEditor—matching process and op-
solution for warehousing business process data. In: Proc very large erational data for a business impact analysis. In: Proc EDBT conf
data bases, Austria 29. Rokach L, Maimon O (2005) Top-down induction of decision
5. Castellanos M, Casati F, Dayal U, Shan M-C (2004) A compre- trees classifiers—a survey. IEEE Trans Syst Man Cybern, Part C,
hensive and automated approach to intelligent business processes Appl Rev 35(4):476–487
execution analysis. Distrib Parallel Databases 16(3):239–273 30. Rosettanet (2011) Overview: clusters, segments, and pips, version
6. Champy J (1995) Reengineering management. Harper Collins, 02.13.00. http://www.rosettanet.org
New York 31. Rubin V, Günther CW, van der Aalst WMP, Kindler E, van Don-
7. Chaudhuri S, Dayal U (1997) An overview of data warehousing gen BF, Schäfer W (2007) Process mining framework for software
and OLAP technology. SIGMOD Rec 26(1):65–74 processes. In: Proc of international conference on software pro-
8. Dumas M et al (2005) Process-aware information systems: bridg- cess, USA
ing people and software through process technology. Wiley, New 32. Sayal M, Casati F, Dayal U, Shan M-C (2002) Business process
York cockpit. In: Proc of very large data bases, China
86 S. Radeschütz et al.

33. Schiefer J, Jeng J-J, Bruckner RM (2003) Real-time workflow 40. W3C Submission: web service modeling language. Available:
audit data integration into data warehouse systems. In: ECIS, http://www.w3.org/Submission/WSML/
pp 1697–1706 41. W3C Recommendation: web ontology language (2004) Available:
34. Shahzad K, Johannesson P (2009) An evaluation of process ware- http://www.w3.org/TR/owl-ref/
housing approaches for business process analysis. In: Proceedings 42. Weerawarana S, Curbera F, Leymann F, Storey T, Ferguson DF
of the international workshop on enterprises & #38; Organiza- (2005) Web services platform architecture. Prentice Hall, New
tional modeling and simulation, EOMAS’09. ACM, New York York
35. Uysal I, Güvenir HA (1999) An overview of regression techniques 43. Wetzstein B, Leitner P, Rosenberg F, Brandic I, Dustdar S, Ley-
for knowledge discovery. Knowl Eng Rev 14(4):319–340 mann F (2009) Monitoring and analyzing influential factors of
36. van der Aalst WMP (2001) Re-engineering knock-out processes. business process performance. In: EDOC, pp 141–150
Decis Support Syst 30(4):451–468 44. zur Muehlen M (2004) Workflow-based process controlling.
37. van der Aalst WMP (2011) Process mining: discovery, confor- Foundation, design, and application of workflow-driven process
mance and enhancement of business processes. Springer, Berlin information systems. Logos, Berlin
38. van der Aalst WMP, van Dongen BF, Herbst J, Maruster L, 45. zur Muehlen M, Shapiro R (2009) Business process analytics.
Schimm G, Weijters AJMM (2003) Workflow mining: a survey In: Handbook on business process management, vol 2. Springer,
of issues and approaches. Data Knowl Eng 47(2):237–267 Berlin
39. W3C. Semantic annotations for WSDL and XML schema. Avail-
able: http://www.w3.org/TR/sawsdl/

You might also like