DOI 10.1007/s00450-013-0247-3
REGULAR PAPER
Received: 27 September 2010 / Accepted: 7 August 2013 / Published online: 14 September 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract The ability to continuously adapt its business processes is a crucial ability for any company in order to survive in today’s dynamic world. In order to accomplish this task, a company needs to profoundly analyze all its business data. This generates the need for data integration and analysis techniques that allow for a comprehensive analysis.

A particular challenge when conducting this analysis is the integration of process data generated by workflow engines and operational data that is produced by business applications and stored in data warehouses. Typically, these two types of data are not matched as their acquisition and analysis follows different principles, i.e., a process-oriented view versus a view focusing on business objects.

To address this challenge, we introduce a framework that allows to improve business processes considering an integrated view on process data and operational data. We present and evaluate various architectural options for the data warehouse that provides this integrated view based on a specialized federation layer. This integrated view is also reflected in a set of operators that we introduce. We show how these operators ease the definition of analysis queries and how they allow to extract hidden optimization patterns by using data mining techniques.

Keywords Business process optimization · Data warehousing · Information integration · OLAP · Data mining

S. Radeschütz · H. Schwarz (✉) · F. Niedermann
IPVS, Universität Stuttgart, Universitätsstr. 38, 70569 Stuttgart, Germany
e-mail: holger.schwarz@ipvs.uni-stuttgart.de
S. Radeschütz
e-mail: sylvia.radeschuetz@ipvs.uni-stuttgart.de
F. Niedermann
e-mail: florian.niedermann@ipvs.uni-stuttgart.de

1 Introduction

Increasing competition and significantly shortened product lifecycles led to a situation where fast adaption and continuous optimization of business processes are critical factors in determining the success of a company [42]. Business process optimization aims to improve processes of an organization, e.g., by discovering and removing unnecessary activities and by replacing activities by more efficient ones [17, 36]. For these optimizations, companies need to analyze process data using analysis techniques such as process monitoring and process mining [37]. Operational data of other business applications, e.g., data from business transactions or from master data management, is stored in a data warehouse and analyzed separately via OLAP (Online Analytical Processing) and data mining. All these methods usually fall short when it comes to questions requiring an integrated view on both process data and operational data. As an example, consider a car rental company that tries to optimize its rental process. A highly relevant question to a business analyst would be how trainings and work experience affect the execution time as well as the success of the process. Answering this question requires both process data (process execution data, paths taken) as well as operational data related to the employees involved in process execution (work experience, trainings, demographics). In such a situation, an integrated analysis would make a valuable contribution by ensuring that all relevant data is taken into account. We call this approach Business Impact Analysis (BIA).
70 S. Radeschütz et al.
level of importance. In order to be able to perform such an extensive optimization, we propose a lifecycle for BIA that comprises four phases (see Fig. 1): During the matching phase, process model elements are combined with elements in the operational data models. These matches are used in the ETL (Extraction-Transformation-Load) processing of the warehousing phase to cleanse and integrate process data and operational data into one BIA warehouse (BIA WH). The analysis phase discovers additional optimization knowledge by considering both kinds of data. This knowledge provides the basis for the optimization phase, where the process is restructured to achieve better business results. We give an overview of the phases in the following subsections. In this paper, we focus especially on the phases for warehousing and analysis that are discussed in Sects. 4 to 6.

2.2.1 Matching phase

The goal of this phase is to find, for all variable elements of the given process, the matching elements in operational data models. The matching phase traverses process and schema models to determine matching schema elements for each process variable element and assigns similarity values between 0 and 1. Our framework supports the matching in three ways:

– manual matching: Matches are found and stored manually by a business analyst.
– semi-automatic matching: Model elements are semantically annotated and matched automatically.
– automatic matching: The matches are found purely automatically.

The matching phase starts by loading all process models and operational data models into the BIA-Matcher. It also loads process variables from their process source file or from audit trails of proprietary workflow engines where process data is stored after process execution. Furthermore, the BIA-Matcher loads elements of operational data models from relational databases, XML databases or from CWM (Common Warehouse Metamodel) [25], which is a standard format to unify the interchange of proprietary formats in the warehousing process. Ontologies (Web Ontology Language (OWL) [41] and Web Service Modeling Language (WSML) [40]) that are used for semantic semi-automatic matching are loaded as well.

An overview of the BIA-Matcher can be found in [27], and [28] demonstrates its functionalities. The main components for annotating model elements and the steps for semi-automatic and automatic matchings are explained in the following:

Annotation Within Semantic Web research a number of standards have been established for the annotation of web services. One of the most prominent examples is Semantic Annotations for WSDL (SAWSDL) [39]. SAWSDL extends WSDL components with a URL to a given ontology concept to enable their semantic annotation. Thus, we selected SAWSDL for the annotation of model elements for semi-automatic matching. For BIA, we are primarily interested in SAWSDL annotations for process variable definitions. Furthermore, we extended the annotation capabilities to cover operational data as well, which leads to a set of annotated schema elements.

Matching In this step, the BIA-Matcher applies matching rules to find matches between variable elements and elements of operational data models. For the semi-automatic matching, the BIA-Matcher uses a suitable reasoning tool to infer logical consequences from semantically annotated data using the loaded ontologies. For this purpose, the BIA-Matcher holds a selection of appropriate inference rules. We are especially interested in rules that discover synonyms, subclasses, equality or union relationships between the concepts of two annotated match partners as well as other concept relations defined by user-defined ontology rules.

For non-annotated or partially annotated models, the BIA-Matcher employs rules for automatic matching that consider process features in addition to common schema matching algorithms. In order to match process variables with elements in operational data models, the rules exploit, e.g., names of the elements or data dependencies via the data flow.

In both matching approaches, the Matcher considers the context of match partners, e.g., the names of process components working with the matched variable or the names of parents of operational elements. The context is used to refine found matches and to improve the precision rate.

In our sample scenario, the Matcher combines the following elements: (1) We assume that the element Cust.ID of variable inputData is combined with attribute CID of the operational table Customer, (2) the executing roles (assignees) of ContractNegotiation with table Employee, (3) element Car.Model of variable ServiceInfo with attribute Class of table Automobile.

2.2.2 Warehousing phase

An integrated data warehouse has to be established in order to enable a Business Impact Analysis. The BIA-Loader as shown in Fig. 1 supports the building of this warehouse and its ETL flow. Its components are only briefly described here; we explain further details in Sect. 4.

WH creation As we have to handle huge amounts of process execution data, operational transaction data and master data, we do not build one consolidated warehouse, but leave
results and to rewrite the business processes improving their performance. To successfully rewrite and improve the process model, the BIA-Optimizer needs to detect the appropriate rewrite rules depending on the given business goals. Hence, the BIA-Optimizer gets the optimization goal as well as any constraint that should be considered during the optimization in terms of cost, quality, utilized resources or business rules. The engine identifies applicable optimization patterns (see also [22]) to fulfill these business goals and decides which rewrite rule fits best and should be executed. The rewrite rules are based on “best practice” techniques for the optimization of processes [8]. The practices are formalized in these rules to achieve a high extent of automated optimization. In the last step, the BIA-Optimizer automatically rewrites the business process according to the chosen rule and allows the user to add process modifications manually.

Using the analysis results of the correlations c1, c2 and c3 in our sample scenario, the car rental process is restructured in the optimization phase. In order to win wealthy customers and avoid processes with long duration in c1, they are routed to special services. Analysis results for c2 let us try to raise the performance of ContractNegotiation and increase the number of accepted tasks. Therefore, a reorganization of the execution roles is done. In c3, the longer introduction time in CarHandOver is managed. In our scenario this activity is efficiently executed by outsourcing it during rush hours for certain car classes.

3 Related work

Only little work has been done in the area of a global analysis of both workflow data and operational data. Hence, related work also covers various approaches to derive knowledge from process data or operational data in isolation.

Pure process analysis is based on audit trails that store the execution data of processes. Audit trails can be exploited in various ways. First, they are needed for business activity monitoring (BAM) [14, 19, 43] to react to problems that arise during process enactment. Secondly, they are used as one basis for business process management systems that support the definition, execution, and tracking of business processes [3, 5, 32, 33]. A third example is process mining techniques [1, 31, 36, 37, 45] that try to identify process models, check the conformance of process execution with existing process models or aim at process optimization. They are based on audit trails as well. Data from audit trails is often integrated in a data warehouse to be better suited for analysis purposes. Such a data warehouse is often called a process warehouse or an audit warehouse. The appropriate structure [9, 18] as well as the ETL processes [4] of these warehouses have been studied in previous work. But all the mentioned approaches and techniques refer to the actual flow logic. Operational data sources with further information are typically neglected.

Operational data comprises all data processed within the business that is not stored in an audit trail, but by other systems, e.g., ERP systems and data warehouses. OLAP and data mining techniques are typically applied to this data in order to analyze it and reveal hidden patterns. OLAP supports users in interactively analyzing multidimensional data from multiple perspectives [7]. Data mining is the process of searching large volumes of data for patterns using methods such as classification, clustering and association rule discovery [2, 11, 29]. However, they do not consider the challenges of integrating process data and operational data and performing analyses on the large number of dimensions resulting from this integration. The BIA operators that we introduce in this paper offer an efficient way to deal with this integration and high dimensionality by preselecting relevant dimensions.

Some work has been done that tries to provide a more global view on process data and operational data. In [34], the authors introduce an evaluation framework for process warehouses. They define various perspectives a process warehouse should cover. Considering operational data in addition to process data can be seen as part of the informational perspective. But neither options for the structure of an integrated warehouse nor operators to support the analysis based on such a warehouse are discussed in [34]. The Process Data Warehouse in [4] provides a warehouse model for a global analysis. However, in contrast to our BIA WH it focuses on process dimensions and the operational dimension is not well-defined, but mixed with the process dimensions. The PISA tool [44] considers process variables and only operational data that is directly stored in these variables. No further attributes in operational dimensions are considered. Furthermore, it offers only relatively simple analyses. None of the mentioned approaches supports global data mining techniques or OLAP operators as considered in BIA. In those systems the business analysts have to guess about operational relationships to certain process data to be analyzed. Our operators provide support for a combined analysis that considers links between operational data and process data. This enables analysts to make in-depth analyses more effective and more efficient.

Furthermore, our framework enables an overall business process optimization that considers the main performance indicators time, cost and quality described in [6, 8]. In the analysis and optimization phases we aim to find options for improving the given process considering these indicators. However, our approach goes even deeper and aims to find hints for optimization not only in workflow data, but also in related operational data.
4.2.1 Integration by a match table

Fig. 7 Integrated warehouse with bridge tables

same name (VarElemName). Then we take only the tuples that refer to the same activityID and processID as described in the match table. This is done by joining the fact table and the workflow table with the business object and business object element table. From the received tuples only those are taken that have the same value in the business object element table and the matched operational table. The same is done for the resource matches. The names of the bridge tables are composed with the related element name so that the BIA Operators will find the appropriate bridge tables for a certain element in the later OLAP analysis (see Sect. 5.2.1).

According to this architecture, a new table BridgeTable_CustID is shown in Fig. 7 for the match between the element CustID of variable inputData and column CID of table Customer (see Fig. 2). Another bridge table BridgeTable_CarModel addresses the match between element CarModel of variable ServiceInfo and column class of table Automobile. As class is no primary key in table Automobile, BridgeTable_CarModel needs the operational attribute AID as foreign key to clearly reference all tuples with this class in table Automobile. For BridgeTable_AssigneeID the same is done to bridge resourceID of the resource table and EID of the employee table. The attributes in all bridge tables are nicknames to identify the source attributes. Finally, standard cleansing steps are required for the instance data to receive correct matches, if the matched attributes do not have the same values. Examples with instance data for this BIA WH architecture can be seen in Sect. 5.1 where the analysis operators are explained.

5 Operators for a business impact analysis

In order to improve business processes on the basis of our BIA WH, it is helpful to have OLAP support that goes beyond the usual OLAP SQL features such as ROLLUP or WINDOW [12]. As introduced in the section before, the operational data dimension is divided into sub-dimensions. In order to handle these sub-dimensions efficiently in OLAP and data mining analyses, we propose new operators. Another goal of our operators is to facilitate frequently needed queries on processes as a kind of macro, e.g., an operator that looks for the activities with the longest duration or for activities with errors. The operators help to analyze the three main performance indicators: cost, time and quality of a process (see [8]). Another purpose of the operators is to prepare the warehouse dimensions for data mining algorithms, e.g., returning only numeric attributes for regression mining. Before we describe the BIA operators in detail, we introduce a sample cube in the next section. The operators use this cube to exemplify their analysis results.

5.1 Sample BIA cube

The sample cube that we use to illustrate the BIA operators is based on the sample scenario and the process model shown in Fig. 2. It combines process data of this car rental process with associated operational data and is stored in a data warehouse with bridge tables.

Figure 8 shows a fragment of this cube filled with sample data. The fact table includes all activity executions,
e.g., of ContractNegotiation. The variables and their details are contained in table businessobject and table businessobjectelement. The resource table with the assignees is not shown here. BridgeTable_CarModel bridges the element Car.Model of the activity ContractNegotiation in the CarSelectionProcess with column class of the operational table Automobile, whereas BridgeTable_AssigneeID bridges the resource table with the operational employee table. The operational tables Training and CarFeatures are related to Employee and Automobile via foreign keys.

5.2 BIA operators

All BIA operators are defined by the syntax snippets in Fig. 9 in Backus-Naur form. As defined by the syntax snippets, all operators are a component of the table reference used in the from clause of a table expression (see Fig. 9(a) and (b)). The bia clause is an alternative table reference that again consists of six alternative clauses representing the BIA operators. The syntax is the same in both architectures, but the operators differ in their execution. The main difference is that the operators executed on a BIA WH with match table have to perform their cleansing steps during the execution. Hence, these cleansing steps have to be executable automatically. On a BIA WH with bridge tables, this cleansing is part of the ETL processing.

The usage of the BIA operators requires an allocation of several attributes and tables of the process dimensions with strictly defined naming and foreign key relations. The operators work on a schema as shown in Fig. 5 with nicknames to identify the tables named workflow, time, activityexecution, businessobject, businessobjectelement and resource. At least the attributes processid, activityid and activitytype amongst others in the workflow table must have nicknames. Furthermore, there should be nicknames for the attribute Name in the businessobject table, for VarElemName and VarValue in businessobjectelement, for starttime in the time table and for all attributes in the fact table.

5.2.1 Operators for sub-dimension analysis

A first group of operators can be used to get a first overview of the related operational attributes for a given activity. Depending on the BIA WH used, they evaluate the match table (see Fig. 6) or the bridge tables (see Fig. 7) to find all joins between the operational dimensions and the business object or the resource dimension. The operators are shown as alternative clauses for the biasub clause in Fig. 9(d). We define the operator BIASUBALL that simply returns all related sub-dimensional attributes. Hence, the output columns of the result table depend on the process and activity models to which the operator is applied. Furthermore, we define two operators BIASUBLABEL and BIASUBNUM to restrict the output table according to different format conditions. They are especially interesting to provide the appropriate input for certain data mining tasks.

BIASUBALL The BIASUBALL operator has three input parameters: var, processid and activityid (Fig. 9(e)). BIASUBALL selects all columns and their values of the operational tables that contain matched sub-dimensional attributes for the specified variable elements in the specified activity and process. The variable elements var are defined by distinct element names, processid contains the identifier of the requested process and activityid the identifier of the requested activity which processes the given variables. When no element name of a variable is specified, correlations to all elements of the given activity are emitted together with the attributes in the tables where they appear.

The operator proceeds as follows: It selects those tuples from the process dimensions that contain the given process ID, activity ID and variables and all attributes from related operational tables as well as attributes from operational tables referencing the matched operational attribute by foreign keys. The related operational attributes are gained via exploiting either the match table or the meta data of the warehouse to look for bridge tables named after the corresponding variable element. The output table contains the activity instance and all related operational attributes. Some of them may be used by standard data mining techniques for deriving new optimization hints.

Example In the following, the operator BIASUBALL is illustrated using the example tables from Fig. 8. The query allows to figure out which attributes of the resource assignee determine a successful execution of the activity ContractNegotiation in the CarSelectionProcess:

SELECT BIA.*, BOE.VARELEMNAME, BOE.VARVALUE
FROM BIASUBALL(TASKVAR.ASSIGNEEID, CARSELECTIONPROCESS, CONTRACTNEGOTIATION) AS BIA,
     BUSINESSOBJECT BO, BUSINESSOBJECTELEMENT BOE, ACTIVITYEXECUTION A
WHERE BIA.ACTIVITYINST = A.ACTIVITYINST
  AND BO.BOID = A.BOID
  AND BO.BOID = BOE.BOID
  AND BO.NAME = ‘TASKVAR’
  AND BOE.VARELEMNAME = ‘OUTCOME’

We select all related operational attributes for this assignee by BIASUBALL and join them with the tables activityexecution and businessobject(element) to get the outcomes of the activity as well. When this query is executed on our sample BIA-Cube we receive a result table of which an extract is shown in Fig. 10. This result shows that the majority of cases with Outcome = ‘Reject’ is related to the agent’s trainings and is done by employees that are trained for sales.
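The bridge-table lookup that BIASUBALL performs can be illustrated with a minimal Python sketch on an in-memory SQLite database. It is only an illustration under stated assumptions, not the operator's actual implementation: the `MATCHES` dictionary, the reduced table layouts, the helper names `build_demo_warehouse` and `biasuball`, and all sample rows are hypothetical. Only the naming convention (a bridge table named after the matched element) and the table names are taken from the text.

```python
import sqlite3

# Hypothetical match metadata: element name -> (fact-side key column,
# operational table, operational key column held in the bridge table).
MATCHES = {"AssigneeID": ("resourceid", "Employee", "eid")}


def build_demo_warehouse():
    """Create a tiny in-memory warehouse filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activityexecution (
            activityinst INTEGER PRIMARY KEY,
            processid TEXT, activityid TEXT, resourceid INTEGER);
        CREATE TABLE BridgeTable_AssigneeID (resourceid INTEGER, eid INTEGER);
        CREATE TABLE Employee (
            eid INTEGER PRIMARY KEY, name TEXT, position TEXT, salary INTEGER);
        INSERT INTO activityexecution VALUES
            (1, 'CarSelectionProcess', 'ContractNegotiation', 10),
            (2, 'CarSelectionProcess', 'ContractNegotiation', 11),
            (3, 'CarSelectionProcess', 'CarHandOver', 10);
        INSERT INTO BridgeTable_AssigneeID VALUES (10, 100), (11, 101);
        INSERT INTO Employee VALUES
            (100, 'Ann', 'Sales', 3000), (101, 'Bob', 'Clerk', 2500);
        """
    )
    return conn


def biasuball(conn, element, processid, activityid):
    """Return (column names, rows): the activity instances of the given
    process and activity plus all columns of the matched operational table."""
    fact_key, op_table, op_key = MATCHES[element]
    bridge = f"BridgeTable_{element}"  # bridge tables are named after the element
    sql = f"""
        SELECT a.activityinst AS activityinst, op.*
        FROM activityexecution AS a
        JOIN {bridge} AS b ON b.{fact_key} = a.{fact_key}
        JOIN {op_table} AS op ON op.{op_key} = b.{op_key}
        WHERE a.processid = ? AND a.activityid = ?
        ORDER BY a.activityinst
    """
    cur = conn.execute(sql, (processid, activityid))
    return [d[0] for d in cur.description], cur.fetchall()


if __name__ == "__main__":
    cols, rows = biasuball(build_demo_warehouse(), "AssigneeID",
                           "CarSelectionProcess", "ContractNegotiation")
    print(cols)
    for row in rows:
        print(row)
```

The sketch covers only the directly matched operational table. Following foreign keys into further tables such as Training, as the operator description requires, would need additional joins that are omitted here.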
Here we can draw this conclusion simply by looking at the result table, but usually further analyses are necessary, e.g. by means of data mining techniques. For an optimization, these findings need further investigation.

BIASUBNUM and BIASUBLABEL As some data mining approaches need restricted categories of input data, we developed two additional operators. The BIASUBLABEL operator (Fig. 9(f)) filters all attributes that have nominal values. This operator is especially necessary for data mining approaches such as classification where all attributes are grouped by their nominal data categories. If a transformation of numeric values to such categories is not possible or not wanted, this operator is very important. Its syntax is equal and its output table is similar to the BIASUBALL output, but only nominal attributes are emitted. The operator checks a list where all possible nominal attributes are listed (string, character, ...).

The second operator BIASUBNUM (Fig. 9(g)) returns all numeric attributes by checking a similar list (integer, float, decimal, numeric, ...). This is necessary for data mining approaches such as clustering or regression where continuous data values are needed. The syntax of the operator also resembles the BIASUBALL operator.

Example If we use BIASUBNUM instead of BIASUBALL as in the previous example, the result table looks similar to Fig. 10. However, this time we receive only the numeric columns. The categorical attributes are eliminated from the operational output table. So the operational columns of employee (Name, Position, Tname and Tdate) are excluded, as they have non-numeric data types. All other columns are
shown, as they are numeric (Salary) or belong to joined process or bridge tables. BIASUBLABEL would operate analogously, returning the nominal attributes instead.

5.2.2 Operators for duration analysis

As the duration of business processes plays a major role for process optimization, our framework offers three operators to support the duration analysis: BIAHT, BIAPL and BIAFLUCT. Their syntax is depicted in Figs. 9(h), (i) and (j).

BIAHT and BIAPL The human task activity and, more generally, the invoke activity waiting for the response of partner services are a source for optimization, because they often last very long. Our operators find these activity types in the given process executions and return the corresponding tuples together with their variables and their duration. The BIAHT operator searches for all human task activities in the fact table for the given process identified by processid. It joins the fact table with the workflow table (to select all appropriate activities) and with the time table (to get the duration of the activity executions). The workflow and the time table are not shown in Fig. 8, but only indicated in Fig. 5. The approach of the BIAPL operator is just the same, but it searches for general invoke activities to partner links instead of only human tasks.

Example In the following example query we use BIAHT to investigate the processing time of human task activities used in the CarSelectionProcess:

SELECT * FROM BIAHT(CARSELECTIONPROCESS)

We receive the executed activities and their durations together with the business object elements and the ResourceID of responsible employees. Using our sample BIA-Cube this query results in an output table as shown in Fig. 11 with the duration calculated by using the start time of the activities in the time table, with the ResourceID of the resource table and the CarModel, Location and CustID attributes of the businessobjectelement table. We discover that process instances in ContractNegotiation for InputData.Car.Model = ‘AudiTT’ are slower in their processing time than other car models and often need more than 30 minutes. Using the next BIA operator, we may take a closer look at these duration fluctuations.

BIAFLUCT High duration variances for the same activity in different process executions also give hints for process optimizations. The BIAFLUCT operator aims to find these activities. The operator searches for all activities for a given process similar to the previous two operators. However, it searches for tuples with activities whose execution length fluctuates by the given duration value. That is, it checks whether the given percentage of executions deviates (i.e., is slower or faster) by more than the indicated duration from the average time of the other instances of the same activity model. The duration value is expected in the following format 000:00:00:000 (days:hours:minutes:seconds).

Example In the following, we further investigate all activities in the CarSelectionProcess. We create an example query using BIAFLUCT to search all activities of which 20 % last more than 30 minutes longer or are more than 30 minutes faster than the average time of the corresponding activity execution. We select the activity executions together with the information on used variables:

SELECT *
FROM BIAFLUCT(CARSELECTIONPROCESS, 000:00:30:000, 20)

The output table is similar to Fig. 11. It shows the same attributes, but only for those activity instances where at least 20 % of the instances deviate more than 30 minutes from the average execution time for this activity. Furthermore, it may also contain other activity instances, e.g., activity GetCarAvailability with different variable elements, as it shows all activity types. We also get ContractNegotiation instances as result. If we use the BIASUBALL operator, the reason for the negotiation delay for certain rental car models, e.g. AudiTT, may be discovered: A high choice of extras in some automobiles needs a lot of callbacks to the customer. For a process optimization, we could change the process model and add an extra activity at the beginning that requests additional information from the customers that are interested in certain car models.

5.2.3 Operators for exception analysis

One important optimization goal is the prevention of errors during process execution. Our framework offers two oper-
Example In the following, we further investigate the activ- Fig. 13 BIA-clustering example
ities in the CarSelectionProcess. Our example query uses
BIAERROR to search for all activities which end in a faulty
state for more than 30 percent of the executions. We select terns from the audit trails and operational data. We showed
these activity executions together with their used variable in [26] how common data mining techniques such as cluster-
data:

SELECT *
FROM BIAERROR(CARSELECTIONPROCESS, 30)

We discover in the output table shown in Fig. 12 that process instances of activity
CarHandOver are faulty very often. For a process optimization, we need further analyses
and have to check the related operational attributes of the found activity variables and
of their preceding activity variables.

BIAEXCEPT The second operator BIAEXCEPT (l) takes a closer look at the exception handlers
that are used to catch errors that might have come up during process execution. This
operator therefore evaluates the execution data not for faulty results, but for errors
that are less evident because the process state is completed. Although the errors have
been caught, they still signal some kind of problem that might be worth a closer look.
The operator compares regular and caught process executions, i.e., it checks whether an
exception handler was executed (this is stored in the audit log), and returns the activity
instances together with related variable data and resources.

6 Data mining techniques for business impact analysis

As an additional option in the analysis phase, we use data mining techniques for
extracting hidden optimization patterns. Techniques such as clustering, classification,
association rule discovery and regression (all specified in detail e.g. in [11]) are
applied to a combination of process dimensions and operational dimensions in the BIA
approach. In this section, we show how to use the BIA operators to preprocess data in the
warehouse in order to simplify the application of these data mining techniques. The
examples are based on the car rental scenario again.

6.1 Clustering

Grouping a set of related objects into certain classes of objects that are similar to
each other and dissimilar to other classes is called clustering [13]. High-dimensional
clustering methods [24] are especially interesting for BIA because of the high number of
sub-dimensions that result from the high number of process variables in different process
activities.
In order to prepare the data set for clustering, we may use the BIAFLUCT and BIASUBNUM
operators. In a first step, BIAFLUCT is used to retrieve all services that show large
fluctuations in the CarSelectionProcess. We may discover that activity GetCarAvailability
appears often. For the clustering, we can only use numeric attributes. Therefore,
BIASUBNUM is applied to GetCarAvailability and all of its variables. We are especially
interested in applying BIASUBNUM to the CustID variable in order to be able to cluster
the activities in relation to their sub-dimensional customer attributes. Figure 13
illustrates the clustering along two dimensions, i.e., the execution time of activity instances
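The preparation steps of Sect. 6.1 can be sketched roughly as follows. The rows below are a hypothetical stand-in for the numeric output of BIASUBNUM on GetCarAvailability; the second dimension (customer age), the sample values and the choice of plain k-means are assumptions made for illustration only. The text does not prescribe a clustering algorithm, and the high-dimensional methods of [24] would replace k-means in practice.

```python
from math import dist

# Hypothetical stand-in for the output of BIASUBNUM on GetCarAvailability:
# (activity instance ID, execution time in seconds, customer age).
rows = [
    ("a1", 2.0, 25), ("a2", 2.5, 30), ("a3", 3.0, 28),
    ("a4", 40.0, 61), ("a5", 42.0, 65), ("a6", 45.0, 70),
]


def min_max_scale(values):
    """Rescale a numeric column to [0, 1] so both dimensions weigh equally."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def kmeans(points, centroids, max_iterations=20):
    """Plain Lloyd iteration; returns the final centroids and point labels."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


times = min_max_scale([r[1] for r in rows])
ages = min_max_scale([r[2] for r in rows])
points = list(zip(times, ages))

# Seed with the first and last instance (an arbitrary, deterministic choice).
centroids, labels = kmeans(points, [points[0], points[-1]])
for (instance_id, _, _), label in zip(rows, labels):
    print(instance_id, "-> cluster", label)
```

Scaling both columns first keeps the execution time, which has the larger numeric range, from dominating the distance computation.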
Table 3 Evaluation of BIA WH architectures

                                                  Match table   Bridge tables
Complexity of ETL                                      +              −
Data volume                                            +              −
Runtime of arbitrary SQL query (Query type I)          −              +
Runtime of query with operators (Query type II)        0              0
Usability                                              0              0

For query type I, the schema of all warehouse tables has
to be well-known. In the warehouse with bridge tables, we
only need two joins between table businessobjectelement,
a bridge table and an operational table to receive all tuples
for one variable element and all attributes in the operational
table of its related operational element. For the example il-
lustrated in Fig. 7, the following query links car class in-
formation to process variables using tables Automobile and
BridgeTable_CarModel. For simplicity, all nicknames to the
source attributes are named just as the original attributes:
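A query of this kind can be sketched as follows. This is a minimal, self-contained illustration using Python's sqlite3 module. The table names come from the text, but every column name (element_instance_id, element_name, value, car_id, model, class) and all sample rows are assumptions for illustration and not part of the BIA WH schema described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Process side: one row per variable element instance (hypothetical columns).
    CREATE TABLE businessobjectelement (
        element_instance_id INTEGER PRIMARY KEY,
        element_name        TEXT,
        value               TEXT
    );
    -- Operational side: car master data (hypothetical columns).
    CREATE TABLE Automobile (
        car_id INTEGER PRIMARY KEY,
        model  TEXT,
        class  TEXT
    );
    -- Bridge table: materialized matches between the two sides.
    CREATE TABLE BridgeTable_CarModel (
        element_instance_id INTEGER
            REFERENCES businessobjectelement(element_instance_id),
        car_id INTEGER REFERENCES Automobile(car_id)
    );
    INSERT INTO businessobjectelement VALUES
        (1, 'InputData.Car.Model', 'M1'),
        (2, 'InputData.Car.Model', 'M2');
    INSERT INTO Automobile VALUES
        (10, 'M1', 'compact'),
        (11, 'M2', 'luxury');
    INSERT INTO BridgeTable_CarModel VALUES (1, 10), (2, 11);
""")

# Two joins: process element -> bridge table -> operational table.
QUERY = """
    SELECT e.element_instance_id, e.value, a.class
    FROM businessobjectelement AS e
    JOIN BridgeTable_CarModel AS b
      ON b.element_instance_id = e.element_instance_id
    JOIN Automobile AS a
      ON a.car_id = b.car_id
    WHERE e.element_name = 'InputData.Car.Model'
    ORDER BY e.element_instance_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The sketch only works because the analyst already knows which bridge table and which operational table belong to the variable element, which is exactly the precondition stated for query type I.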
The architecture with bridge tables can be used for analysis in a straightforward way
without any restrictions. We are able to perform all standard OLAP and data mining
operations combining data from the process and the operational side. On the BIA WH with a
match table, this integration has to be managed during analysis. The BIA operators are
able to perform this integration as part of the query execution. Considering the three
main performance indicators [8], they reflect the most common queries needed for the
integrated analysis of process data and operational data. A typical representative for
query type II is an operator that analyzes the process side and delivers associated
operational data, e.g. car class for variable CarModel. It starts on the process side to
find operational attributes for the given process variable element CarModel, and the user
does not know that the related operational column is called class. The following query
employing the BIASUBALL operator helps the user to retrieve information from this
operational column anyway.

SELECT *
FROM BIASUBALL(INPUTDATA.CAR.MODEL, CARSELECTIONPROCESS, CARHANDOVER)

This query can be used in both architectures. On a BIA WH with a match table, however,
such a combined analysis would not be possible without the BIA operators. Applying
BIASUBALL to variable CarModel as well as to the identifier of its activity and process
returns the operational attributes that are matched with this variable. Additionally, all
other operational attributes that are contained in the operational table of the matched
attribute are returned as well. Hence, the query results in tuples showing the ID of each
activity instance of CarHandOver in the CarSelectionProcess with its element
InputData.Car.Model as well as the matching operational attributes plus the other
attributes in the operational table.
The resulting tuples are equal in the match table and the bridge table architecture. In
both architectures, Query II needs to join the same process data tables and operational
tables, because the operator has to start from the process tables to look for the right
variable instances before it is able to find out the matchable operational tables that
have to be joined. The row for Query II in Table 3 shows that the execution time for
Query II is similar in both architectures; we denote this fact as 0 in Table 2. Only for
the first three queries, on a small data volume, is the difference larger. There is an
overhead in the match table architecture here, which has to join all elements with the
operational match partner based on their values, instead of just joining the few tuples
already extracted in the bridge table architecture. This would be different if we did not
use our defined naming scheme for the bridge tables. Thanks to this scheme, Query II only
needs to compare the bridge table names with the element names to find the right bridge
tables for the joins. Otherwise, it would be far more time-consuming to find the right
bridge table.

7.3 Discussion of usability

Another aspect in the comparison of the two architectures is their usability for the
warehouse analyst. The bridge table architecture is more usable for formulating ad-hoc
SQL queries on the mapping between process and operational data, as they are based on
bridge tables with a common structure. The queries can be formulated based on standard
SQL background knowledge. In contrast, the BIA WH with a match table contains far fewer
tables because all matches are stored in this match table instead of creating a new
bridge table for each match. Thus, it is much easier for an analyst to keep an overview
of the available tables. However, as the queries on the BIA WH with a match table are
more complex, more background knowledge is necessary for formulating them. Altogether,
both architectures balance each other with respect to their usability (Table 2). In the
end, analysts have to decide for themselves which aspects suit their needs better.

7.4 Discussion of BIA operators

The BIA operators support analysts in various ways to run holistic analyses that cover
process data as well as operational data. First, they allow analysts to handle data from
operational sub-dimensions efficiently. Consider in our example an analyst who does not
know the matching operational table and columns for the variable element
InputData.Car.Model in the CarSelectionProcess. In this case, standard SQL queries like
those shown in this section are not possible, because in both BIA WH architectures, we
need a loop over all operational tables of the matched operational columns to add them
dynamically to the from clause of the SQL query. Secondly, the operators facilitate
frequently needed analysis queries. In particular, they help to analyze the main
performance indicators for processes, i.e., cost, time and quality [8]. This is reflected
in specific operators to analyze process duration and process exceptions. Finally, we
have shown in Sect. 6 that the BIA operators are well suited to prepare data warehouse
data for data mining algorithms. The BIA operators allow analysts to extend the data
basis for data mining tasks by considering process data as well as operational
sub-dimensions. Hence, by using these operators the user is able to describe more
detailed and more target-oriented types of analysis.

8 Conclusion

For performing the extended analysis techniques OLAP and data mining for BIA, we
developed two warehouse architectures that hold both operational data and process data
dimensions. The architectures differ in their mapping between the process data and their
related operational data (match table vs. bridge tables). We demonstrated the usefulness
of each architecture depending on the aspects that the user prefers to handle. The
architecture with a match table turns out to be better in most areas: it allows for more
efficient ETL processing and it needs less space to store the matches. The bridge tables
allow, however, for more efficient query processing if the analyst already knows the
matches between operational and process model. The analyst can then use the correct
bridge table with its stored instance data without also having to join the workflow table
and fact table to get the execution instances. Usability and execution time of queries
where the matches are unknown to the analyst are balanced in both architectures.
We introduced a set of operators that allow analysts to define efficient OLAP queries
without knowing the details of matches between operational data and process data. The BIA
operators put the user in the position of phrasing simple economic queries to the BIA WH.
We showed how data mining analyses benefit from these operators as well.
In our future work, we will further explore how we can utilize statistics from earlier
analysis results in the warehousing phase that are also used in the analysis phase for
the preselection of certain attributes and their data. Statistics could also be used for
the creation of the BIA WH. In this way, statistics about relevant operational attributes
could help to reduce the number of bridge tables. Another future step is to develop
concrete optimization patterns that apply the results gained by BIA for process
improvement and reengineering.

References

1. Agrawal R, Gunopulos D, Leymann F (1998) Mining process models from workflow logs. In: Proc of extending database technology, London, UK
2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington, DC, pp 26–28
3. Bruckner RM, List B, Schiefer J (2002) Striving towards near real-time data integration for data warehouses. In: Proc of data warehousing and knowledge discovery, France
4. Casati F, Castellanos M, Dayal U, Salazar N (2007) A generic solution for warehousing business process data. In: Proc very large data bases, Austria
5. Castellanos M, Casati F, Dayal U, Shan M-C (2004) A comprehensive and automated approach to intelligent business processes execution analysis. Distrib Parallel Databases 16(3):239–273
6. Champy J (1995) Reengineering management. Harper Collins, New York
7. Chaudhuri S, Dayal U (1997) An overview of data warehousing and OLAP technology. SIGMOD Rec 26(1):65–74
8. Dumas M et al (2005) Process-aware information systems: bridging people and software through process technology. Wiley, New York
9. Eder J, Olivotto GE, Gruber W (2002) A data warehouse for workflow logs. In: Proceedings of the first international conference on engineering and deployment of cooperative information systems, EDCIS’02. Springer, London, pp 1–15
10. Hammer M (1990) Reengineering work: don’t automate, obliterate. Harv Bus Rev 68(4):104–112
11. Han J (2005) Data mining: concepts and techniques. Morgan Kaufmann, San Mateo
12. ISO (2003) ISO/IEC 9075-2. Information technology—Database languages—SQL. Part 2: Foundations
13. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
14. Jeng J-J, Schiefer J, Chang H (2003) An agent-based architecture for analyzing business processes of real-time enterprises. In: EDOC, pp 86–97
15. Johansson HJ, McHugh P, Pendlebury A, Wheeler W (1993) Business process reengineering: breakpoint strategies for market dominance. Wiley, New York
16. Kamber M, Han J, Chiang J (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proceedings of the third international conference on knowledge discovery and data mining (KDD)
17. Mansar SL, Reijers HA (2005) Best practices in business process redesign: validation of a redesign framework. Comput Ind 56(5):457–471
18. Mansmann S, Neumuth T, Scholl MH (2007) Multidimensional data modeling for business process analysis. In: Proceedings of the 26th international conference on Conceptual modeling, ER’07. Springer, Berlin, pp 23–38
19. McCoy D (2002) Business activity monitoring: calm before the storm. Technical report LE15-9727, Gartner
20. Michie D, Spiegelhalter DJ, Taylor CC, Campbell J (eds) (1994) Machine learning, neural and statistical classification. Ellis Horwood, Chichester
21. Müller R, Greiner U, Rahm E (2004) Agentwork: a workflow system supporting rule-based workflow adaptation. Data Knowl Eng 51(2):223–256
22. Niedermann F, Radeschuetz S, Mitschang B (2010) Deep business optimization: a platform for automated process optimization. In: Proc BPSC
23. OASIS Standard (2007) Web services business process execution language, version 2.0. Available: http://docs.oasis-open.org/wsbpel
24. Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newsl 6(1):90–105
25. Poole J, Chang D, Tolbert D, Mellor D (2003) Common warehouse metamodel developer’s guide. Wiley, New York
26. Radeschütz S, Mitschang B (2009) Extended analysis techniques for a comprehensive business process optimization. In: KMIS, pp 77–82
27. Radeschütz S, Mitschang B, Leymann F (2008) Matching of process data and operational data for a deep business analysis. In: Proc of I-ESA, Germany
28. Radeschütz S et al (2010) BIAEditor—matching process and operational data for a business impact analysis. In: Proc EDBT conf
29. Rokach L, Maimon O (2005) Top-down induction of decision trees classifiers—a survey. IEEE Trans Syst Man Cybern, Part C, Appl Rev 35(4):476–487
30. Rosettanet (2011) Overview: clusters, segments, and pips, version 02.13.00. http://www.rosettanet.org
31. Rubin V, Günther CW, van der Aalst WMP, Kindler E, van Dongen BF, Schäfer W (2007) Process mining framework for software processes. In: Proc of international conference on software process, USA
32. Sayal M, Casati F, Dayal U, Shan M-C (2002) Business process cockpit. In: Proc of very large data bases, China
33. Schiefer J, Jeng J-J, Bruckner RM (2003) Real-time workflow audit data integration into data warehouse systems. In: ECIS, pp 1697–1706
34. Shahzad K, Johannesson P (2009) An evaluation of process warehousing approaches for business process analysis. In: Proceedings of the international workshop on enterprises & organizational modeling and simulation, EOMAS’09. ACM, New York
35. Uysal I, Güvenir HA (1999) An overview of regression techniques for knowledge discovery. Knowl Eng Rev 14(4):319–340
36. van der Aalst WMP (2001) Re-engineering knock-out processes. Decis Support Syst 30(4):451–468
37. van der Aalst WMP (2011) Process mining: discovery, conformance and enhancement of business processes. Springer, Berlin
38. van der Aalst WMP, van Dongen BF, Herbst J, Maruster L, Schimm G, Weijters AJMM (2003) Workflow mining: a survey of issues and approaches. Data Knowl Eng 47(2):237–267
39. W3C. Semantic annotations for WSDL and XML schema. Available: http://www.w3.org/TR/sawsdl/
40. W3C Submission: web service modeling language. Available: http://www.w3.org/Submission/WSML/
41. W3C Recommendation: web ontology language (2004) Available: http://www.w3.org/TR/owl-ref/
42. Weerawarana S, Curbera F, Leymann F, Storey T, Ferguson DF (2005) Web services platform architecture. Prentice Hall, New York
43. Wetzstein B, Leitner P, Rosenberg F, Brandic I, Dustdar S, Leymann F (2009) Monitoring and analyzing influential factors of business process performance. In: EDOC, pp 141–150
44. zur Muehlen M (2004) Workflow-based process controlling. Foundation, design, and application of workflow-driven process information systems. Logos, Berlin
45. zur Muehlen M, Shapiro R (2009) Business process analytics. In: Handbook on business process management, vol 2. Springer, Berlin